Biometrics 70, 398–408, June 2014
DOI: 10.1111/biom.12135

Calibration Using Constrained Smoothing with Applications to Mass Spectrometry Data

Xingdong Feng,1,* Nell Sedransk,2 and Jessie Q. Xia3

1 School of Statistics and Management, Shanghai University of Finance and Economics, and Key Laboratory of Mathematical Economics (SUFE), Ministry of Education, Shanghai 200433, China
2 National Institute of Statistical Sciences, Research Triangle Park, North Carolina 27709, U.S.A.
3 Verisk Innovative Analytics, San Francisco, California 94111, U.S.A.
* email: [email protected]

Summary. Linear regressions are commonly used to calibrate the signal measurements in proteomic analysis by mass spectrometry. However, with or without a monotone (e.g., log) transformation, data from such functional proteomic experiments are not necessarily linear, or even monotone, functions of protein (or peptide) concentration except over a very restricted range. A computationally efficient spline procedure improves upon linear regression. However, mass spectrometry data are not necessarily homoscedastic; more often the variation of measured concentrations increases disproportionately near the boundaries of the instrument's measurement capability (dynamic range), that is, the upper and lower limits of quantitation. These calibration difficulties arise in other applications of mass spectrometry as well as in other broad-scale calibrations. Therefore, the method proposed here uses a functional data approach to define the calibration curve and also the limits of quantitation under two assumptions: (i) that the variance is a bounded, convex function of concentration; and (ii) that the calibration curve itself is monotone at least between the limits of quantitation, but not necessarily outside these limits. Within this paradigm, the limit of detection, where the signal is definitely present but not measurable with any accuracy, is also defined. An iterative approach draws on existing smoothing methods to account simultaneously for both restrictions and is shown to achieve the global optimal convergence rate under weak conditions. This approach can also be implemented when convexity is replaced by other (bounded) restrictions. Examples from Addona et al. (2009, Nature Biotechnology 27, 633–641) both motivate and illustrate the effectiveness of this functional data methodology when compared with the simpler linear regressions and spline techniques.

Key words: Functional data; LoD and LoQ; Piecewise smoothing; Proteomics; Regression splines; Shape restrictions.

1. Introduction

Theoretically, protein abundance in biologic fluids and/or tissues, detected through the protein's signature peptides (primary protein components), is a natural choice of biomarker to diagnose or to monitor many disease states. The objective is either to detect the presence of specific aberrant proteins or to identify abnormal patterns of change in the relative abundances of normal proteins, in order to distinguish disease from normal states and/or to monitor changes indicative of disease progression or remission. The usefulness of proteomic, or indeed any, biomarkers depends critically on the sensitivity and specificity of the measurement system and on the precision and reproducibility of measurements. While various peptides have been identified as potential biomarkers by research conducted at many respected laboratories, subsequent independent studies have failed to reproduce many early published results. Consequently, attention turned to the technology and the measurement process itself, in particular to the liquid chromatography-mass spectrometry (LC-MS) system and to the determination of reproducibility and potential for successful calibration of LC-MS systems. Conventional calibration in the LC-MS context has depended heavily on regression, primarily linear regression, to relate the output (observed measurements) to the input (known concentrations


or gold standard measurements). Complexities arise due to unavoidable background noise, heteroscedasticity, and nonlinearity (with or without a monotonic transformation). For a calibration curve to be useful, observed measurements must meet three requirements: accuracy (lying along the calibration curve), specificity (lying outside the presumptive range of background noise), and precision (a pointwise criterion for variability). In the case of proteomic biomarkers, each of the complexities cited above is commonly present. Linear calibration methods in common practice are reviewed in Danzer and Currie (1998). These methods can work well enough when the level of background noise is low compared to the desired sensitivity for a biomarker and when the assumptions of linearity, plus additional assumptions about measurement distributions and variance, are met. A (linear) spline approach that is computationally efficient for this special case is discussed in Section 5. However, the nonlinearity that is typical across a wide range of concentrations distorts the slope of the linear regression, and the heteroscedasticity of measurements is rarely as simple as directly proportional to concentration. In addition, sensitivity requirements for a diagnostic biomarker usually necessitate confident measurements close to the background noise level; and the background component may create both an estimation problem and an inherent nonlinearity.

© 2014, The International Biometric Society

The goal of this article is to estimate the calibration curves under a monotonicity restriction, using a functional data approach that requires only convexity and boundedness of the variance function. This approach performs better than linear regression or B-spline methods in fitting LC-MS data and in leading to definition of limits on the range of accurate measurement and on sensitivity to trace presence of target peptides. Examples are taken from a very extensive proteomics study by Addona et al. (2009) in Section 6. Theoretical results and proofs for consistency of the method are deferred to the Web-based Supplementary Materials.

2. Mass Spectrometry Data

2.1. Requirements for Calibration
The calibration problem for proteomics shares with calibration in many other applications the goal of defining the (monotonic) functional relationship that links the known values (intensities) for standard samples and the measurements recorded by the instrument, in this case intensities measured by the mass spectrometer. The measurement capability of an instrument or measurement system is the range of actual intensities for which measurements meet a (specified) standard for precision as well as accuracy, with a monotone relationship between actual value and measurement. The limits of this range are the lower limit of quantitation (denoted LLoQ) and the upper limit of quantitation (ULoQ). These limits are determined by failure of monotonicity, by explosion of variance, or by both, making measurement outside the range of measurement capability unreliable. In the case of some biomarkers, where any presence of the biomarker regardless of level constitutes a positive response, even nonmeasurable amounts are diagnostic. In these cases the limit of detection (LoD), that is, the trace level minimally detectable above background, which most often lies below the range of measurement capability, is equally important for diagnosis. Specific complicating issues when dealing with protein biomarkers include the heterogeneity of variance, especially at the limits of quantitation; the potential for instrument failure, especially at high intensities; the need to analyze multiple proteins simultaneously in each sample; and the inescapable background of irrelevant materials, including exogenous proteins present in biosamples.

2.2. Measurement System
The measurement system is complicated and highly automated (Karpievitch et al., 2010), but the essential features of an LC-MS system and its chief sources of variation are given here.
The biosample is prepared and processed in liquid form according to a carefully controlled protocol (standard operating procedure or SOP) to digest whatever proteins are present into their critical components or peptides. This liquid mixture then flows through an LC column that serves to sort particles according to their mass so that the time (retention time) it takes a particle to flow out from the column corresponds to its mass or molecular weight. A multiple reaction monitoring (MRM) experiment is designed to obtain maximum sensitivity for detection and quantification of multiple pre-specified target compounds simultaneously. An MRM is unlike shotgun proteomics experiments that seek to discover


new candidate biomarkers. In an MRM, since the molecular weights are known for the biomarker peptides of interest, their retention times can be predicted in advance. A time window around each predicted retention time is determined to ensure that the peptides of interest will be captured within those windows even if there is a time drift or operational variation in the preprocessing, in the LC column, or in the MS instrument itself. The MS is programmed to operate only during those retention time windows. Thousands of MS scans are then carried out for each sample to yield a single mass spectrum with several peaks (for subtypes of the peptide, referred to as transitions) associated with each peptide, plus peaks for irrelevant peptides or any other particles whose retention times fall into those windows. The log of the area of each peak is the measure of intensity; usually a fixed number of peaks are measured. The most reliable peaks for quantitation are those that do not overlap other peaks, especially irrelevant peaks; most often the largest peaks can be measured most precisely.

2.3. Data Characteristics
Peptide intensity measurements are subject to several sources of variability arising from the interaction of the peptides with the measurement process as well as from endogenous proteins present in the biosample. The differing chemistries of individual peptides can result in different responses to the chemical digestion, different rates of degradation, etc., as well as competition with different irrelevant peptides and particles that share their particular retention time windows. In consequence, calibration curves can differ among peptides. Laboratory effects, due to the preprocessing of the sample and also to the settings on the MS itself, yield differing ranges of capability (i.e., LoD, LLoQ, ULoQ) and differing calibration curves.
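The qualitative pattern described here, a monotone mean response with noise that inflates toward both ends of the dynamic range, can be mimicked with a small simulation. This is a hypothetical illustration only: the curve shapes, noise levels, and function names below are invented for demonstration and are not drawn from the Addona et al. data.

```python
import math
import random

def true_mean(x):
    # Hypothetical sigmoidal calibration curve on the log-log scale:
    # flat near the noise floor, monotone mid-range, saturating at the top.
    return 8.0 + 6.0 / (1.0 + math.exp(-1.5 * (x - 4.0)))

def true_sd(x):
    # Convex, bounded noise pattern: smallest mid-range, largest near the
    # limits of quantitation (assumption (i) in the Summary).
    return 0.15 + 0.08 * (x - 4.0) ** 2

def simulate(n_reps=4, seed=1):
    """Return (log concentration, log peak area) pairs with technical replicates."""
    rng = random.Random(seed)
    levels = [0.5 * k for k in range(1, 17)]  # log-concentration grid
    return [(x, rng.gauss(true_mean(x), true_sd(x)))
            for x in levels for _ in range(n_reps)]
```

With four replicates at sixteen levels, `simulate()` produces 64 pairs whose replicate scatter widens at both ends of the grid, much like the panels of Figure 1.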
Across a broad range of dilutions, the variance of intensity measurements is heteroscedastic, most commonly with high variances at very low concentrations, due to the comparative prominence of background noise, and at high concentrations, as a function of scale. Also, at very high concentrations an erratic clogging of the flow through the system can cause a violation of monotonicity. Four plots drawn from the publicly available MRM data set of Addona et al. exhibit these phenomena, as shown in Figure 1. The left and right columns are plots of the results from two laboratories (54a and 52a), respectively; the top and bottom rows show results for peptides 202 and 169. The points on each graph are the measurements of intensity (log peak area) for each of the four technical replicates at nine concentrations (B through J) plus the blank sample (A1). The two laboratories differ in their precision of measurement, but at each laboratory measurements are heteroscedastic, with greater variation between technical replicates at the low (A1, B, C) and high (H, I, J) ends of the concentration range. Comparing peptides, the relationship of measured intensity to concentration is steeper at both laboratories for peptide 169 than for peptide 202. Nonlinearities for the two peptides also differ, with strict monotonicity beginning at lower concentrations for peptide 169 than for peptide 202. At laboratory 54a, monotonicity also breaks down altogether at high concentrations (I, J) of peptide 202; this breakdown of monotonicity indicates that for laboratory 54a the ULoQ must be below concentration I. Consequently the range of concentrations for which


Figure 1. Circles indicate the (concentration, log peak area) measurement pairs (x, y) for two peptides (202-transition 1 and 169-transition 3); 52a and 54a identify two different laboratories, both using 4000 QTRAP MS platforms.

the laboratory 54a MS is calibrated lies somewhere between concentrations.

2.4. Experiment Design
Determining reliability requires calibration across a wide range of intensities for each of the multiple test proteins. Determining reproducibility requires independent testing on multiple measurement systems in multiple laboratories. The basic design is hierarchical, since proteins are not measured directly but are cleaved into their component peptides, each measured by its several transitions (subtypes): technical replicates within transitions; transitions within peptides; peptides within proteins. All laboratories measure all proteins at all concentrations (intensities), but each laboratory chooses which transitions to monitor for each peptide. Calibration functions and the limits of quantitation and of detection are expected to differ among proteins; and those for any given protein may differ among laboratories. Note that intensity is defined in terms of a log transformation, both of known concentrations in a test mixture and of measured peak area. Also, for blank samples, the concentration that is expected to be zero is replaced by a small positive number before transformation. The real data analyzed here were taken from the third and most rigorous of the set of three linked studies reported in Addona et al. A test mixture of 11 peptides from 7 target

proteins was prepared and diluted to provide samples at 9 concentrations (denoted B through J) approximately equally spaced on a log scale, plus a blank (plasma only, denoted A1). Stable isotope internal standards were prepared as well for each of the proteins/peptides. Measurements were made for four technical replicates, and peak areas were quantified for three transitions for each peptide. The study was conducted in parallel at eight laboratories, all except one using 4000 QTRAP MS platforms and processing the test samples with the same rigorous SOP.

3. Method

3.1. Intensity Model
A hierarchical model for data from a single site is introduced as follows:

$$y_{ijkct} = b(x_c) + r_i(x_c) + p_{ij}(x_c) + q_{ijk}(x_c) + \varepsilon_{ijkct}, \tag{1}$$

where the variables $x$ and $y$ represent the log concentration level of a certain protein (and hence of its peptides) and the log peak area from MRM, respectively; the variable $\varepsilon_{ijkct}$ is symmetrically distributed with mean 0 and variance $\sigma_{ijk}(x_c)$, where $\sigma_{ijk}$ is assumed to be a convex function of $x$; $i = 1, \ldots, I$ refers to proteins; $j = 1, \ldots, J_i$ refers to peptides, which are nested in each protein; $k = 1, \ldots, K$ refers to transitions,

which are nested in each peptide; and $c = 1, \ldots, C$ and $t = 1, \ldots, T$ refer to concentration levels and technical replicates, respectively. In this model, the function $b(\cdot)$, used to describe the basic trend relating log peak area to log concentration level, is assumed to be monotone. The function $r_i(\cdot)$ models the deviation from the basic function that is specific to the $i$th protein, the function $p_{ij}(\cdot)$ represents the effect associated with the $j$th peptide of the $i$th protein, and the function $q_{ijk}(\cdot)$ gives the transition-level effect within the peptide. Further, $f_i = b + r_i$, $g_{ij} = b + r_i + p_{ij}$, and $h_{ijk} = b + r_i + p_{ij} + q_{ijk}$ are assumed to be monotone. The usual assumptions are made for the purpose of identifiability: $\sum_i r_i = 0$; $\sum_j p_{ij} = 0$ for each $i$; and $\sum_k q_{ijk} = 0$ for each pair of $i$ and $j$, at all concentrations.

3.2. Estimation Procedure
The functions in Model (1) can be estimated sequentially following their hierarchy, beginning with the transition-level functions (i.e., $h_{ijk}$). The convergence rate of the estimating method is deferred to the Supplementary Materials. To simplify notation in illustrating the general method with the data $(x_l, y_l)$, $l = 1, \ldots, n$, the following model is considered:

$$y_l = m(x_l) + \varepsilon_l, \tag{2}$$

where $\varepsilon_l$ is symmetrically distributed with mean 0 and variance $v(x_l)$, and $m(\cdot)$ and $v(\cdot)$ are functions satisfying monotonicity and convexity, respectively. A rich literature on smoothing data under shape restrictions is relevant to the analysis of Model (2). Brunk (1955) considers monotonicity in the regression problem with the maximum likelihood method. Later, by combining local averaging and isotonic regression, Friedman and Tibshirani (1984) incorporate monotonicity in smoothing data. Asymptotic properties are discussed by Mammen (1991), although the performance of these methods in actual practice is not clear. Smoothing splines have also been considered for data with a monotonicity restriction (Utreras, 1985; Villalobos and Wahba, 1987). Hall and Huang (2001) propose a method to impose monotonicity on kernel estimates. Ramsay (1988) develops I-splines by integrating B-splines with positive coefficients to guarantee monotonicity, and He and Shi (1998) use B-splines based on L1 optimization, which is more robust against outliers. More recently, Bayesian methods for isotonic regression have been studied by Holmes and Heard (2003) and Neelon and Dunson (2004), and for the case of regression splines by Shively, Sager, and Walker (2009). These methods place a specific restriction on the curve shape; application to Model (2) requires an additional restriction on the variation pattern of the measurements. To account for both restrictions, the following iterative procedure is proposed, for which convergence to the true functions $m$ and $v$ is guaranteed. (Theoretical results and all proofs appear in the Supplementary Material.)

(S1) Smooth the data $(x_l, y_l)$ with the regression splines method given the monotonicity restriction, and denote the estimated smoothing function as $\hat m^{(1)}$;
(S2) at the $s$th step, smooth the pairs $(x_l, [y_l - \hat m^{(s-1)}(x_l)]^2)$ with the regression splines method given the convexity restriction, and denote the estimated smoothing function as $\hat v^{(s)}$;
(S3) smooth the data $(x_l, y_l)$ using the same method as step (S1) with the weights $\{[\hat v^{(s)}(x_l)]^{-1/2}\}$, and denote the estimated smoothing function as $\hat m^{(s)}$;
(S4) repeat steps (S2)–(S3) until the chosen stopping rules are satisfied.
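The (S1)–(S4) loop can be sketched in miniature with simple stand-ins for the smoothers the paper uses: a weighted pool-adjacent-violators (PAVA) fit replaces the monotone regression splines of He and Shi (1998), and a curvature-clipped quadratic least-squares fit replaces the convex splines of Meyer (2008). A squared-error loss is used throughout, so this illustrates the structure of the iteration, not the authors' exact estimators.

```python
def monotone_fit(y, w=None):
    """Weighted PAVA: least-squares fit under a nondecreasing restriction
    (stand-in for the monotone smoother of steps (S1)/(S3))."""
    if w is None:
        w = [1.0] * len(y)
    vals, wts, cnts = [], [], []
    for yi, wi in zip(y, w):
        vals.append(float(yi)); wts.append(wi); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # pool violators
            tot = wts[-2] + wts[-1]
            merged = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tot
            vals[-2:], wts[-2:], cnts[-2:] = [merged], [tot], [cnts[-2] + cnts[-1]]
    out = []
    for v, c in zip(vals, cnts):
        out += [v] * c
    return out

def _det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def convex_fit(x, z):
    """Quadratic least squares with nonnegative curvature: a crude convex
    stand-in for the constrained splines of step (S2)."""
    S = [sum(xi ** k for xi in x) for k in range(5)]
    T = [sum(zi * xi ** k for xi, zi in zip(x, z)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    d = _det3(A)
    coef = []
    for j in range(3):          # Cramer's rule for the normal equations
        M = [row[:] for row in A]
        for i in range(3):
            M[i][j] = T[i]
        coef.append(_det3(M) / d)
    a, b, c = coef
    if c < 0:                   # enforce convexity: fall back to linear fit
        den = S[0] * S[2] - S[1] ** 2
        b = (S[0] * T[1] - S[1] * T[0]) / den
        a = (T[0] - b * S[1]) / S[0]
        c = 0.0
    return [a + b * xi + c * xi * xi for xi in x]

def iterate(x, y, steps=5, floor=1e-3):
    """One pass through the scheme of Section 3.2 on sorted data (x, y)."""
    m = monotone_fit(y)                                  # (S1)
    v = [1.0] * len(x)
    for _ in range(steps):                               # (S4): fixed step count
        resid2 = [(yi - mi) ** 2 for yi, mi in zip(y, m)]
        v = convex_fit(x, resid2)                        # (S2)
        w = [1.0 / max(vi, floor) for vi in v]           # Remark 1: floor = c_n
        m = monotone_fit(y, w)                           # (S3), L2 weights
    return m, v
```

The `floor` argument plays the role of the threshold $c_n$ of Remark 1, keeping the weights bounded when the fitted variance approaches zero.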

Remark 1. It is possible that the nonnegativity of the variance curve is violated in step (S2), so a small positive value, denoted $c_n$, is used as a threshold when calculating the weights in step (S3). Theoretically, the threshold $c_n$ needs to converge to 0 as $n \to \infty$. With the cutoff imposed, none of the weights exceeds $c_n^{-1/2}$, and the convexity of the variance curve still holds.

The methods to smooth the data $(x_l, y_l)$ and the pairs $(x_l, [y_l - \hat m^{(s)}(x_l)]^2)$ can be any regression splines methods that achieve the optimal rates (Stone, 1982) given the corresponding restrictions. This article uses the method of He and Shi (1998) to smooth the original data $(x_l, y_l)$ under the monotonicity restriction, because their method has demonstrated robust performance against outliers, and the method of Meyer (2008), which is impressively efficient, to smooth the pairs $(x_l, [y_l - \hat m^{(s)}(x_l)]^2)$ under the convexity restriction. Within each iteration, the knots automatically selected by the method of He and Shi in (S1) are used as the knots in (S2). With these methods, the global optimal rates for nonparametric regression can be reached by the iterations for estimating both functions $m$ and $v$.

With this method, each transition-level function $h_{ijk}$ can be estimated, with the estimates denoted $\hat h_{ijk}$, for $i = 1, \ldots, I$, $j = 1, \ldots, J_i$, and $k = 1, \ldots, K$. Next, the higher-level functions can be estimated by $\hat g_{ij} = K^{-1} \sum_k \hat h_{ijk}$ and $\hat f_i = (J_i K)^{-1} \sum_{j,k} \hat h_{ijk}$, respectively. When the numbers of peptides and transitions are fixed, if the estimated function $\hat h_{ijk}$ can achieve the optimal rate, then the estimates $\hat g_{ij}$ and $\hat f_i$ also attain the optimal rate. Note that only when the behaviors of the proteins are known to be similar is the estimation of the function $b$ useful or interpretable. In practice, this is not always the case.
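Because the averages $\hat g_{ij} = K^{-1}\sum_k \hat h_{ijk}$ and $\hat f_i = (J_i K)^{-1}\sum_{j,k} \hat h_{ijk}$ act pointwise, they can be computed by averaging the transition-level fitted values on a common concentration grid. A minimal sketch (function names are ours):

```python
def peptide_curve(transition_fits):
    """g_ij = K^{-1} * sum_k h_ijk: average the K transition-level fitted
    curves of one peptide, each evaluated on the same concentration grid."""
    K = len(transition_fits)
    return [sum(f[i] for f in transition_fits) / K
            for i in range(len(transition_fits[0]))]

def protein_curve(peptides):
    """f_i = (J_i K)^{-1} * sum_{j,k} h_ijk: pool every transition curve of
    every peptide belonging to the protein (assumes K equal for all peptides)."""
    return peptide_curve([f for transitions in peptides for f in transitions])
```

Averaging fitted values on a shared grid is equivalent to averaging the functions themselves at those grid points, which is all that is needed to plot or threshold the higher-level curves.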

3.3. Other Cases
There are cases where restrictions on the variation of the measured concentrations other than convexity are more appropriate: for example, when the variation of the measured intensities is known to increase linearly with the true concentration. An algorithm similar to that described in the previous subsection can still be used, by smoothing the pairs $(x_l, [y_l - \hat m(x_l)]^2)$ under the alternative restriction. The iterative method is quite flexible and permits different combinations. The key to this method is to select the appropriate smoothing methods so that the iterations lead to solutions satisfying these restrictions at the optimal convergence rate for nonparametric smoothing.


3.4. Assessment of Variation
In this section, the bootstrap method of Rao and Zhao (1992) is used to assess the variation of the estimated functions from the iterative method of Section 3.2, accounting for the heterogeneity present in Model (1). Accordingly, pointwise confidence intervals for the true functions $m$ and $v$ are constructed based on the bootstrap method. Zhou, Shen, and Wolfe (1998) established the local asymptotics for regression splines, and pointwise confidence intervals can be constructed from those asymptotic results. However, when the model errors are heterogeneous and the weights are introduced as described in the previous section, their asymptotic variances may not be achieved even approximately. The bootstrap is a suitable alternative. In nonparametric smoothing based on the resamples, the same knots are utilized as those selected by the last iteration of the nonparametric estimating procedure of Section 3.2 with the original data. Random weights for the individual losses are generated from the exponential distribution exp(1), which satisfies the requirements on random weights of Rao and Zhao (1992). Mathematically, the following objective functions are minimized, respectively, in the bootstrapped samples for assessing the variation of the estimates of $m$ and $v$:

$$\min_{\alpha} \sum_{i=1}^{n} w_i^{(b)} \frac{|y_i - a(x_i)^T \alpha|}{[\hat v^{(*b)}(x_i)]^{1/2}}, \qquad \min_{\delta} \sum_{i=1}^{n} w_i^{(b)} \left\{ [y_i - \hat m^{(*b)}(x_i)]^2 - d(x_i)^T \delta \right\}^2, \tag{3}$$

where $a$ and $d$ are the B-spline bases used by the nonparametric smoothing procedures, $w_i^{(b)}$ is independently sampled from the distribution exp(1), and the curves $\hat m^{(*b)}$ and $\hat v^{(*b)}$ are iteratively estimated as described in Section 3.2 based on the $b$th resample. After the estimates of $m$ and $v$ from the bootstrap procedure are obtained, the variances of the estimates from the bootstrapped samples given the $x$ values can be estimated to construct the pointwise symmetric confidence intervals for $\hat m$ and $\hat v$ at the level $\alpha$, respectively, by





$$\left[\hat m(x) \pm z_\alpha \hat s\{\hat m(x)\}\right] \quad \text{and} \quad \left[\hat v(x) \pm z_\alpha \hat s\{\hat v(x)\}\right],$$

where $z_\alpha$ is the $Z$ score at the level $\alpha$, and $\hat s$ represents the estimated standard error.

4. Calibration Limits Based on the First Derivatives of Response Functions

With the model and the method proposed in Section 3, the response functions can be estimated at the transition, peptide, and protein levels. In this section, some possible methods for evaluating LoD, LLoQ, and ULoQ with the estimated response functions are discussed, and a case study is used to illustrate the method and its application. Currie (1968) defined the LoD as the concentration level at which the instrument can detect signals that are significantly greater than 0, and the LLoQ as the concentration level that is significantly greater than the LoD, under the linear regression model and Gaussian error assumptions. These concepts are widely used by biochemists. However, it is possible that neither linearity between true and observed concentration pairs nor stable variance above this LLoQ value holds for the observed data. An alternative is developed here to define LoD and LLoQ in the nonlinear case.

In real mass spectrometry data it is common that the fitted calibration curve is almost flat for very low concentrations, and this is commonly attributed to the detection ability (or inability) of the system. Hence, the LoD can be defined as the concentration level above which the derivative of the calibration curve is significantly greater than 0. In this article, the bootstrap discussed in Section 3.4 is used to obtain the pointwise confidence band for the derivative of the calibration curve. From this band, it is straightforward to determine the concentration level where the corresponding lower bound of the confidence interval intersects 0, and this level is the LoD. LLoQ and ULoQ specify a range of concentration within which a satisfactory accuracy of measurement is reached. Since the coefficient of variation (cv) is widely used by biochemists, it is natural to use this statistic to define LLoQ and ULoQ. Note that stability of the first derivative of the response function also reflects stability of the measurements, so the cv of the derivative curve is used as follows. Determine the range for the true concentration $x$, denoted $(x_{LLoQ}, x_{ULoQ})$, such that cv $\le u^*$, where $u^*$ is a threshold selected by experimenters. The values $x_{LLoQ}$ and $x_{ULoQ}$ are defined as the LLoQ and ULoQ values, respectively. Usually, LLoQ should be carefully chosen to be greater than LoD. The threshold on cv is subjective and reflects the scientists' own standards for acceptable quality.
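Given bootstrap replicates of the estimated derivative curve on a grid, the LoD and the cv-based [LLoQ, ULoQ] range can be read off pointwise. The sketch below makes two simplifying assumptions not stated in the text (a symmetric normal-approximation band and a single contiguous cv region); the function and argument names are ours.

```python
def limits_from_derivative(xs, boot_derivs, z=2.576, cv_max=0.25):
    """LoD: first grid point where the lower confidence bound of the
    derivative exceeds 0.  [LLoQ, ULoQ]: the grid range where the pointwise
    coefficient of variation of the derivative stays at or below cv_max."""
    lod, quant = None, []
    B = len(boot_derivs)                      # number of bootstrap curves
    for i, x in enumerate(xs):
        col = [d[i] for d in boot_derivs]     # derivative replicates at x
        mu = sum(col) / B
        sd = (sum((c - mu) ** 2 for c in col) / (B - 1)) ** 0.5
        if lod is None and mu - z * sd > 0:   # band lifts off zero
            lod = x
        if mu > 0 and sd / mu <= cv_max:      # cv criterion for quantitation
            quant.append(x)
    return lod, (min(quant) if quant else None), (max(quant) if quant else None)
```

In practice one would also verify that the resulting LLoQ exceeds the LoD, as the text recommends.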

5. Calibration Limits for the Special Case of Linear Splines

Either the well-known physics of a system or extensive experience with a system, even with a particular platform, sometimes allows restriction of the calibration curve to a monotone piecewise-linear form, coupled with the assumption that the variance is a convex function of concentration. In this case a less computationally intensive method for determining the calibration limits can be defined. Consider the determination of LoD and LLoQ when the calibration curve consists of a linear–linear spline under the monotonicity restriction. Two cases are possible: either the blank samples provide an adequate (accurate and precise) baseline measurement, or the blank samples are insufficient in number or overly variable to serve as a baseline. Assume first that the blank samples are adequate. Then the LoD is taken to be the concentration at which the pointwise predicted (log) peak area equals the $(1-\alpha)\%$ quantile for the blank (log) peak area. With an assumption of Gaussian error, at the LoD the presence of the peptide is expected to be detected 50% of the time. This conforms to a common definition of LoD. For the LLoQ, the requirement sets the $(1-\alpha)\%$ quantile for the blank (log) peak area equal to that of the distribution of (log) peak area at the LLoQ concentration. In Figure 4, the solid black line, $y = 7.8$, is the mean for the blank samples; and the dotted lines, $y = 6.7$ and $y = 8.95$, are the $\alpha\%$ and $(1-\alpha)\%$ quantiles, respectively. The linear spline calibration curve is the solid line, with dashed lines showing the $\alpha\%$ and $(1-\alpha)\%$ quantiles; and the LoD and LLoQ are defined by the

Calibration under Constraints intersections of the upper quantile for the blank samples with the mean curve and with the lower quantile limit, respectively, for the calibration spline. In the absence of adequate blank samples, the comparison of log peak area at any particular concentration level to a baseline (zero concentration) log peak area becomes impossible. One common solution is to pool whatever blank samples are available with samples at the lowest one or more concentration on the assumption that these low concentration samples will behave like the blanks. Since a major difficulty is often the precision at low concentrations and/or for blank samples, an alternative heuristic is possible in the case of linear splines. If noise dominates signal at low concentrations, the initial segment of a linear spline can be expected to be nearly flat, with variance stabilized by fitting the spline rather than computing it pointwise. Therefore to provide a reliable, if conservative, baseline, the predicted value at the midpoint of the first spline segment and the quantiles at that point are substituted for the mean of the blank samples and its associated quantiles. ˆ and the variThe procedure based on the mean curve m ance curve ˆ v obtained under the monotonicity and convexity restrictions, respectively, as described in Section 3.2, is as follows: ˆ to obtain (i) add and minus Zα ˆ v1/2 to the mean curve m upper and lower limits, where Zα is z score under significance level α; (ii) calculate the upper limits of blank samples by using ˆ + Zα Vˆ 1/2 , where M ˆ and Vˆ are the sample mean and M variance of blank samples, respectively; (iii) let y˜1 be the minimum of the (1 − α) quantiles of (log) peak area for the blank samples and for the midpoint of the first segment, and use xLoD and xLLoQ as the LoD and the LLoQ values, respectively, ˆ LoD ), xLLoQ satisfies y˜1 = where xLoD satisfies y˜1 = m(x ˆ LLoQ ) − Zα {ˆ m(x v(xLLoQ )}1/2 . 
The ULoQ is defined analogously as $x_{ULoQ}$, where $\tilde y_2 = \hat m(x_{ULoQ}) + Z_\alpha \{\hat v(x_{ULoQ})\}^{1/2}$, and $\tilde y_2$ is the lower limit of (log) peak area at the midpoint of the last spline segment. These limits are defined at the transition level, so the intersections of these limits can be treated as the limits at higher levels.

6. Case Study

In this section the data from the first experiment of the third study of Addona et al. (2009) are used to illustrate two methods of statistical analysis: the constrained splines method of Section 3.2 and the B-splines of He and Shi (1998). The labeling system for laboratories, proteins, peptides, and transitions is taken from Addona et al. As described in Section 3.2, intensity is measured as log peak area; the nine true concentration levels (B–J) are roughly equally spaced on the logarithmic scale, and A1 labels a blank sample. Four technical replicates were run. Seven laboratories used the same model mass spectrometer. Log transformations were applied to peak areas and to true concentrations before smoothing the data, and a large negative number is used to represent negative infinity for practical computation.

403

In Figure 2 the scatter plot from laboratory 54a is smoothed by both methods. The sudden drop in the measurements at the highest concentrations, levels I and J, at this laboratory is attributable to technical issues with the instrument. Note that the right-hand panel in Figure 2 graphs squared deviations from fit and therefore incorporates both pointwise variance and squared bias. Recall that either excessive variation or non-monotonicity indicates that the true concentration is outside the measurement capability of the instrument, that is, outside the interval [LLoQ, ULoQ]. The calibration method in Section 3.2 tolerates quite well the effects of observations away from the monotone trend, in the sense that there is little distortion of the curve within the interval [LLoQ, ULoQ]. By contrast, this deviation from monotonicity seriously affects the estimated curve from the B-spline method. Extensive simulation not reported here verifies this observation. Since the B-spline method is outperformed, it is not considered further.

The methods of Sections 4 and 5 can also be used to define calibration limits. For the method of Section 4, the R package "cobs" is used to smooth the original data pairs, and the residual pairs are smoothed with the method of Meyer (2008). One hundred bootstrap samples are generated to obtain the symmetric 99% pointwise confidence interval based on the normal approximation, and the cv curve is calculated accordingly. The LoD value is located between concentration levels B and C, and the LLoQ is 50% higher than the LoD, using a threshold value of 0.25 for the cv curve. See Figure 3 for details. For the special case described in Section 5, the R package "cobs" is used to estimate the mean and variance curves under the monotonicity and convexity restrictions, respectively. Linear splines are used to smooth the data, and the estimated variance curve is used to construct symmetric 1% and 99% pointwise quantile bands.
Figure 4 illustrates the calibration limits, determined by the intersection of the estimated 99% quantile of blank samples with either the fitted splines or their lower limits, respectively; they are thus nonparametric analogues of the LoD, LLoQ, and ULoQ defined by Currie (1968) under a parametric model assumption. The second method leads to tighter limits than the first when the specifications for the special case (i.e., known linearities) are met.

As discussed in Section 3.2, after the response function for each transition is estimated, the response function for each peptide can be estimated by averaging the response functions of the corresponding transitions. The fitted functions for laboratory 52a are plotted in Figure 5. Note that the behaviors of the individual peptides are clearly different, which implies that linear fits may be inadequate. For example, peptide 202 is quite flat until the concentration reaches level D, whereas peptides 171 and 37 accumulate quickly before level F but saturate at high concentrations. Similar averaging of the response functions from the peptides estimates the protein-level functions. Figure 6 shows that some properties are specific to particular proteins, because they appear consistently across the laboratories. For example, C-reactive protein (labeled CRP), which is composed of the peptides 231, 202, and ni01, responds minimally to increasing concentration until level D is reached; this holds at all the laboratories except 86a. The method of Cuesta-Albertos and Febrero-Bande (2010) for one-way ANOVA is used to


Figure 2. Circles indicate the (xl, yl) pairs in the left panel and the (xl, [yl − m̂(xl)]²) pairs in the right. Weights used in step (S3) of Section 3.2 are based on the dashed lines in the left panel.


Figure 3. The left panel contains the estimated derivative curve and its pointwise confidence interval in dashed lines. The right panel contains the cv curve corresponding to the derivative curve.


Figure 4. The left panel contains the estimated mean curves in solid lines and their 1% and 99% pointwise quantile bands in dashed lines; the flat lines refer to the sample mean and the estimated 1% and 99% quantiles for blank samples. The right panel contains the estimated variance curve.

test the significance of laboratory as a factor. (The function "anova.RPm" of the R package "fda.usc" is used for this ANOVA, with 30 random projections and 1000 bootstrap samples used to adjust the p-value.) Indeed, laboratory is a significant factor, with adjusted p-value less than 0.001, which is consistent with Figure 6. For calibration curves the shape is the key focus; to examine shape alone, the shift between curves should be ignored when determining a distance measure. In the second ANOVA, the distance metric between curves is:

‖f − g‖ = min_d ∫_a^b {f(x) − g(x) − d}² dx,

where f and g are functions defined on the interval [a, b], and the minimizing shift is d = ∫_a^b {f(x) − g(x)} dx / (b − a). After shifting each curve so that this integral is 0, the same function of the R package "fda.usc" is used to carry out the ANOVA with the same settings as in the first ANOVA. The bootstrap-adjusted p-value is still less than 0.001. This indicates that reproducibility across laboratories is questionable for these proteins, even under the specified SOP for this experiment; for example, the results from laboratory 73 are quite different from the other laboratories in Figure 6. These findings should draw the attention of biochemists, and an improved, more stringent SOP is needed to improve reproducibility.

This case study documents the inadequacy of linear fits for these mass spectrometry data across the dilution range studied, especially at concentrations near the LoD and below the LLoQ. Based on this study, the weighted method should be preferred because of the heterogeneity present. Note also that outliers are not unusual, increasing the importance of robustness of

the method and requiring careful evaluation before determining the limits of detection and of quantitation.

7. Discussion
The hierarchical functional model developed here allows analysis of the mass spectrometry intensity data at each level of data aggregation: from transition to peptide to protein. The iterative smoothing method for estimating the functional relationship between the measured and the true concentrations (with a reasonable choice of smoothing algorithm) achieves the global optimal rate of nonparametric regression while accounting for the restrictions on both the mean and variance curves. Furthermore, this iterative method is flexible in combining different existing methods to incorporate different shape restrictions, as may be suited to each individual application.

In the hierarchical model, peptides (and proteins) are treated as fixed effects. The justification for treating transitions as fixed is that, for this calibration, each laboratory determined which transitions to measure for each specific instrument based on preliminary work examining technical aspects (e.g., peak shape, potential overlap among peaks). The curves produced are quite robust against outliers, confirming the impression from inspecting all the curves produced for the data analyzed here. This property is partly due to the monotonicity restriction, either global or piecewise.

An extensive simulation reported in the Web-based Supplementary Materials, based on mass spectrometry data acquired under experimental study conditions (with a specified SOP), was conducted to investigate the performance, and in particular the robustness, of the calibration curve methods described earlier. The data were deconstructed into the fitted curves, the residuals, and the outliers. Then each of the three pools was repeatedly sampled and reassembled randomly, without express attention


Figure 5. Panels contain the smooth curves, ĝ, from the observed pairs (xl, yl) for each peptide at laboratory 52a. (One panel per peptide: 161, 166, 167, 169, 170, 171, 173, 202, 231, 37, and ni01; each x-axis shows concentration level A1–J and each y-axis log peak area.)

to outliers, and again with the outliers substituted. As expected, the simple linear regression approach performed poorly both with and without outliers, and the iterative method proposed here performed best in fitting these data.

With the calibration method proposed here, outliers within the measurement capability are usually readily recognizable, upon inspection or even automated checking, by the failure of accordance among the multiple technical replicates, across the several transitions, and across the peptides that make up the protein of interest. The fitted calibration curves are the basis for defining the dynamic range of an instrument, and alternative methods are proposed here to define the critical calibration limits (LoD, LLoQ, and ULoQ). Of these, the LoD determines the range of "non-zero" concentrations that can be detected. For the LLoQ and ULoQ, the method in Section 4 uses the cv as a standard measure of stability that is familiar to scientists and hence sets the level of variation in the data that is acceptable for analysis purposes. The method in Section 5 is a nonparametric variation of the Currie (1968) method that

also adjusts the calibration limits based on the join points of the linear splines in the absence of sufficient data for blank samples. It is not unusual for the data from blank samples to be sparse, to show extremely high variation, or to include a disproportionate share of outliers, in which case they are inappropriate for the calculation of calibration limits, and a neat theoretical solution does not exist. For a linear regression method of calibration, the absence of an accurate and reliable baseline measurement compounds the problem of fitting a nonlinear response with a line. With the iterative method proposed here, the absence of a good baseline measurement also poses a problem. What is suggested here is an ad hoc and slightly conservative approach that substitutes the midpoint of the initial spline segment. This choice is suggested because the variance at the midpoint is smaller and because the likelihood is relatively small that the midpoint concentration truly belongs within the range of the second segment. Other choices might have other rationales, but the use of the first-segment midpoint has performed well with actual data and in simulation.
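The ad hoc substitution just described can be illustrated in a few lines. The knot locations and fitted values here are hypothetical placeholders; the point is only the choice of the first-segment midpoint as a pseudo-baseline when blank samples are unusable.

```python
import numpy as np

# Hypothetical linear-spline fit: knot locations and fitted values there
knots = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
fit_at_knots = np.array([1.1, 1.3, 3.0, 5.2, 7.9])

# With no usable blank samples, take the midpoint of the first spline
# segment as a slightly conservative pseudo-baseline concentration,
# and evaluate the fitted curve there for a baseline response value.
x0 = 0.5 * (knots[0] + knots[1])
baseline = float(np.interp(x0, knots, fit_at_knots))
```

The baseline value so obtained would then stand in for the blank-sample quantile when computing the calibration limits.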

Figure 6. Each panel contains the smooth curves, f̂, from the seven laboratories (19a, 52a, 54a, 56a, 73a, 86a, and 95a) for one of the seven proteins in the test mixture (PSA, MBP, MYO, APR, CRP, HRP, and LEP). MS instruments at all laboratories were 4000 QTRAP models.

The real example presented here demonstrates how to estimate the calibration limits and also serves to compare the behaviors of different transitions, peptides, and proteins based on the fitted curves. The analysis of these data shows that the nonparametric smoothing method delivers more information than the linear regression models widely used by scientists. Further, while the reproducibility conclusions in Addona et al. (2009) are tenable in the aggregate, it is clear from the one-way ANOVA reported here that the individual calibration curves are not homogeneous, either peptide-to-peptide or laboratory-to-laboratory, and that the calibration limits calculated from the peptides and laboratories separately differ in both systematic and random ways.
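For concreteness, the shift-invariant distance used in the second ANOVA of Section 6 can be computed numerically. This sketch uses a hand-rolled trapezoid rule and two arbitrary example curves; none of it is taken from the authors' code.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoid-rule approximation of the integral of y over x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def shifted_l2_distance(f_vals, g_vals, x):
    """min_d int_a^b {f(x) - g(x) - d}^2 dx, where the minimizing
    shift is d = int_a^b {f(x) - g(x)} dx / (b - a)."""
    diff = f_vals - g_vals
    d = _trapezoid(diff, x) / (x[-1] - x[0])
    return _trapezoid((diff - d) ** 2, x)

x = np.linspace(0.0, 1.0, 201)
f = np.sin(2.0 * np.pi * x)
same_shape = shifted_l2_distance(f, f + 3.0, x)   # identical shapes
diff_shape = shifted_l2_distance(f, np.cos(2.0 * np.pi * x), x)
```

After removing the optimal shift d, two curves that differ only by a vertical offset have distance zero, so only differences in shape contribute to the ANOVA.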

8. Supplementary Materials Theoretical and simulation results referenced in Sections 1, 3.2, and 7 are available with this paper at the Biometrics website on Wiley Online Library.

Acknowledgements This research was supported in part by National Cancer Institute, NNSF of China Grant 11101254, Shanghai Pujiang Program, and Key Laboratory of Mathematical Economics (SUFE), Ministry of Education, China.

References
Addona, T., Abbatiello, S., Schilling, B., Skates, S. J., Mani, D. R., Bunk, D. M., et al. (2009). Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nature Biotechnology 27, 633–641.
Brunk, H. D. (1955). Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics 26, 607–616.
Cuesta-Albertos, J. and Febrero-Bande, M. (2010). A simple multiway ANOVA for functional data. Test 19, 537–557.
Currie, L. A. (1968). Limits for qualitative detection and quantitative determination. Analytical Chemistry 40, 586–593.


Danzer, K. and Currie, L. A. (1998). Guidelines for calibration in analytical chemistry. Pure and Applied Chemistry 70, 993–1014.
Friedman, J. and Tibshirani, R. (1984). The monotone smoothing of scatterplots. Technometrics 26, 242–250.
Hall, P. and Huang, L. (2001). Nonparametric kernel regression subject to monotonicity constraints. The Annals of Statistics 29, 624–647.
He, X. and Shi, P. (1998). Monotone B-spline smoothing. Journal of the American Statistical Association 93, 643–650.
Holmes, C. C. and Heard, N. A. (2003). Generalized monotonic regression using random change points. Statistics in Medicine 22, 623–638.
Karpievitch, Y., Polpitiya, A., Anderson, G., Smith, R. D., and Dabney, A. R. (2010). Liquid chromatography mass spectrometry-based proteomics: Biological and technological aspects. The Annals of Applied Statistics 4, 1797–1823.
Mammen, E. (1991). Estimating a smooth monotone regression function. The Annals of Statistics 19, 724–740.
Meyer, M. (2008). Inference using shape-restricted regression splines. The Annals of Applied Statistics 2, 1013–1033.
Neelon, B. and Dunson, D. B. (2004). Bayesian isotonic regression and trend analysis. Biometrics 60, 724–740.
Ramsay, J. O. (1988). Monotone regression splines in action. Statistical Science 3, 425–461.

Rao, C. R. and Zhao, L. C. (1992). Approximation to the distribution of M-estimates in linear models by randomly weighted bootstrap. Sankhyā: The Indian Journal of Statistics 54, 323–331.
Shively, T. S., Sager, T. W., and Walker, S. G. (2009). A Bayesian approach to non-parametric monotone function estimation. Journal of the Royal Statistical Society: Series B 71, 159–175.
Stone, C. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics 10, 1040–1053.
Utreras, F. (1985). Smoothing noisy data under monotonicity constraints—Existence, characterization and convergence rates. Numerische Mathematik 47, 611–625.
Villalobos, M. and Wahba, G. (1987). Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. Journal of the American Statistical Association 82, 239–248.
Zhou, S., Shen, X., and Wolfe, D. A. (1998). Local asymptotics for regression splines and confidence regions. The Annals of Statistics 26, 1760–1782.

Received December 2012. Revised October 2013. Accepted December 2013.
