Verifying Performance Characteristics of Quantitative Analytical Systems: Calibration Verification, Linearity, and Analytical Measurement Range

Anthony A. Killeen, MB, BCh, PhD; Tom Long, MPH; Rhona Souers, MS; Patricia Styer, PhD; Christina B. Ventura, MT(ASCP); George G. Klee, MD, PhD

Context. Both the regulations in the Clinical Laboratory Improvement Amendments of 1988 (CLIA) and the checklists of the College of American Pathologists (CAP) Laboratory Accreditation Program require clinical laboratories to verify performance characteristics of quantitative test systems. Laboratories must verify performance claims when introducing an unmodified, US Food and Drug Administration–cleared or approved test system, and they must comply with requirements for periodic calibration and calibration verification for existing test systems. They must also periodically verify the analytical measurement range of many quantitative test systems.

Objectives. To provide definitions for many of the terms used in these regulations, to describe a set of basic analyses that laboratories may adapt to demonstrate compliance with both CLIA and the CAP Laboratory Accreditation Program checklists for performing calibration verification and for verifying the analytical measurement range of test systems, to review some of the recommended procedures for establishing performance goals, and to provide data illustrating the performance goals used in some of the CAP’s calibration verification and linearity surveys.

Data Sources. The CAP’s calibration verification and linearity survey programs, the CLIA regulations, the Laboratory Accreditation Program requirements, and published literature were used to meet these objectives.

Conclusions. Calibration verification and linearity and analytical measurement range verification should be performed using suitable materials with assessment of results using well-defined evaluation protocols. We describe the CAP’s calibration verification and linearity programs that may be used for these purposes.

(Arch Pathol Lab Med. 2014;138:1173–1181; doi: 10.5858/arpa.2013-0051-CP)

Clinical laboratories that are regulated by the Clinical Laboratory Improvement Amendments of 1988 (CLIA) and/or accredited by the College of American Pathologists (CAP) are required to demonstrate acceptable performance criteria for many quantitative test systems, including verification of calibration settings and the reportable or analytic measurement range (AMR). The purpose of this article is to define some of the key terms that are used in meeting those requirements and to outline how these performance criteria may be assessed with particular reference to the CAP’s calibration verification and linearity surveys. We show, with several examples, how calibration verification and AMR are analyzed in the Calibration Verification and Linearity (CVL) program.

Accepted for publication October 18, 2013.
From the Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis (Dr Killeen); the Department of Biostatistics (Mr Long, Ms Souers, and Dr Styer) and the Instrumentation Resource Committee (Ms Ventura), College of American Pathologists, Northfield, Illinois; and the Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota (Dr Klee).
The authors have no relevant financial interest in the products or companies described in this article.
Reprints: Anthony A. Killeen, MD, PhD, Department of Laboratory Medicine and Pathology, University of Minnesota, 420 Delaware St SE, MMC 609, Minneapolis, MN 55455 (e-mail: [email protected]).
Arch Pathol Lab Med, Vol 138, September 2014

DEFINITIONS OF TERMS USED

Linearity is a fundamental characteristic of good analytic measurement methods, whereby there is a straight-line relationship between “true” analyte concentrations and measured concentrations. In this context, linearity refers to the relationship between final analytic results and analyte concentration, not to the relationship between instrument signal output and analyte concentration, which may be nonlinear; for example, a competitive immunoassay may show a sigmoidal relationship between signal and analyte concentration. Although linearity is a basic characteristic of valid analytic measurement systems, this term is not explicitly used in CLIA. Three closely related terms used by CLIA are calibration, calibration verification, and reportable range.¹⁻³ Several CAP inspection checklists use 4 related terms: calibration, calibration verification, AMR, and AMR verification (eg, CAP⁴). These terms are defined as follows.

Calibration (CLIA §493.2)² means “a process of testing and adjusting an instrument or test system to establish a correlation between the measurement response and the concentration or amount of the substance that is being measured by the test procedure.”

Calibration (CAP)⁴ is the “set of operations that establishes, under specified conditions, the relationship between reagent system/instrument response and the corresponding concentration/activity values of an analyte.”

Calibration verification (CLIA §493.2)² means the “assaying of materials of known concentration in the same manner as patient samples to substantiate the instrument or test system’s calibration throughout the reportable range for patient test results.” For US Food and Drug Administration–cleared systems, the CLIA regulations (§493.1255)² require that laboratories perform calibration or calibration verification according to the manufacturer’s schedule and instructions, but “at least once every 6 months” or when there is “a complete change of reagents” (unless it is shown that the lot number change does not affect the reportable range or quality control results), when there is a “major preventive maintenance or replacement” of critical instrument parts, when quality control procedures show an unacceptable trend or shift that has not been corrected by other means, and when the laboratory’s policies require more frequent evaluation of the reportable range.

Calibration verification (CAP)⁴ denotes “the process of confirming that the current calibration settings for each analyte remain valid for a method. If calibration verification confirms that the current calibration settings for each analyte are valid, it is not necessary to perform a complete calibration or recalibration of the test system.” The CLIA requirements of §493.1255 apply.

Proficiency testing (PT)⁴ denotes the “determination of laboratory testing performance by means of interlaboratory comparisons, in which a PT program periodically sends multiple specimens to members of a group of laboratories for analysis and/or identification; the program then compares each laboratory’s results to those of other laboratories in the group and/or with an assigned value.”

Reportable range (CLIA §493.2)³ means “the span of test result values over which the laboratory can establish or verify the accuracy of the instrument or test system measurement response.” (In CAP’s definitions, the reportable range involves two distinct concepts, which are described below: the analytical measurement range and the clinical reportable range.)

Analytic Measurement Range (AMR)⁴ is the “range of analyte values that a method can directly measure on the specimen without any dilution, concentration, or other pretreatment not part of the usual assay process.” This concept derives from the CLIA description of reportable range, but is limited to measurements obtained without such sample modification. The AMR, which is commonly specified by instrument manufacturers, should not exceed the linear range observed when comparing measured and true analyte concentrations.

AMR verification (CAP)⁴ is the process of confirming the results of the assay system “by using matrix-appropriate materials, which include the low, mid, and high concentration or activity range of the AMR, and recovering appropriate target values.” When validating the AMR, the highest calibrator or other test material should ideally be within 10% to 15% of the upper end of the AMR, and the lowest should be reasonably close to the lower end of the AMR (Figure 1).

Figure 1. A plot of measured versus true values should show a linear relationship between the lower and upper ends of the analytic measurement range (AMR). Beyond the AMR, the relationship may not be linear (shown by the dashed line), and measurements obtained may, therefore, be invalid. The AMR may extend slightly beyond routine calibrator concentrations, but it is recommended to use calibrators within 10% to 15% of the ends of the AMR.

Figure 2. A plot of measured values against true values may be linear but show unacceptable bias, in this case, a constant bias (line A). If an insufficient number of samples are analyzed, a plot of measured values versus true values may fail to reveal nonlinearity (line B). The line of equality is shown in gray, labeled y = x.

The term calibration has the same meaning in the CLIA regulations and the CAP checklists (eg, see CAP⁴). However, the term “calibration verification,” as used in the CAP’s checklists, carries a more restrictive meaning than it does in CLIA. In the CLIA definitions, calibration verification refers to two distinct processes: (1) verification of correct method calibration, and (2) verification of the reportable range. The CAP checklists restrict the use of the term “calibration verification” to the first process. The CAP checklists use the term AMR verification to refer to the second process.

An additional term that the CAP checklists recently used is clinically reportable range (CRR). Like AMR, this concept is derived from the CLIA term reportable range and was intended to indicate the full range of assay results that a laboratory could report for an analyte, including results obtained following dilution or concentration of a

Table 1. Performance Specifications for College of American Pathologists (CAP) Chemistry (LN2) Surveyᵃ

| Analyteᵇ | CLIA Limits, Target ±% or Indicated Unit | CAP PT Limits, Target ±% or Indicated Unit | TE,ᶜ % | CAP CVL TE Goal, % | CV Error Limits, ±% or (Analyte Units) | Linearity Assessment Goal, ±% |
| --- | --- | --- | --- | --- | --- | --- |
| Albumin, g/dL | 10 | 10 | 3.9 | 10.0 | 5 (0.2 g/dL) | 2.50 |
| Alkaline phosphatase, U/L | 30 | 30 | 11.7 | 25.0 | 12.5 (5 U/L) | 6.25 |
| ALT (SGPT), U/L | 20 | 20 | 26.3 | 25.0 | 12.5 (5 U/L) | 6.25 |
| Amylase, U/L | 30 | 30 | 14.6 | 25.0 | 12.5 (5 U/L) | 6.25 |
| AST (SGOT), U/L | 20 | 20 | 15.2 | 20.0 | 10 (5 U/L) | 5.00 |
| Calcium, mg/dL | 1 mg/dL | 1 mg/dL | 2.4 | 8.0 | 4 (0.5 mg/dL) | 2.00 |
| Chloride, mEq/L | 5 | 5 | 1.5 | 8.0 | 4 (2 mEq/L) | 2.00 |
| Cholesterol, mg/dL | 10 | 10 | 8.5 | 9.0 | 4.5 (3 mg/dL) | 2.25 |
| CK2 (CK-MB) mass, ng/mL | | 3 SD | 31.2 | 25 | 12.5 (1 ng/mL) | 6.25 |
| Creatine kinase, U/L | 30 | 30 | 30.3 | 25.0 | 12.5 (5 U/L) | 6.25 |
| Creatinine, mg/dL | 15 or 0.3 mg/dL | 15 or 0.3 mg/dL | 8.9 | 10.0 | 5 (0.2 mg/dL) | 2.50 |
| Direct bilirubin, mg/dL | 20 or 0.4 mg/dL | 20 or 0.4 mg/dL | 44.5 | 22.0 | 11 (0.3 mg/dL) | 5.5 |
| GGT, U/L | | 3 SD | 22.2 | 25.0 | 12.5 (5 U/L) | 6.25 |
| Glucose, mg/dL | 10 or 6 mg/dL | 10 or 6 mg/dL | 6.9 | 12.0 | 6 (3 mg/dL) | 3.00 |
| HDL cholesterol, mg/dL | 30 | 30 | 11.1 | 20.0 | 10 (2 mg/dL) | 5.00 |
| Iron, µg/dL | 20 | 20 | 30.7 | 22.0 | 11 (5 µg/dL) | 5.50 |
| LD, U/L | 20 | 20 | 11.4 | 20.0 | 10 (5 U/L) | 5.00 |
| Lipase, U/L | | 30 | 29.1 | 35.0 | 17.5 (8 U/L) | 8.75 |
| Magnesium, mg/dL | 25 | 25 | 4.8 | 25.0 | 12.5 (0.2 mg/dL) | 6.25 |
| Osmolality, mOsm/kg H2O | | 3 SD | 1.5 | 10.0 | 5 (5 mOsm/kg H2O) | 2.50 |
| Phosphorus, mg/dL | | 10.7 or 0.3 mg/dL | 10.2 | 20.0 | 10 (0.2 mg/dL) | 5.00 |
| Potassium, mEq/L | 0.5 mEq/L | 0.5 mEq/L | 5.8 | 10.0 | 5 (0.2 mEq/L) | 2.50 |
| Sodium, mEq/L | 4 mEq/L | 4 mEq/L | 0.9 | 5.0 | 2.5 (2 mEq/L) | 1.25 |
| Total bilirubin, mg/dL | 20 or 0.4 mg/dL | 20 or 0.4 mg/dL | 31.1 | 22.0 | 11 (0.3 mg/dL) | 5.50 |
| Total protein, g/dL | 10 | 10 | 3.4 | 10.0 | 5 (0.2 g/dL) | 2.50 |
| Triglycerides, mg/dL | 25 | 25 | 27.9 | 15.0 | 7.5 (4 mg/dL) | 3.75 |
| Urea nitrogen (BUN), mg/dL | 9 or 2 mg/dL | 9 or 2 mg/dL | 15.7 | 10.0 | 5 (1.1 mg/dL) | 2.50 |
| Uric acid, mg/dL | 17 | 17 | 12.4 | 17.0 | 8.5 (0.5 mg/dL) | 4.25 |

Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; BUN, blood urea nitrogen; CLIA, Clinical Laboratory Improvement Amendments of 1988; CK2 (CK-MB), an isoenzyme of creatine kinase with muscle and brain subunits; CV, calibration verification; CVL, calibration verification and linearity; GGT, γ-glutamyltransferase; HDL, high-density lipoprotein; LD, lactate dehydrogenase; PT, proficiency testing; SD, standard deviation; SGOT, serum glutamic-oxaloacetic transaminase; SGPT, serum glutamic-pyruvic transaminase; TE, total error.
ᵃ Where 2 performance criteria are shown, the larger is allowed.
ᵇ Conventional unit to SI unit conversions for dual-reported values: to convert from mg/dL to mmol/L, for calcium, multiply by 0.25; for cholesterol and HDL, 0.259; for creatinine, 88.4 (which yields µmol/L); for glucose, 0.0555; for triglycerides, 0.0113; and BUN, 0.357; and to convert from µg/dL to µmol/L, for iron, multiply by 0.179.
ᶜ From Westgard,¹⁰ 2013.

Table 2. Performance Specifications for College of American Pathologists (CAP) Therapeutic Drug Monitoring (LN3) Surveyᵃ

| Analyte | CLIA Limits, Target ±% or Indicated Unit | CAP PT Limits, Target ±% or Indicated Unit | TE,ᵇ % | CAP CVL TE Goal, % | CV Error Limits, ±% or (Analyte Units) | Linearity Assessment Goals, ±% |
| --- | --- | --- | --- | --- | --- | --- |
| Acetaminophen, µg/mL | | 10 or 3 SD | N/A | 20 | 10 (2.4 µg/mL) | 5 |
| Amikacin, µg/mL | | 10 or 3 SD | N/A | 20 | 10 (0.5 µg/mL) | 5 |
| Carbamazepine, µg/mL | 25 | 25 | N/A | 20 | 10 (0.5 µg/mL) | 5 |
| Digoxin, ng/mL | 20 or 0.2 ng/mL | 20 or 0.2 ng/mL | 10.2 | 20 | 10 (0.2 ng/mL) | 5 |
| Gentamicin, µg/mL | 25 | 25 | N/A | 20 | 10 (0.2 µg/mL) | 5 |
| Lidocaine, µg/mL | | 10 or 3 SD | N/A | 20 | 10 (0.1 µg/mL) | 5 |
| Lithium, mEq/L | 20 or 0.3 mEq/L | 20 or 0.3 mEq/L | 10.1 | 20 | 10 (0.1 mEq/L) | 5 |
| N-acetyl-procainamide, µg/mL | 25 | 25 | N/A | 20 | 10 (0.2 µg/mL) | 5 |
| Phenobarbital, µg/mL | 20 | 20 | N/A | 20 | 10 (1 µg/mL) | 5 |
| Phenytoin, µg/mL | 25 | 25 | N/A | 20 | 10 (0.5 µg/mL) | 5 |
| Primidone, µg/mL | 25 | 25 | N/A | 20 | 10 (0.3 µg/mL) | 5 |
| Procainamide, µg/mL | 25 | 25 | N/A | 20 | 10 (0.2 µg/mL) | 5 |
| Quinidine, µg/mL | 25 | 25 | N/A | 20 | 10 (0.1 µg/mL) | 5 |
| Salicylates, mg/dL | | 10 or 3 SD | N/A | 20 | 10 (1.1 mg/dL) | 5 |
| Theophylline, µg/mL | 25 | 25 | N/A | 20 | 10 (0.5 µg/mL) | 5 |
| Tobramycin, µg/mL | 25 | 25 | N/A | 20 | 10 (0.2 µg/mL) | 5 |
| Valproic acid, µg/mL | 25 | 25 | N/A | 20 | 10 (2 µg/mL) | 5 |
| Vancomycin, µg/mL | | 10 or 3 SD | N/A | 20 | 10 (0.8 µg/mL) | 5 |

Abbreviations: CLIA, Clinical Laboratory Improvement Amendments of 1988; CV, calibration verification; CVL, calibration verification and linearity; N/A, not available; PT, proficiency testing; SD, standard deviation; TE, total error.
ᵃ Where 2 performance criteria are shown, the larger is allowed.
ᵇ From Westgard,¹⁰ 2013.


Table 3. Performance Specifications for College of American Pathologists (CAP) Ligand (LN5) Surveyᵃ

| Analyte | CAP PT Limits, Target ±% or Indicated Unit | CAP CVL TE Goal, % | CV Error Limits, ±% (Analyte Units) | Linearity Assessment Goals, ±% |
| --- | --- | --- | --- | --- |
| CEA, ng/mL | 3 SD | 25 | 12.5 (2.3 ng/mL) | 6.25 |
| Cortisol, µg/dL | 25% | 25 | 12.5 (1 µg/dL) | 6.25 |
| Ferritin, ng/mL | 3 SD | 30 | 15 (16.5 ng/mL) | 7.50 |
| Folate, ng/mL | 3 SD | 30 | 15 (0.3 ng/mL) | 7.50 |
| hCG, mIU/mL | 3 SD | 25 | 12.5 (12 mIU/mL) | 6.25 |
| Thyroxine (total) T4, µg/dL | 20% or 1.0 µg/dL | 20 | 10 (0.5 µg/dL) | 5.00 |
| Triiodothyronine T3, ng/dL | 3 SD | 25 | 12.5 (20 ng/dL) | 6.25 |
| TSH, µU/mL | 3 SD | 25 | 12.5 (1 µU/mL) | 6.25 |
| Vitamin B12, pg/mL | 3 SD | 30 | 15 (30 pg/mL) | 7.50 |

CLIA Limits, Target ±% or Indicated Unit (listed for 4 of the 9 analytes; row assignment not shown): 25%, 3 SD, 3 SD, 3 SD.
TE,ᵇ % (listed for 7 of the 9 analytes; row assignment not shown): 24.7, 29.8, 16.9, 39.0, 7.0, 12.0, 23.7.

Abbreviations: CEA, carcinoembryonic antigen; CLIA, Clinical Laboratory Improvement Amendments of 1988; CV, calibration verification; CVL, calibration verification and linearity; hCG, human chorionic gonadotropin; PT, proficiency testing; SD, standard deviation; TE, total error; TSH, thyroid-stimulating hormone.
ᵃ Where 2 performance criteria are shown, the larger is allowed.
ᵇ From Westgard,¹⁰ 2013.

patient sample, that is, from samples with analyte concentrations that are outside the AMR. For analyte concentrations outside the CRR, laboratories report results as “greater than” or “less than” the limits of the CRR. The CRR is established at the time a test method is first placed in use, it may vary among laboratories for any analyte, and it does not need to be redefined on a regular basis. Although the CAP checklists no longer refer to the term CRR, the concept is still important and is reflected in several checklist questions.

ANALYTIC PERFORMANCE GOALS

The performance goals for laboratory tests are established to support good medical decisions. Fraser⁵ published a hierarchical approach to classification strategies for setting quality specifications in laboratory medicine. This hierarchical list of strategies was endorsed at an international conference, Strategies to Set Global Quality Specifications in Laboratory Medicine, and has been termed The Stockholm Conference Hierarchy.⁶,⁷ Fraser noted that although specifications based on how quality affects medical decision-making are at the top of the hierarchy, that approach is difficult to apply and seldom used.⁵ A well-accepted strategy is to keep the analytic variability small, compared with the biologic variability. Westgard and colleagues⁸ have used this strategy to define goals for total analytic error. This term describes the combined effect of random and systematic errors in a test system. This “error budget” must cover multiple sources of error inherent in the assay manufacturing and laboratory operational procedures as well as the analytic characteristics of the assay.

The CAP’s CVL surveys enable participating laboratories to perform calibration verification and assessment of method linearity for many analytes in chemistry and hematology. For calibration verification, participant results are compared with target values that are determined by reference methods or by peer group means, and grading is based on a bias from the target value. Assay linearity is assessed for each laboratory based on its reported values alone.

In method evaluation, it is an accepted practice to specify a total error allowance and to use proportional allocations of that total error for performance goals, which is the approach used in the CVL surveys. In general, targets equal to one-half of the total error goal are used for the assessment of calibration verification, and one-quarter of the total error goal is used for the assessment of linearity.

In the linearity assessment, participants take duplicate measurements of multiple samples within the same run. Hence, the precision component of the linearity assessment is expected to be similar in magnitude to within-run precision. The performance goal for imprecision has been set as a small fraction of the total error because many sources of variation are excluded from the assessment. The use of one-quarter of the total error allowance aligns with well-established recommendations to use performance goals that will reduce unsatisfactory performance on proficiency testing challenges.⁹

In the calibration verification evaluation, the means of the duplicated measurements are compared with the target values. This performance goal reflects between-laboratory variability and has been set relative to grading criteria for external assessment programs, which generally use CLIA limits. Specifically, the calibration verification evaluation is intended to have more-stringent performance goals than proficiency testing does to allow early identification of analytic error. In addition, because the evaluation uses the means of replicate measurements, it is appropriate to consider some amount of reduction in the error limits. The use of one-half of the total error allowance includes a reduction in error limits because of the use of means.

Table 4. Principles of Assessment of Imprecision Used in the College of American Pathologists (CAP) Linearity Surveys

| Imprecision Screen | Best-Fit Model | CAP’s Evaluation Result |
| --- | --- | --- |
| Fail | Line or curve | Imprecise (poor repeatability and/or fit) |
| Pass | Line | Linear |
| Pass | Curve | Undetermined; assess nonlinearity in step 3 |

Table 5.
Principles of Assessment of Nonlinearity Used in the College of American Pathologists (CAP) Linearity Surveys

| Nonlinearity Check | CAP’s Evaluation Result |
| --- | --- |
| ADL > limit | Nonlinear |
| ADL ≤ limit | Linear |

Abbreviation: ADL, average deviation from linearity.
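The allocation of total error described in this section (one-half of the total error goal for calibration verification, one-quarter for linearity, with the larger of two criteria allowed where a table lists both) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CAP's grading software: the function names are hypothetical, the percentage limit is assumed to be applied to the target value, and the example numbers are the glucose and albumin rows of Table 1.

```python
def cv_error_limit(te_goal_pct, target, alternate_limit):
    """Allowable calibration verification error, in analyte units.

    Assumption: the percentage limit (one-half of the total error goal) is
    applied to the target value, and the larger of that and the alternate
    absolute limit is allowed.
    """
    percent_limit = target * (te_goal_pct / 2.0) / 100.0
    return max(percent_limit, alternate_limit)


def linearity_goal_pct(te_goal_pct):
    """Linearity assessment goal: one-quarter of the total error goal, in percent."""
    return te_goal_pct / 4.0


def within_cv_limit(replicates, target, te_goal_pct, alternate_limit):
    """Compare the mean of replicate results with the target value."""
    mean = sum(replicates) / len(replicates)
    limit = cv_error_limit(te_goal_pct, target, alternate_limit)
    return abs(mean - target) <= limit


# Glucose: total error goal 12%, alternate limit 3 mg/dL.
print(cv_error_limit(12.0, 200.0, 3.0))  # percentage limit applies at 200 mg/dL
print(cv_error_limit(12.0, 40.0, 3.0))   # alternate limit applies at 40 mg/dL
# Albumin: total error goal 10%.
print(linearity_goal_pct(10.0))
print(within_cv_limit([205.0, 209.0], 200.0, 12.0, 3.0))
```

Keeping the alternate limit as a floor means that low-concentration specimens are not held to a percentage limit smaller than the assay can reasonably resolve.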

Figure 3. A sample report from a linearity survey showing a result of linearity in a partial range. In this example, the first 6 points (LN-14 through LN-19) are linear, but the evaluation including the seventh point (LN-20) is nonlinear. Note that because the line is fit to the first 6 specimens only, the differences between the best-fit line and the results for the highest specimen are very pronounced in linearity plot 2. Vitros 250 and 350 CHEM SYST refer to the instruments used by the participant, a Vitros 250 or 350 chemistry system (Ortho Clinical Diagnostics, Rochester, New York). Abbreviations: ENZ, enzymatic; GB, glycerol blanking; GPO, L-α-glycerol phosphate oxidase; SB, serum blank; TE, total error; TE/4, one-quarter total error; w/, with; w/o, without.

It is apparent from this description that the grading criteria are more stringent for linearity assessment than they are for calibration verification. However, some of these limits (linearity assessment and calibration verification values) have been modified to accommodate state-of-the-art system performance and/or special clinical needs. Total error goals are based on data maintained by Westgard,¹⁰ on historic performance of methods by CVL program participants, and on input from members of the CAP’s Instrumentation Resource Committee. Tables 1 through 3 illustrate some of the performance limits used by the CAP surveys.

The choice of performance limits involves a trade-off between 2 important functions. First, the criteria need to be tight enough to detect clinically important analytic problems, and second, the criteria need to avoid false-positive flagging of changes that are not clinically relevant. Furthermore, the methodology used for assessment must provide sufficient statistical power to detect those changes when they occur. Kroll et al¹¹ recommended that the assessment methodology have a sensitivity of 80% and a specificity of 95%. For the assessment method to achieve those goals, the measurements need to be run in duplicate, and at least 4 different analyte concentrations should be assessed. The performance goals used within the evaluation are increased slightly to achieve that level of sensitivity and specificity when accounting for sampling variability.

Figure 4. A sample report from a linearity survey showing nonlinear trend results. In this example, the linearity evaluation result is ‘‘nonlinear.’’ For a survey with 6 specimen levels, the algorithm evaluates the full range, the first 5 consecutive specimens, and then the first 4 consecutive specimens. None of these evaluations has a linear result. Linearity plot 1 shows both the best-fit straight line and best-fit curve when the evaluation result is nonlinear. Siemens ADV CNTR/XP indicates the instrument model used by the participating laboratory, the Siemens Advia Centaur XP (Siemens Healthcare Solutions, Tarrytown, New York). Abbreviations: TE, total error; TE/4, one-quarter total error.
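The sequence of evaluations described in this caption (the full range first, then subsets that drop one more specimen from the high end each time) can be sketched as follows. This is an illustrative helper, not the CAP algorithm itself; the function name is hypothetical, and the `min_levels=4` floor is an assumption taken from the example above, in which the shortest subset evaluated has 4 specimens.

```python
def evaluation_subsets(specimens, min_levels=4):
    """Return the specimen subsets to evaluate, longest first.

    Specimens are assumed to be ordered from lowest to highest concentration,
    so each later subset drops one more specimen from the high end.
    """
    return [specimens[:k] for k in range(len(specimens), min_levels - 1, -1)]


levels = ["level 1", "level 2", "level 3", "level 4", "level 5", "level 6"]
for subset in evaluation_subsets(levels):
    # Print the subset size and the highest specimen it still includes.
    print(len(subset), subset[-1])
```

A survey with fewer specimens than `min_levels` yields an empty list, which reflects the requirement that a minimum number of specimens be maintained.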

The result of a CVL participant’s calibration verification evaluation is reported as “Verified” if all differences between the mean and the target value are within the calibration verification allowable error limits in the range specified, and as “Different” if at least one of the means differs from the target value by more than the allowable error limit. The allowable error in the CAP calibration verification evaluation is the larger of 2 limits: (1) one-half of the goal for total error, or (2) the alternate limit for low concentrations, which is defined as the smallest difference in concentration or activity that is clinically important at the low end of the analytic measurement range.

Calibration verification can be accomplished in several ways. If the method manufacturer provides a calibration verification process, it should be followed. Other techniques include (1) an assay of the current method calibration materials as unknown specimens, and determination that the correct target values are recovered; (2) an assay of matrix-appropriate materials with target values that are specific for the method, such as the CVL materials; and (3) an assay of patient materials of known analyte concentration or activity. Each laboratory must define limits for accepting or rejecting tests during calibration verification.

Both linearity assessment and calibration verification are needed to evaluate the performance of a method. In particular, assay results may be linear for true analyte concentrations but show an unacceptable constant and/or proportional bias (Figure 2, line A). In such a case, an assessment of the calibration of at least 2 points would reveal the bias. Conversely, a method may appear to be correctly calibrated if just a few points are analyzed but have unacceptable nonlinearity (Figure 2, line B).

Figure 5. A sample report from a linearity survey showing a result of poor repeatability. In this example, the evaluation result is “imprecise” (poor repeatability and/or fit), which means that, for all subsets of specimens evaluated, there was large imprecision around the best-fit line or curve. This example shows large imprecision because of large differences between the replicate pairs. Although the data may appear linear from visual examination of the plots, the analysis reveals an underlying problem with poor repeatability and/or poor fit that precludes confirmation of linearity. The Roche COBAS e601/E170 (Roche, Indianapolis, Indiana) is the instrument system used by the participant. Abbreviations: CMIA, chemiluminescent microparticle immunoassay; TE, total error; TE/4, one-quarter total error.

DIFFERENCES AND SIMILARITIES BETWEEN CALIBRATION VERIFICATION AND PT

The CAP’s CVL and PT surveys provide similar information but differ in their purposes and their grading criteria. The goal of both is to examine the ability of an analytic system to generate expected test results for a number of samples that are sent to participating laboratories. Test values submitted for both the CAP’s CVL and PT surveys are generally graded by peer groups among participants who use the same or similar test methodology. For certain surveys that use matrix-free material, and where reference methods are available, for example, for creatinine, accuracy-based grading is used.

The CVL surveys are designed to meet the CLIA calibration verification requirements and to allow laboratories to determine whether the calibration settings of an analytic system have changed since its last calibration. The ranges of analyte concentrations or activities in the CVL surveys are intended to cover the AMR of analytic systems. Conversely, PT fulfills a regulatory requirement to determine whether a laboratory can produce a correct test result, and failures in PT may result in laboratory sanctions by a regulatory agency. The PT surveys are graded according to criteria listed in subpart I of the CLIA regulations or, for analytes not listed in that subpart, according to criteria set by the PT program (see Tables 1 through 3 for examples). The range of analyte concentrations or activities in the PT surveys reflects the range of values that may be seen in patient samples, which is often narrower than an instrument’s AMR. Inevitably, both CVL and PT surveys are subject to similar sources of error, which include methodologic and nonmethodologic mistakes (eg, sample mix-up, clerical error, and deterioration of the survey material).

Description of the CAP’s Linearity Survey

Linearity is typically evaluated using 5 to 8 equally spaced admixtures from a specimen with a high concentration of the analyte of interest and a specimen with a low concentration. Those concentrations are specified so that, ideally, the test material spans the test system’s AMR. Participants analyze samples in duplicate within a single run. Both diluted and semiquantitative results can be reported, although semiquantitative results, that is, “greater than” or “less than” values, are excluded from the analysis. Results on samples that are diluted are considered only if a sufficient number (usually 5) of undiluted sample results are available for linearity evaluation. The best-fit line from those points is extended, and an assessment is made of the closeness of the diluted sample results with that line. Diluted sample results are not, however, used to assess linearity.

CAP’s CVL Standard Linearity Evaluation

The CAP’s standard linearity statistical evaluation is rigorous and is described in detail below. The theoretic development of this evaluation and the exact calculations for the nonlinearity assessment have been described previously.¹²⁻¹⁵ The evaluation principles employ 3 steps as follows.
Polynomial Method. The first step of the evaluation fits a series of regression models to determine whether the data are best fit by a straight line or a curve. This step is also referred to as the polynomial method.¹³⁻¹⁴ The regression models are fit using the assay recoveries and the relative (proportional) concentrations that have been standardized from 0 to 1. Generally, 3 models are fit: cubic, quadratic, and linear. If there are only 4 sets of results, then the highest-order polynomial tested is a quadratic. The significance of each model’s fit is assessed using a t test of the corresponding coefficient. The null hypothesis for that test is that the coefficient equals 0. The CAP uses a significance level of α = .05. If either the cubic or quadratic coefficient is statistically significant, then the data are best fit by a curve and will require further analysis to determine whether that nonlinearity is clinically significant (step 3). If neither of those terms is statistically significant, then the data are best fit by a line.

Regression Standard Error. The second step of the evaluation performs an imprecision screen to verify the data are reliable enough to assess linearity. Data may be too imprecise because of poor repeatability, poor conformance to the model, or both. This estimated imprecision is based on the differences between the assay recoveries and the best-fit line or curve. Formally, this difference is the regression standard error. It is converted to the coefficient of variation for comparison to the imprecision limit. The imprecision limit is determined by the analyte’s total error goal, the number of results (N) included in the model, and a constant (C) derived from the formal statistical evaluation.¹¹ In general, the limit on the imprecision is always slightly larger than one-fourth of the goal for total error:

$$\text{Limit} = \text{Total Error Goal} \times 0.25 \times (N/C)^{1/2}$$

where C is 6.3 for the straight line or quadratic model, and C is 6.5 for the cubic model. The protocol for assessment of the imprecision screen is shown in Table 4.

Average Deviation From Linearity. The third step is to evaluate nonlinearity relative to a nonzero clinical threshold, if applicable. This step is only required when the best fit for the data is a curve and it is necessary to determine whether that nonlinearity is clinically relevant. The significance of the nonlinearity estimated in the polynomial method is tested using a nonzero threshold, which permits small, clinically unimportant deviations from linearity. The summary measure of the difference between the best-fit curve and a straight line is called the average deviation from linearity (ADL). The ADL limit is based on the 95th percentile of a noncentral χ² distribution where the noncentrality parameter is a function of the expected sums-of-squares deviations between the fitted values from the best-fit curve and a straight line. Based on the complexity of that calculation, an approximation is provided. Only one fitted value is used for each specimen level.

$$\text{ADL} = \left[ \frac{\sum (\text{Linear Fits} - \text{Nonlinear Fits})^2}{\text{No. of Specimen Levels}} \right]^{1/2}$$

The ADL limit is slightly larger than one-fourth of the total error goal, so the following approximation can be used:

$$\text{ADL Limit Approximation} = \text{Total Error Goal} \times 0.25 \times \text{Mean of the Model Results}$$

The protocol for assessment of nonlinearity is shown in Table 5. If the evaluation using all specimens is nonlinear or imprecise, subsequent evaluations are performed, systematically eliminating specimens from the high end. The rules for the required number of specimens must be maintained.
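The three steps above can be sketched in plain Python. This is a rough illustration under several assumptions, not the CAP's implementation: all function names are hypothetical; the least-squares fit is hand-rolled so that the block needs no third-party library; the order of testing (leading term of the cubic model first, then of the quadratic model) is assumed; the critical t values are ordinary two-sided 5% table values; the regression standard error is converted to a coefficient of variation by dividing by the mean result, which the text does not specify; and the total error goal is entered in percent.

```python
import math

# Two-sided 5% critical values of Student's t, by degrees of freedom
# (standard table values, enough for up to 8 levels run in duplicate).
T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
          7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
          13: 2.160, 14: 2.145}


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit (assumes nonzero residual scatter).

    Returns (coefficients, t statistics, regression standard error, df).
    """
    p = degree + 1
    rows = [[xi ** k for k in range(p)] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    df = len(y) - p
    std_err = math.sqrt(sum(e * e for e in resid) / df)
    t_stats = [c / (std_err * math.sqrt(inv[i][i])) for i, c in enumerate(coef)]
    return coef, t_stats, std_err, df


def best_fit_degree(x, y, n_levels):
    """Step 1: highest order whose leading term is significant, else 1 (line)."""
    max_degree = 3 if n_levels >= 5 else 2
    for degree in range(max_degree, 1, -1):
        _, t_stats, _, df = fit_polynomial(x, y, degree)
        if abs(t_stats[degree]) > T_CRIT[df]:
            return degree
    return 1


def imprecision_limit(te_goal_pct, n_results, degree):
    """Step 2 limit: total error goal x 0.25 x sqrt(N / C), in percent."""
    c = 6.5 if degree == 3 else 6.3
    return te_goal_pct * 0.25 * math.sqrt(n_results / c)


def adl(linear_fits, nonlinear_fits):
    """Step 3: average deviation from linearity, one fitted value per level."""
    squares = [(a - b) ** 2 for a, b in zip(linear_fits, nonlinear_fits)]
    return math.sqrt(sum(squares) / len(squares))


def adl_limit_approx(te_goal_pct, mean_result):
    """Approximate ADL limit in analyte units (total error goal in percent)."""
    return te_goal_pct / 100.0 * 0.25 * mean_result


# Five levels in duplicate; x is the relative concentration scaled from 0 to 1.
x = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0, 1.0]
y = [9.0, 11.0, 31.5, 33.5, 54.0, 56.0, 76.5, 78.5, 99.0, 101.0]
degree = best_fit_degree(x, y, n_levels=5)
_, _, std_err, _ = fit_polynomial(x, y, degree)
cv_pct = 100.0 * std_err / (sum(y) / len(y))
print(degree, cv_pct <= imprecision_limit(10.0, len(y), degree))
```

The exact ADL limit relies on a noncentral χ² percentile, which this sketch does not attempt; only the approximation given in the text is shown.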
Examples of Linearity Reports and Problem-Solving Advice

Three examples of linearity problems and troubleshooting recommendations are illustrated in Figures 3 through 5. A complete set of troubleshooting examples is provided in the Calibration Verification/Linearity Surveys User's Guide, which is provided by the CAP to all program participants.

CONCLUSION

Calibration verification and verification of an assay's AMR are CLIA and CAP Laboratory Accreditation Program requirements that laboratories must perform on a regular basis. Those requirements can be met and documented by participation in programs such as the CAP CVL surveys. Perhaps equally important, participation in those surveys helps to evaluate the linearity of assay systems using well-defined criteria related to total error limits. When problems are identified, the surveys provide recommended actions and troubleshooting suggestions.

This article is the product of members of the College of American Pathologists Instrumentation Resource Committee and College of American Pathologists staff.

References

1. Centers for Medicare & Medicaid Services, Department of Health and Human Services. Clinical laboratory improvement amendments of 1988; final rule. Fed Regist. 1992;57(40):7165–7186. Codified at 42 CFR §493.1217.
2. Centers for Medicare & Medicaid Services, Department of Health and Human Services. Medicare, Medicaid, and CLIA programs; laboratory requirements relating to quality systems and certain personnel qualifications; final rule [published correction appears in Fed Regist. 2003;68(163):50722–50725]. Fed Regist. 2003;68(16):3707–3714. Codified at 42 CFR §493.1255.
3. Centers for Disease Control and Prevention. CLIA law & regulations Web site. http://wwwn.cdc.gov/clia/Regulatory. Updated May 31, 2012. Accessed December 30, 2013.
4. College of American Pathologists, Commission on Laboratory Accreditation. Chemistry and Toxicology Checklist. Northfield, IL: College of American Pathologists; 2012.
5. Fraser CG. General strategies to set quality specifications for reliability performance characteristics. Scand J Clin Lab Invest. 1999;59(7):487–490.
6. Dhatt GS, Agarwal MM, Bishawi B, Gill J. Implementing the Stockholm Conference hierarchy of objective quality criteria in a routine laboratory. Clin Chem Lab Med. 2007;45(4):549–552.
7. Fraser CG, Kallner A, Kenny D, Petersen PH. Introduction: strategies to set global quality specifications in laboratory medicine. Scand J Clin Lab Invest. 1999;59(7):477–478.


8. Westgard JO, Carey RN, Wold S. Criteria for judging precision and accuracy in method development and evaluation. Clin Chem. 1974;20(7):825–833.
9. Burnett RW, Westgard JO. Selection of measurement and control procedures to satisfy the Health Care Financing Administration requirements and provide cost-effective operation. Arch Pathol Lab Med. 1992;116(7):777–780.
10. Westgard JW. Desirable specifications for total error, imprecision, and bias, derived from intra- and inter-individual biologic variation. http://www.westgard.com/biodatabase1.htm. Accessed July 24, 2013.
11. Kroll MH, Praestgaard J, Michaliszyn E, Styer PE. Evaluation of the extent of nonlinearity in reportable range studies. Arch Pathol Lab Med. 2000;124(9):1331–1338.
12. Clinical and Laboratory Standards Institute/National Committee for Clinical Laboratory Standards. Evaluation of the Linearity of Quantitative Measurement Procedures: A Statistical Approach; Approved Guideline. Wayne, PA: CLSI/NCCLS; 2003. CLSI/NCCLS Document EP6-A; vol 23, no. 16.
13. Kroll MH, Emancipator K. A theoretical evaluation of linearity. Clin Chem. 1993;39(3):405–413.
14. Emancipator K, Kroll MH. A quantitative measure of nonlinearity [published correction in Clin Chem. 1993;39(8):1589]. Clin Chem. 1993;39(5):766–772.
15. Jhang JS, Chang CC, Fink DJ, Kroll MH. Evaluation of linearity in the clinical laboratory. Arch Pathol Lab Med. 2004;128(1):44–48.


