NORMAL VALUES: THEORETICAL AND PRACTICAL ASPECTS

Authors:

Mario Werner

Critical Reviews in Clinical Laboratory Sciences Downloaded from informahealthcare.com by McMaster University on 12/09/14 For personal use only.

William L. Marsh Division of Laboratory Medicine The George Washington University Medical Center Washington, D.C.

Referee:

Dean A. Arvan Chemistry Division Hospital of the University of Pennsylvania Philadelphia, Pennsylvania

INTRODUCTION

It is considered good practice to list a normal range along with every reported laboratory result. Since the results of the test can be compared with these arbitrary limits, the interpretation of laboratory data to make diagnostic or therapeutic decisions at first glance appears to be straightforward: either the test result is within the normal range and this is taken as the absence of a specific health impairment, or the test result falls outside the normal range and this is taken as an indication of disease. When test results are placed either centrally in the normal range or far outside it, this approach poses no problems, but if values are close to the limit of the normal range, these simple concepts become equivocal.

In the past, the clinical data consisted mainly of signs and symptoms. The qualitative nature of this information favored yes or no decisions, separating by implication a defined, unique, and optimal state of health from any other, less desirable state. At present, precise laboratory tests necessitate the evaluation of quantitative data. Inevitably, this change led to the recognition that health cannot be defined by laboratory findings alone but is subject to physiological variability, so that the definition of arbitrary thresholds became necessary.

Pleased with the advance of medicine from a mostly qualitative to a more quantitative information base, physicians have been reluctant to find drawbacks in the latter approach or to face the problems introduced by quantitative methods. Even so, some difficulties have been recognized. The use of normal values of uncertifiable validity which have been copied blindly from one textbook to the next often is criticized on the grounds that analytical methods are not identical in different places and at different times. Equally, the use of normal values obtained from, say, a series of 25 ostensibly normal technicians, medical students, and middle-aged pathologists is no longer generally accepted. To avoid such problems, indirect methods for estimating the normal range have been proposed. However, none of these methods has proven satisfactory when critically evaluated. Finally, the use of batteries of screening tests in healthy individuals has been questioned by some clinicians since each added test increases the probability of finding false positives and false negatives which prompt follow-up testing. A less generally recognized and rarely discussed problem of laboratory tests in health screening programs is the regression toward the mean on repeated biochemical determinations.

Other, probably more important, difficulties are still not sufficiently recognized because their effects are less overtly discernible. It is not generally appreciated that sampling methods and statistical techniques influence normal values, even though these values supposedly are based exclusively on a biological discrimination. Still more fundamentally, there are conceptual problems related to the classification of individuals as "healthy" or "ill" according to a single test result.17,18 Indeed, even if several parameters are considered together, there is no sharp separation of "health" and "illness" by a specific value since the ranges of laboratory values obtained in these two states usually overlap, such that the key question, "Does this finding require further action or not?" may not have an unequivocal answer.


SAMPLING

Data collection, or sampling, is the first step in defining normal values. Ideally a normal range would be derived from the entire population to which it is to be applied. Since this is impractical, only a sample is used. Unfortunately, many reports of normal values do not precisely define the criteria used to select the sample of normal subjects. The proper collection of a sample which adequately represents the population is not a trivial problem. Two incompatible sets of considerations have to be taken into account in designing the sampling procedure. At first glance one set of considerations appears to have conceptual merits while the other appears to offer practical advantages, but as both sets of considerations contain potential pitfalls, choosing the proper approach is not straightforward.

On the one hand, the derivation of statistical inferences usually requires a randomly selected sample, in which each member of the analyzed population has an equal chance of being included. This precludes any prior selection of subjects, such as selection based on health screening, since the outcome of prior testing would negate randomness. As a consequence of this restriction, it is possible that the random sample of normal subjects is adulterated by sick subjects. On the other hand, being within the normal range implies to many being a "healthy person." Thus, each subject should in some way be evaluated before being accepted as a member of the sample.


However, such selection not only eliminates the randomness of the sample but additionally presupposes that healthy can be distinguished unequivocally from ill. The latter assumption produces a circular logic since the very purpose of the sample is to define what is normal. Underlying these two contrasting approaches to sampling are two divergent definitions of what is “normal.” If an unbiased, random sample is used, then normal has to be that which is encountered usually, frequently, or habitually and no value judgment should be attached to it. Conversely, if a biased, selected (nonrandom) sample is used, then the exclusion of subjects with undesirable properties attaches a value judgment conferring ideal or perfect properties to the term “normal.” The practical objections to selecting a sample of healthy subjects to define values representing a perfect ideal are probably insurmountable. Such properties would not only involve comprehensive consideration of present health but also of factors affecting future health and life span. As our natural surroundings continuously change, even the genetic selection for survival alters its specifications. Fortunately, the conceptual objections against the alternative of using an unselected, random sample to define normal values are manageable in practical application. This is evidenced in Figures 3 and 4, which are discussed in detail in the next section on Statistics of Normal Values. The practical conclusions of this analysis are (1) that the real life process of sampling determines the theoretical meaning of normal values and (2) that “normal” is used only as a statistical (probabilistic) term and does not imply a deterministic concept, such as the state of health. These conclusions and the resulting necessity for an unbiased, random sample do not prohibit the grouping of data into separate subclasses. 
Indeed, certain subclassifications of normal values are desirable since this often narrows the normal range and so enhances the chance of recognizing disease. It is possible to stratify normal values, say, according to sex and age, provided: (1) the class criteria can be unequivocally defined and are independent of the criterion normal-abnormal and (2) sampling within each class is random.
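The two conditions for stratification stated above can be sketched in code. This is a minimal illustration under stated assumptions, not a procedure taken from the text: the registry, the field names `sex` and `age`, and the helper `stratified_random_sample` are hypothetical, and classes are defined by sex and age decade only, so that class membership does not depend on any test result or health judgment.

```python
import random
from collections import defaultdict

def stratified_random_sample(subjects, class_of, per_class, seed=0):
    """Draw a simple random sample of up to per_class subjects within each class.

    class_of maps a subject to its class label; it must be unequivocal and
    independent of the criterion normal-abnormal (condition 1 in the text).
    Sampling inside each class is random (condition 2 in the text).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[class_of(subject)].append(subject)
    return {
        label: rng.sample(members, min(per_class, len(members)))
        for label, members in strata.items()
    }

def sex_and_decade(subject):
    """Hypothetical class criterion: sex and age decade."""
    return (subject["sex"], subject["age"] // 10 * 10)

# Hypothetical registry of 400 subjects, invented for this illustration.
registry_rng = random.Random(1)
registry = [
    {"id": i, "sex": registry_rng.choice("FM"), "age": registry_rng.randint(20, 59)}
    for i in range(400)
]

sample = stratified_random_sample(registry, sex_and_decade, per_class=25)
for label in sorted(sample):
    print(label, len(sample[label]))
```

Normal ranges would then be computed separately within each class, for example for females aged 30 to 39 years.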

STATISTICS OF NORMAL VALUES

The specific intent of this section is to stress statistical points pertinent to biological variability since the laboratorian has been exposed to statistics mainly through quality control. The variability of analytical errors by definition is random, and so the Gaussian, "normal" distribution of values can be applied to analyze variability. Familiarity with Gaussian mathematics may have prompted their adoption in the description of medical normal values. However, biological values are not varying through random dispersion only and, therefore, cannot be expected a priori to have Gaussian distribution. Rather, biological effects ultimately are explainable by defined causes. In the case of normal values, variability can be traced to genetic makeup, environmental influences, age, and other sources. It is particularly important for laboratorians who have a purely technical background, such as analytical chemists, to recognize this relevance of biological factors. Therefore, the nature of frequency distributions is considered first, their description by measures for central tendency and for dispersion second, the effect of sample size and of method error on the observed biological distribution third, and the occurrence of abnormal results in multiple tests performed in supposedly healthy subjects last.



Frequency Distributions

In measuring the same property or quantity repeatedly, one usually finds the bulk of observations in a central cluster and decreasing numbers of observations on the extremes. Such frequency distributions are familiarly represented as histograms plotting the number or frequency of observations in discrete classes against the measured value (Figures 1 and 2, upper left). The humped curves so produced may be symmetrical or slanted to one side (skewness), more or less peaked (kurtosis), and may indeed have one or more peaks (modality). An alternative representation of the same information is the cumulative frequency distribution, in which the total number or frequency of all values up to the class under consideration is plotted against the measured value (Figures 1 and 2, upper right).

To avoid the need of listing all values contributing to a given frequency distribution, standard methods of simplification are used. By necessity, such abstractions omit part of the total information contained in the original data. As a trade-off, certain key properties of the data become more apparent. The description of frequency distributions with a single hump (unimodal) requires at least two characteristics: a measure of central tendency which locates the average or more frequent value typical or representative of the set of data and a measure of dispersion which describes the scatter of observations around the central value.

Central Tendency

Measures of central tendency are the mode, the median, and the mean. The mode is the most common value (the value or class of values which occurs with the greatest frequency). This characteristic requires no computation and is, therefore, the most purely descriptive measure of central tendency. The median is the middle value. If all data are arranged in order of magnitude, the median separates them into equal parts. While this requires a minimum of computation, no assumptions about the distribution characteristics of the data are necessary to make the median a meaningful value. The mean (arithmetic mean) is the algebraic sum of all observations, divided by their number. The computation necessary to derive the mean may introduce considerable artificiality into its meaning, and for strongly skewed distributions the mean is an unrepresentative value. If observations are added to one extreme of a distribution, the introduced skew displaces the mode less than the mean or the median. Usually the median lies between the mean and the mode, and for unimodal frequency distributions with only moderate asymmetry the following empirical relationship exists:

$$\text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median}) \tag{1}$$

In other words, the difference between the mean and the mode is three times as large as the difference between the mean and the median.

[Figure 1 panels not reproduced; x-axes: calcium, mg/100 ml.]

FIGURE 1. Gaussian (normal) distribution represented in four different ways. Data based on the serum calcium concentrations from 140 males, ages 20 to 29 years. Upper left: frequency histogram plotting number of observations (y-axis) against calcium concentrations (x-axis). Lower left: Gaussian distribution fitted to actual observations plotting relative frequency (y-axis) against calcium concentration (x-axis). Upper right: cumulative frequency distribution plotting cumulative frequency in percent on a linear scale (y-axis) against calcium concentration (x-axis). Lower right: cumulative frequency distribution plotting cumulative frequency in percent on a probability scale (y-axis) against calcium concentration (x-axis).

Dispersion

The simplest indicator of dispersion of data in a population is the range of the sample data, i.e., the difference between the largest and the smallest sample value. The range is dependent on sampling and varies with the sample size; i.e., extreme values are more likely encountered as the sample size increases, resulting in widened ranges. When small sample sizes are compared with each other, the range variability may be large; when larger sample sizes are compared with each other, the range becomes more uniform. Equating the range with 100% assigns a percentile number to each value. Thus, dispersion can be characterized by the 10 to 90 percentile range, the 5 to 95 percentile range, or any other interval. Such a value may be preferred over the entire range itself since it tends to vary less between samples and be less dependent on sample size, as the effect of extreme values is reduced.
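The measures of central tendency and dispersion described above can be illustrated with a short sketch. The data are hypothetical right-skewed values invented for this example (they are not taken from Figures 1 or 2), and the `percentile` helper uses linear interpolation between ordered values, which is one convention among several and is an assumption here.

```python
import statistics

def percentile(sorted_values, p):
    """p-th percentile by linear interpolation between ordered values.

    This is one of several conventions in use; the text does not prescribe one.
    """
    k = (len(sorted_values) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = k - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

# Hypothetical right-skewed results in arbitrary units.
values = sorted([0.3, 0.4, 0.4, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6,
                 0.7, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5, 1.8])

# Central tendency: for a right-skewed sample the mode is displaced least
# and the mean most, with the median in between.
mode = statistics.mode(values)
median = statistics.median(values)
mean = statistics.mean(values)

# Dispersion: the full range versus the 10 to 90 percentile range.
full_range = values[-1] - values[0]
p10, p90 = percentile(values, 10), percentile(values, 90)

print(f"mode={mode}, median={median:.3f}, mean={mean:.3f}")
print(f"mean - mode = {mean - mode:.3f}; 3 x (mean - median) = {3 * (mean - median):.3f}")
print(f"full range = {full_range:.2f}; 10 to 90 percentile range = {p90 - p10:.2f}")
```

Equation 1 is empirical, so the two quantities on the second printed line should be expected to agree only roughly, and less well as the skew increases.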


[Figure 2 panels not reproduced; x-axes: bilirubin, mg/100 ml.]

FIGURE 2. Skewed distribution represented in four different ways. Data based on the serum total bilirubin concentrations from 144 males, ages 20 to 29 years. Upper left: frequency histogram plotting number of observations (y-axis) against total bilirubin concentrations (x-axis). Lower left: skewed distribution fitted to actual observations plotting relative frequency (y-axis) against bilirubin concentrations (x-axis). Upper right: cumulative frequency distribution plotting cumulative frequency in percent on a linear scale (y-axis) against bilirubin concentrations (x-axis). Lower right: cumulative frequency distribution plotting cumulative frequency in percent on a probability scale (y-axis) against bilirubin concentrations (x-axis).

In the 19th century, Johann Carl F. Gauss observed that random errors encountered in repeatedly making the same measurement without bias have a characteristic dispersion. On the other hand, no variability other than method error truly fits the Gaussian frequency distribution curve, although it may approximate it closely. The Gaussian or normal curve is symmetrical such that the mean, the mode, and the median (50 percentile) coincide. The mean (x̄) is used to express central tendency. The use of the mean (x̄) has the advantage that it is less variable than the median or the mode provided the distribution is approximately Gaussian, and, further, it permits the calculation of additional significant numbers. Dispersion can be defined by a single characteristic for all Gaussian curves, the standard deviation (S.D.). In Gaussian curves, x̄ ± 1 S.D. comprises 68.27% of the data (16 to 84 percentile range), x̄ ± 2 S.D. comprises 95.45% of the data (2 to 98 percentile range), and x̄ ± 3 S.D. comprises 99.73% of the data or almost the full range. As a consequence of this last statistic, a quick estimate of the standard deviation can be obtained by dividing the range by six (three standard deviations on either side of the mean) if the sample is sufficiently large.
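The Gaussian coverage figures quoted above can be checked numerically. The sketch below uses the error function from the Python standard library. The simulated sample used for the range-divided-by-six estimate is hypothetical (mean 10.0, S.D. 0.4, 1,000 values); its parameters are assumptions chosen only for illustration.

```python
import math
import random

def coverage_within(k):
    """Fraction of a Gaussian population lying within mean ± k S.D."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"mean ± {k} S.D. covers {100 * coverage_within(k):.2f}% of the data")

# Quick S.D. estimate from range / 6 on a simulated Gaussian sample.
# The sample parameters are hypothetical and serve only as an illustration.
rng = random.Random(0)
sample = [rng.gauss(10.0, 0.4) for _ in range(1000)]
sd_from_range = (max(sample) - min(sample)) / 6.0
print(f"S.D. estimated as range / 6: {sd_from_range:.2f} (true value 0.40)")
```

Because the range widens with sample size, the range-based estimate is only a rough guide and depends on how large the sample is.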


FIGURE 3. Frequency distributions of serum inorganic phosphorus, calcium, glucose, total protein, albumin, and cholesterol concentrations in females aged 30 to 39 years. The shaded area includes only subjects without any complaints and not taking any drugs. The white part of the bars includes those subjects who either had some complaint or were taking drugs (see text for details). Note the similarity of these two frequency distributions in every instance. On top of each panel the respective means and ranges of ± 2 standard deviations are given for the entire group of all subjects (white circle) and for the subjects without any complaints or drug intake only (gray circle). Since the distributions are symmetrical, these ranges adequately represent normal values, and there is no statistically significant difference between the two calculated means and ranges except in the case of the albumin ranges. (From Werner, M., Tolls, R. E., and Hultin, J. V., personal communication.)

Given the uniformity of Gaussian dispersion, a scale can be constructed which transforms the sigmoid cumulative frequency distribution curve (Figure 1, upper right) into a straight line on a probability plot. This probability scale is plotted against the observed values (Figure 1, lower right). Note that the probability scale is in percentiles (or in fractions of 1) but that the scale has no zero percentile and no 100 percentile value since the Gaussian distribution has no cutoffs and continues to infinity.

Figures 3 and 4 illustrate by practical examples some of the points made: Certain parameters, such as those shown in Figure 3, have a symmetrical distribution of individual observations in approximately Gaussian fashion which is well represented by the mean and the standard deviation. Other parameters, such as those shown in Figure 4, have individual observations distributed with a varying degree of skew. Depending on the degree of the latter, the mean and the standard deviation no longer adequately describe the population. Generally, the inclusion of subjects in the sample population who are taking medication as well as those with minor complaints does not appear to affect the distribution characteristics when compared with a similar sample population where those on medication or with minor complaints are excluded. Provided the population is sufficiently large, this pertains regardless of the manner in which these statistics are represented, i.e., whether by a frequency histogram or by the mean and standard deviation.

FIGURE 4. Frequency distributions of uric acid, urea nitrogen (BUN), total bilirubin, alkaline phosphatase, aspartate transaminase (SGOT), and lactate dehydrogenase (LDH) concentrations in females aged 30 to 39 years. See caption to Figure 3 and text for details. Again, no statistically significant difference exists between the entire group of all subjects (white) and the subjects without any complaints or drug intake (gray). (From Werner, M., Tolls, R. E., and Hultin, J. V., personal communication.)

The two specific drugs which were used most commonly by the more than 3,000 subjects in the study on which many of the data presented here are based were thyroid substitutes and oral contraceptives. The effects of both drugs were studied by randomly matching by computer each subject taking medication. No effect of thyroid drugs was demonstrated. This is not surprising since thyroid medication tends to be compensated by the organism by depressing endogenous hormone synthesis. On the other hand, oral contraceptives produced significant effects which will be discussed later.

One method of defining normal values described by Hoffmann uses a patient population (routine laboratory tests) as the sample, assuming the large majority of this population will exhibit normal values. A frequency distribution curve can be plotted using the y-axis as the number of subjects and the x-axis as the test results either grouped or nongrouped. This frequency distribution is assumed to represent a mixture of two populations, the healthy and the sick. To separate these populations and to establish the normal range, a minimal amount of statistical exercise is required. First a cumulative frequency graph is made on normal probability paper, and then a Gaussian curve (straight line) is eye-fitted by constructing a straight line through the points clustered around the 50% point (median). From this line a normal range is selected, for example, those results represented from the 2.5% to the 97.5% point. Note that this method elects to use the observed median (50% point or percentile) of the entire patient population sampled as the estimation of the mean for the healthy portion of the population sampled. This method also assumes that the frequency distributions of the healthy are Gaussian. The immediate availability of the sample and the ease of calculating the normal range render this method attractive.
However, a study by Amador comparing normal ranges obtained by this method (as well as other indirect methods) with those obtained by using a healthy population sample as the reference method revealed that the indirect methods showed significant shifting toward the pathological and their standard deviations were widened to an unacceptably large normal range. In another study, Elveback et al. conclude the Hoffmann method is not acceptable as a method to establish a normal range. In this study, results from seven measured serum constituents were used to establish the various frequency distributions and the statistical indices. Only one constituent (albumin) showed a Gaussian distribution. A comparison of the measurements of the central tendency, namely, the means for the healthy with the modes for the patient population, was favorable in three constituents (calcium, phosphate, and total protein). Comparisons of the normal ranges were favorable with two constituents (urea and alkaline phosphatase), but there were large differences with five constituents (calcium, phosphate, total protein, albumin, and magnesium), the normal range being considered unacceptably wide with the Hoffmann method.
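The Hoffmann procedure described above can be sketched numerically. This is an illustration under stated assumptions rather than the published method: the eye-fitted line is replaced here by a least-squares line through the points between the 30% and 70% cumulative frequencies, the function name `hoffmann_range` and its parameters are hypothetical, and the patient data are simulated as a mixture of 900 "healthy" results (mean 10.0, S.D. 0.4) and 100 "sick" results (mean 11.5, S.D. 0.8).

```python
import random
from statistics import NormalDist, mean

def hoffmann_range(results, band=(0.30, 0.70), limits=(0.025, 0.975)):
    """Estimate a normal range from unselected patient results.

    The cumulative frequency of each ordered result is converted to a
    Gaussian deviate (the probability scale), a straight line is fitted by
    least squares to the points inside `band`, and the line is read off at
    the cumulative frequencies given in `limits`.
    """
    ordered = sorted(results)
    n = len(ordered)
    std_normal = NormalDist()
    central = [
        (std_normal.inv_cdf((i + 0.5) / n), x)
        for i, x in enumerate(ordered)
        if band[0] <= (i + 0.5) / n <= band[1]
    ]
    z_bar = mean(z for z, _ in central)
    x_bar = mean(x for _, x in central)
    slope = (sum((z - z_bar) * (x - x_bar) for z, x in central)
             / sum((z - z_bar) ** 2 for z, _ in central))
    intercept = x_bar - slope * z_bar
    return tuple(intercept + slope * std_normal.inv_cdf(q) for q in limits)

# Simulated patient population (hypothetical parameters, see lead-in).
rng = random.Random(42)
results = ([rng.gauss(10.0, 0.4) for _ in range(900)]
           + [rng.gauss(11.5, 0.8) for _ in range(100)])

low, high = hoffmann_range(results)
print(f"estimated normal range: {low:.2f} to {high:.2f}")
print(f"healthy component alone: {10.0 - 1.96 * 0.4:.2f} to {10.0 + 1.96 * 0.4:.2f}")
```

Comparing the two printed ranges gives a feel for how much the admixture of sick subjects can shift and widen the estimate, which is the objection raised in the studies cited above.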

Effect of Sample Size and of Method Error on the Observed Biological Distributions

All sample statistics are only estimates of the parameters existing in the underlying universe of values (population). In addition, analytical reliability introduces uncertainty. Therefore, every point estimate, whether it describes central tendency or dispersion, should ideally be accompanied by confidence intervals. The key elements in these considerations are intuitively evident, namely, that the larger the sample and the smaller the method error, the better the statistical description of the biological distribution. However, a formal analysis can determine the sample size and the analytical precision required for a desired precision of the biological information. The mean and the standard deviation of a sample have a standard error reflecting their sampling distributions:

Standard error of the mean

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \tag{2}$$

Standard error of the standard deviation

$$\sigma_{S} = \frac{\sigma}{\sqrt{2N}} \tag{3}$$

where σ is the standard deviation of the population, S is the standard deviation of the sample, and N is the sample size. Analogous formulas for calculating the standard error of the median and other percentiles are found in standard statistical texts. A useful alternative formula for relating sample size to the precision (d) with which the sample mean (x̄) estimates the population mean is based on the fact that sampling distributions tend to become Gaussian:

$$N = \left(\frac{z\,\sigma}{d}\right)^{2} \tag{4}$$

Selecting the desired degree of confidence, z (normalized deviate of the Gaussian distribution), and knowing or inferring from previous experience the standard deviation of the population distribution, σ, the required number of subjects to obtain a given precision of the sample mean can be predicted. Analytical reliability affects central tendency values mainly through accuracy, and dispersion values mainly through precision. The observed dispersion of biological data is a composite of true biological variance ($\sigma_b^2$) and analytical variance ($\sigma_a^2$). Their joint variance is determined by:

$$\sigma_{\text{obs}}^{2} = \sigma_b^{2} + \sigma_a^{2} \tag{5}$$

It is apparent that if one standard deviation is more than double the other, the larger obscures the effect of the smaller. Thus, if the analytical standard deviation were a fourth of the biological standard deviation, as postulated by Tonks,26 the observed biological distribution would hardly be distorted by method error:

$$\sigma_{\text{obs}} = \sqrt{\sigma_b^{2} + \left(\tfrac{1}{4}\sigma_b\right)^{2}} = \sigma_b\sqrt{1.0625} \approx 1.03\,\sigma_b \tag{6}$$
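The relationships of this section can be checked with a few lines of code. The sketch assumes the standard forms of the formulas: standard error of the mean σ/√N, standard error of the standard deviation σ/√(2N) (an approximation for Gaussian data), required sample size (zσ/d)², and addition of biological and analytical variances. The function names and the numerical inputs (σ = 0.4, N = 140, d = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

def se_mean(sigma, n):
    """Standard error of the mean."""
    return sigma / math.sqrt(n)

def se_sd(sigma, n):
    """Approximate standard error of the standard deviation (Gaussian data)."""
    return sigma / math.sqrt(2 * n)

def required_n(sigma, d, confidence=0.95):
    """Subjects needed so that the sample mean lies within ± d of the
    population mean at the given two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sigma / d) ** 2)

def observed_sd(biological_sd, analytical_sd):
    """Observed S.D. when biological and analytical variances add."""
    return math.hypot(biological_sd, analytical_sd)

print(f"S.E. of mean (sigma=0.4, N=140): {se_mean(0.4, 140):.4f}")
print(f"S.E. of S.D. (sigma=0.4, N=140): {se_sd(0.4, 140):.4f}")
print(f"N required for d=0.05 at 95% confidence: {required_n(0.4, 0.05)}")

# Analytical S.D. as a fraction of the biological S.D. (taken as 1.0).
for ratio in (0.25, 0.5, 1.0):
    print(f"analytical/biological = {ratio}: observed S.D. = {observed_sd(1.0, ratio):.3f}")
```

With an analytical S.D. of one fourth of the biological S.D., the observed S.D. is inflated by only about 3%, whereas equal analytical and biological S.D.s inflate it by about 41%.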

Multiple Tests and the Frequency of Abnormal Results

The use of panels of multiple tests in clinical medicine is growing rapidly. When each new test is added to the battery of tests, the probability of encountering an extreme value in a normal subject increases. Thus, every "abnormal" value poses the problem of whether it truly reflects biological abnormality or simply normal statistical variability. The economic consequences of this dilemma are great as multiphasic health screening of subjects without clinical complaints is applied to large population groups.28

In one study of 1,831 subjects29 receiving a 12-test chemistry panel upon hospital admission, only 40% of all subjects produced findings in all 12 tests within the 95% confidence limits of the "hospital normal concentrations," while 60% revealed one or more results considered abnormal by that criterion. About half of the latter cases, or about a third of all admissions, were not considered medically relevant. Mostly these "abnormal" results were only slightly outside the normal limits and were not accompanied by a diagnostic finding in the history or physical. Another fourth of the cases with abnormal results, corresponding to about one in six admissions, were explainable by the disease causing hospitalization. Yet another 5% of the cases with abnormal results, or about 1 in 30 admissions, produced an unexpected but medically relevant finding leading ultimately to a definitive diagnosis not recognized previously. No information on the remainder of the cases with abnormal results was given.

Frequently, normal values are set to include 95% of normal subjects. The probability of a single result from a normal subject being within this range is (by definition) 0.95. The probability of two independent results from a normal subject being both within this range is 0.95 × 0.95 or 0.9025, and for N independent results the probability is

$$P_A = 0.95^{N} \tag{7}$$

In this fashion the probability, PA, of finding all independent values from a normal subject within the 95% normal range becomes increasingly less than 95% as N increases. If no more than 5% of the normal subjects are to be declared outside normal limits in any one test comprised in a battery of tests, the normal range for each individual test must be liberalized. For this, the normal range for each independent test, PB, must be redefined as a function of the number of independent tests, N, in the battery:

$$0.95 = P_B^{\,N} \tag{8}$$

Thus, with only 1 test we find PB = 0.95 (the customary 95% normal range or mean ± 1.96 S.D.), with 2 independent tests PB = 0.9747 (97.5% normal range or mean ± 2.24 S.D.), with 6 independent tests PB = 0.9915 (99.1% normal range or mean ± 2.63 S.D.), and with 12 independent tests PB = 0.9957 (99.6% normal range or mean ± 2.86 S.D.). It is apparent that this approach rapidly extends the normal range to the point where truly abnormal conditions may be overlooked. Combining Formulas 7 and 8 shows the tradeoff between the occurrence of false positives and false negatives as a function of N:

$$P_A = P_B^{\,N} \tag{9}$$

TABLE 1A

The probabilities, PA, of obtaining all normal results in normal subjects with test batteries of varying size, N, are given for more and less stringent normal ranges, PB. In addition, the probabilities of encountering different numbers of abnormal results are detailed for the 90% normal range.

Normal probability range     Number of abnormal        PA (%) for number of independent tests in the battery, N
assigned each test, PB       results in the battery        4       8      12      16      20

99% (x̄ ± 2.57 SD)           0                          96.1    92.3    88.6    85.1    81.8
95% (x̄ ± 1.96 SD)           0                          81.4    66.3    54.0    44.0    35.8
90% (x̄ ± 1.65 SD)           0                          65.6    43.0    28.2    18.5    12.2
                             1                          29.2    38.3    37.7    32.9    27.0
                             2                           4.9    14.9    23.0    27.5    28.5
                             3                           0.4     3.3     8.5    14.2    19.0
                             4                                   0.5     2.1     5.1     9.0
                             5                                           0.4     1.4     3.2
                             6                                                   0.3     0.9
                             7                                                           0.2

This relationship is illustrated in Tables 1A and 1B. The probability PA of finding all independent values from a normal subject within the normal range given is listed in Table 1A as a function of different normal probability ranges assigned each independent test, PB, and of different numbers of tests in the battery, N. The probability, PA, of finding all normal results decreases as the normal range, PB, narrows from 99% to 90% and as the number of tests, N, increases from 4 to 20. In addition, probabilities of finding a specified number of abnormal results in the test battery are listed for the 90% normal range assigned to each test. It is seen that with a battery of four independent tests and a 90% normal range, normal subjects in 65.6% of the cases will show no abnormal result, in 29.2% of the cases there will be one abnormal result, in 4.9% of the cases there will be two abnormal results, and in 0.4% of the cases there will be three abnormal results. Table 1B conversely lists probabilities of PB for test batteries of given sizes, N, while keeping a fixed probability, PA. Returning to the study of hospital admissions quoted earlier,29 about 85% of all admission panels were not expected to contain an abnormal

The confidence coefficient, PB, needed for each test of a diagnostic variable in order to insure a probability of PA = 0.90* that a normal subject be declared normal on each of N independent tests**

N 4 8 10 12 16 20

PB

Twesided significance level (each test)

0.9740 0.9869 0.9895 0.9913 0.9934 0.9941

0.0260 0.0131 0.0105 0.0087 0.0064 0.0053

x i 2.23 S.D.

X * 2.49 S.D. 51 i 2.56 S.D. 51 f 2.63 S.D. 51 i 2.13 S.D. 51 i 2.79 S.D.

*(1 - P) = 0.10 can also be restated as the probability that an individual will be incorrectly declared abnormal on one or more independent tests **It is clear that some tests in a panel may be mutually correlated. The role of the independence assumption is to make the nominal probabilities of being right overall (or of being wrong in one or more tests) conservative. That k. if correlations exist among the tests, then performing each. test at the listed significance levels will yield a probability that an individual be correctly declared normal on all N tests greater than 0.90.

finding explainable by the disease causing hospitalization. However, in practice only about 40% of all subjects, or slightly less than half the theoretically expected number, produced all normal findings. September 1975

89

Considering the approximate assumptions made in the analysis, this outcome surprisingly well matches the value PA = 54% given in Table 1A for a battery of 12 tests and for a normal probability range of 95%, as used in the quoted study.
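The entries of Tables 1A and 1B can be recomputed from equation (9). The following is a minimal sketch, assuming that the tests are independent and that the number of abnormal results therefore follows a binomial distribution (an assumption consistent with the tabulated values but not stated as a formula in the text). The function names are illustrative only.

```python
from math import comb


def prob_all_normal(p_b: float, n_tests: int) -> float:
    """Equation (9): PA = PB ** N for N independent tests."""
    return p_b ** n_tests


def prob_k_abnormal(p_b: float, n_tests: int, k: int) -> float:
    """Binomial probability of exactly k results outside the normal range."""
    return comb(n_tests, k) * (1.0 - p_b) ** k * p_b ** (n_tests - k)


def required_p_b(p_a: float, n_tests: int) -> float:
    """Invert equation (9): PB = PA ** (1 / N), as tabulated in Table 1B."""
    return p_a ** (1.0 / n_tests)


if __name__ == "__main__":
    # Rows of Table 1A with zero abnormal results, in percent.
    for n in (4, 8, 12, 16, 20):
        row = [round(100 * prob_all_normal(p, n), 1) for p in (0.99, 0.95, 0.90)]
        print(n, row)
    # Per-test normal range needed to keep PA at 0.90 (Table 1B).
    for n in (4, 8, 10, 12, 16, 20):
        print(n, round(required_p_b(0.90, n), 4))
```

Calling `prob_k_abnormal(0.90, 4, 1)` reproduces the case discussed in the text, a battery of four tests with a 90% normal range and exactly one abnormal result.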

BIOLOGICAL VARIABILITY

The sources of biological variability are genetic, environmental, or both. If the same individual is observed repeatedly over a period of time, differences caused by age, seasons, circadian rhythm, food intake, exercise, and even body posture are found. Such intraindividual variability is mainly due to environmental effects, although it should be recognized that genetic disposition may determine the degree of responsiveness. For instance, the skin tan of the same individual varies to a different degree as a result of the amount of seasonal sun exposure (environmental) and of the melanophore responsiveness (genetic). If different individuals are observed, interindividual variability is found. The latter may result both from genetic causes, such as sex32-36 or race, or from environmental causes, such as local climate or dietary habits. The experimental design for the assessment of intraindividual differences on the one hand, and of interindividual differences on the other hand, requires fundamentally different considerations. Intraindividual differences are measured in longitudinal studies, in which the variations occurring when the same individuals are sampled repeatedly over time are assessed for each individual (obviously, the differences between these individuals represent interindividual variation). In such measurements, particular attention must be given to the sampling interval to avoid an artifact called “aliasing,” which may project spurious rhythmicity. Total variability observed in longitudinal studies of an individual includes, in addition to intraindividual biological variability, method error (analytical variability).
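The aliasing artifact mentioned above can be made concrete with a small numerical sketch. It assumes a purely hypothetical analyte that follows an exact 24-hr sinusoidal rhythm; the function names, the mean, and the amplitude are illustrative assumptions and are not taken from any of the studies cited.

```python
import math


def circadian_value(t_hours: float, mean: float = 100.0, amplitude: float = 10.0) -> float:
    """Hypothetical analyte with a pure 24-hr sinusoidal rhythm (illustrative only)."""
    return mean + amplitude * math.sin(2.0 * math.pi * t_hours / 24.0)


def sample_series(interval_hours: float, n_samples: int) -> list:
    """Values observed when the subject is sampled at a fixed interval."""
    return [circadian_value(i * interval_hours) for i in range(n_samples)]


def apparent_period(true_period: float, interval: float) -> float:
    """Period of the rhythm that the sampled series appears to show.

    The true frequency, expressed in cycles per sample, is folded to its
    distance from the nearest whole number of cycles; that distance is the
    frequency the sampled points can actually display.
    """
    cycles_per_sample = interval / true_period
    folded = abs(cycles_per_sample - round(cycles_per_sample))
    return math.inf if folded == 0 else interval / folded


if __name__ == "__main__":
    for interval in (6.0, 24.0, 25.0):
        print(interval, apparent_period(24.0, interval))
```

In this sketch, sampling every 6 hr recovers the true 24-hr period. Sampling every 24 hr returns the same value each time, so no rhythm is seen at all. Sampling every 25 hr shifts the phase by 1 hr per sample, which projects a slow rhythm that is not present in the subject.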
Interindividual differences usually are measured in cross-sectional studies, which by definition comprise multiple individuals, whether each individual be sampled only once or repeatedly.32-35,47-49 In such measurements, the occurrence of skewed and indeed “atypical” distribution curves poses special problems which will be discussed later. Total variability observed in cross-sectional studies includes, in addition to interindividual biological variability, intraindividual biological variability and method error (analytical variability). It is important to recognize that no prospective inferences pertaining to specific individuals may be drawn from such studies, e.g., predicting what observations will be made 40 years hence in currently 20-year-old people by analyzing observations in currently 60-year-old people.

Characterization of Individuals

For many biological parameters there are variations which may make a single determination misleading or even meaningless. Each person possesses a characteristic variability for measurable properties. For example, in different subjects, uric acid in blood may be set at a high or at a low concentration and, independently of this typical “average level,” may vary much or little. In statistical terms the functional level may be described by a measure of central tendency, and variability by a measure of dispersion; together these two terms help to define biochemical individuality. Of course, such simplified characterization omits description of the sequence or the harmonics according to which the analyzed property fluctuates. The difficulties associated with the latter problem have been referred to above.

These concepts are illustrated in Figure 5. Urinary alkaline phosphatase was measured in 12-hr intervals over a period of several days in eight healthy subjects. The amount of excreted enzyme is plotted on the x-axis. The y-axis plots the cumulative frequency on a probit scale. Observations for each individual are connected by solid lines. Each of the eight individual curves approximates a straight line, implying Gaussian distributions. The curves differ both with respect to their 50 percentile value (central value) and to their slope (dispersion), but a clustering among intermediate values with only one low excreter and two high excreters is noted. Relatively few longitudinal studies have been published.
Until recently the large errors inherent in many laboratory methods prevented the analysis of the comparatively small intraindividual fluctuations. In addition, variability evolves only slowly for many parameters so that protracted studies become necessary. This poses problems of logistics in handling subjects much the same as in maintaining standardized analytical methodology. One longitudinal study has been designed to arrive at estimates of intraindividual variability by subtracting analytical variability from the total variability observed in a 10- to 12-week study of 68 healthy adults in which 15 of the more common laboratory tests were performed weekly. This intraindividual variability, represented by the fluctuations from the individual mean for each subject, ranged from about 2% for Na, Cl, Ca, and Mg up to about 10% for uric acid, urea nitrogen, SGOT, and LDH. Variability between subjects exceeded the intraindividual variability in 12 tests, but not for Cl, CO2, and K.

A major potential application of longitudinal measurements has been the proposal to generate individual “normal values” which would be stored in a national medical data bank. Once a person has established these values, subsequent test results obtained in clinical or diagnostic situations would be compared against the prior measurements. There are considerable practical difficulties with the execution of such a project, but on a conceptual level it should be clearly recognized that individual “normal values” can only complement but not replace the biological reference values obtained from cross-sectional studies. To appreciate the usefulness of individual normal values, it is necessary to develop an understanding of the biological rules which govern what may be termed the “hierarchy of variability.”

First, biochemical individuality should be more evident for traits with high rather than with low variability. In other words, there can be no individuality if everybody is the same. Generally, there is an inverse connection between the variability of a property and its survival value. For instance, serum sodium concentration, which controls tissue hydration, is maintained within very narrow regulatory limits at all times, whereas the concentration of serum alpha-1-antitrypsin may vary widely and indeed be genetically absent in some subjects. Second, homeostatic mechanisms vary within wider limits than the parameters controlled by them. For instance, osmolar clearance fluctuates as it adjusts to maintain a constant blood osmolality. Third, scatter often correlates with functional level. For instance, the diurnal temperature fluctuations increase in most febrile states. A second example is given in Figure 5, where the dispersion (slope of cumulative frequency curve) appears dependent on the functional level (50 percentile value) of urinary enzyme excretion. Finally, a word need be said about the variability of variability itself. Isosthenuria illustrates the loss of variability used for diagnostic purposes. The functional study of such variability of variability is truly the conceptual basis of all tolerance tests.

FIGURE 5. The urinary excretion of alkaline phosphatase in eight healthy subjects plotted as cumulative distributions on a probability scale (y-axis). Excreted enzyme (units/12 hr) is on a logarithmic scale (x-axis). (From Werner, M., Heilbron, D., Maruhn, D., and Atoba, M., Clin. Chim. Acta, 29, 437, 1970. With permission.)
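The longitudinal approach described above, in which analytical variability is subtracted from the total variability observed within a subject, can be sketched as follows. The sketch assumes that the subtraction is carried out on variances, so that coefficients of variation combine in quadrature; this is a common convention for independent error sources but is not spelled out in the text. The function names and the weekly readings are hypothetical.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """Dispersion of one subject's results relative to the central value, in percent."""
    return 100.0 * stdev(values) / mean(values)


def intraindividual_cv(total_cv: float, analytical_cv: float) -> float:
    """Biological within-subject CV after removing the analytical component.

    Assumes independent error sources, so that variances (not standard
    deviations) are additive.
    """
    difference = total_cv ** 2 - analytical_cv ** 2
    if difference < 0:
        raise ValueError("analytical CV exceeds total CV; components cannot be separated")
    return math.sqrt(difference)


if __name__ == "__main__":
    # Hypothetical weekly readings for one subject (arbitrary units).
    weekly = [5.0, 5.5, 4.5, 5.2, 4.8]
    total = coefficient_of_variation(weekly)
    # The analytical CV of 3% is an assumed value for illustration.
    print(round(total, 2), round(intraindividual_cv(total, 3.0), 2))
```

The mean and the coefficient of variation together correspond to the measures of central tendency and dispersion used in the text to describe biochemical individuality.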

Characterization of Groups

Cross-sectional studies of the variability between different subjects are used to derive normal values adjusted for age and sex. Such specific normal limits may be narrower than general ones, improving diagnostic discrimination. For instance, a cholesterol concentration of 300 mg/dl will have a different interpretation in a 30-year-old woman (above norm) and in a 60-year-old man (within norm) if the appropriate, specific normal values are defined. Other sources of interindividual variability, such as race, dietary habits, and climate, have not been extensively investigated and are often difficult to separate from each other. Fortunately, such variables appear to have a much lesser effect on interindividual variability than the primary determinants, age and sex. Just as not every laboratory parameter is equally subject to intraindividual variability, the effects of sex and age in cross-sectional studies are not of equal importance for every laboratory parameter (Tables 2 and 3). On the one hand, the concentrations of Ca, Na, and Cl in serum are relatively constant throughout subjects of both sexes and all ages. On the other hand, phosphorus (Figure 6) or cholesterol concentration in serum (Figure 7) illustrates age and sex dependence. Phosphorus drops markedly at puberty in both sexes, continues to fall in adult males, but rises again after the menopausal age in females. Cholesterol concentration is similar in children of both sexes, increases more markedly at the pubertal age, but flattens its rise later in males while displaying a second marked increase after the menopausal age in females.

Alkaline phosphatase activity in serum falls precipitously at puberty (Figure 8). However, no specific age cutoff can separate prepuberty and postpuberty subjects because sexual maturation occurs at an individually different age. Thus, in the 14-year-old group two completely separate subpopulations of alkaline phosphatase activities are distinguished, one with values typical for children and one with values typical for young adults. Similar but less marked pubertal drops in serum activity occur for other commonly assayed enzymes such as lactate dehydrogenase (LDH) and aspartate aminotransferase (SGOT). It should be noted that the age drift in both cases cannot be adequately represented by a rectilinear fit. Generally, the rate of change of variables influenced by age increases at puberty in both sexes and again at the menopause in females. Therefore, it is inappropriate to disregard the influence of these important life stages by fitting the age regression of a biological variable to a straight line, as is sometimes done.

Figure 9 illustrates age drift occurring in both sexes for 12 parameters commonly assayed together in a panel. This figure again highlights the inadequacy of representing universal normal values for all individuals by a single range. Other points relevant specifically to the problems of sampling “normal” subjects and to the statistical description of non-Gaussian distributions have been discussed earlier (Figures 3 and 4). Even while it is possible to deal in practical ways with non-Gaussian cross-sectional studies, it would be preferable to reduce observations to the

TABLE 2

Effect of Puberty and Menopause in Females and in Males on 11 Serum Constituents*

                                  Females (years)          Males (years)
                                  0-12       30-39         0-12       30-39
Constituent                       vs.        vs.           vs.        vs.
                                  20-29      60-69         20-29      60-69

Calcium
Inorganic phosphorus
Total protein
Albumin
Urea nitrogen (BUN)
Uric acid
Cholesterol
Total bilirubin
Alkaline phosphatase
Lactate dehydrogenase (LDH)
Aspartate transaminase (GOT)

N.S.

tt $4

N.S. ttt ttt ttt N.S. ttt ttt ttt

t

+++

ttt ttt N.S. ttt ttt ttt +&+

+++ +&+

20-29

++ +++ ++&

444

ttt N.S.

ttt

+++

t tt N.S.

*The direction and statistical significance of the age shifts are indicated by arrows (one arrow for P < 0.05; two arrows for P < 0.01; three arrows for P < 0.001). Note that statistical significance is more dependent on the uniformity of the shift than on its extent. The latter is illustrated in Figure 9. Modified after Werner, M., Tolls, R. E., Hultin, J. V., and Mellecker, J., Z. Klin. Chem., 8, 105, 1970.
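The arrow notation defined in the footnote to Table 2 can be written as a simple rule. The sketch below is illustrative only: it assumes that an upward arrow denotes an increase with age, a downward arrow a decrease, and that N.S. marks shifts with P of 0.05 or more. The function name is hypothetical.

```python
def significance_arrows(p_value: float, increase: bool) -> str:
    """Encode a P value in the arrow notation of the Table 2 footnote."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # Assumed convention: up arrow for an increase with age, down arrow for a decrease.
    arrow = "↑" if increase else "↓"
    if p_value < 0.001:
        return arrow * 3
    if p_value < 0.01:
        return arrow * 2
    if p_value < 0.05:
        return arrow
    return "N.S."
```

As the footnote points out, the number of arrows reflects how uniform a shift is across subjects, not how large it is, so a small but consistent shift can carry three arrows.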

TABLE 3 Sex Differences of 11 Serum Constituents in Prepubertal, Sexually Mature, and Postmenopausal Subjects*

Age groups (years) Constituent

0-12

20-29

Calcium Inorganic phosphorus Total protein Albumin Urea nitrogen (BUN) Uric acid Cholesterol Total bilirubin Alkaline phosphatase Lactate dehydrogenase (LDH) Aspartate transaminase (GOT)

N.S. F
