Clinical Radiology (1992) 46, 344-347

Observer Variability in the Sonographic Measurement of Renal Length in Childhood

M. A. SARGENT and B. P. M. WILSON

Royal Manchester Children's Hospital, Pendlebury, Manchester

The intra- and interobserver variability of the measurement of renal length in an unselected series of children attending for renal tract ultrasound were estimated using a standard technique and test-retest protocol. The standard errors of measurement were calculated and the limits of agreement derived. The results indicate that for normal kidneys there is a 95% probability that a second measurement by one observer will be within 4.6 mm of his first reading. Similarly, a single measurement will be within 7.8 mm of a reading by a second examiner. For abnormal kidneys in this series the intraobserver variability was similar to normal kidneys but interobserver variability was greater. Observer error in the measurement of renal length is equivalent to between 2 and 3 years normal growth for children older than 1 year.

Sargent, M.A. & Wilson, B.P.M. (1992). Clinical Radiology 46, 344-347. Observer Variability in the Sonographic Measurement of Renal Length in Childhood

Accepted for Publication 20 June 1992

Correspondence to: Dr M. A. Sargent, Radiology, British Columbia Children's Hospital, 4480 Oak Street, Vancouver V6H 3V4, Canada.

Measurement of kidney size is an integral part of the assessment of the renal tract in childhood. Comparison with normal values aids in the diagnosis of both unilateral and bilateral renal disease. Serial measurements may be used in the follow up of known pathology and for confirmation of normal renal growth. Sonographic measurements are now regarded as the standard for assessment of renal size in children, and numerous published normal values compare sonographic length or volume with age, sex, height and weight. The present study was undertaken to estimate both the intra- and interobserver error in measurement of renal length in children using one real-time ultrasound system and a single agreed sonographic technique.

MATERIALS AND METHODS

An unselected consecutive series of children attending our department for renal tract ultrasound was studied. In random order, one of the two authors, each highly experienced in paediatric ultrasound, performed his usual examination of the renal tract but omitting his usual measurements. At the end of this part of the examination, he made three longitudinal coronal measurements of each kidney from a posterior oblique approach with the patient in the lateral decubitus position. The system's electronic calipers were used to measure the longest longitudinal diameter of the kidney on each of three images. Using the same technique, measurements were then made by the second observer who was blind to the results of the first observer. All examinations were performed using the same ultrasound system (Diasonics DS1-RF) with a 5.0 MHz mechanical sector transducer. With this system, the measured distance is not displayed until the two caliper positions have been formally selected by the operator.

Statistical analysis was by means of analysis of variance and Student's t-test for differences between matched pairs with a significance level of P < 0.05. The standard error of measurement (Smeas) was estimated from $S_{meas} = S_d/\sqrt{2}$, where $S_d$ = standard deviation of differences between paired measurements [1]. Verbal informed consent was obtained before each patient was studied and there were no refusals.

RESULTS

Thirty-three children (16 girls and 17 boys) were admitted to the study. Their ages ranged from 1 day to 15 years. There were 48 sonographically normal kidneys, and 18 abnormal kidneys were identified in 11 children. Abnormalities included hydronephrosis, cystic disease, calculus and glomerulonephritis.

There being no gold standard for the renal lengths, the best estimate was taken to be the median of the six readings for each kidney. From these, the mean length of normal kidneys was 7.61 ± 1.5 cm and of abnormal kidneys 8.9 ± 2.1 cm. For the 22 patients with two normal kidneys, the mean renal length was 7.5 ± 1.5 cm with the left slightly larger than the right (mean difference 1.9 mm; 95% CL: 0.3, 3.5 mm). For normal kidneys there was no detectable difference in measurement error of left and right kidneys and no correlation of intraobserver error with renal size. Hence all kidneys were grouped together as either normal or abnormal.

As an example of the range of measurement variability encountered in this study, the correlation curve (Fig. 1), histogram (Fig. 2), and scattergram (Fig. 3) each depict the distribution of differences between the observers' median measurements of normal kidneys. Similar distributions were found among all the parameters assessed for both intra- and interobserver variation.

Normal Kidneys

(1) Intraobserver variability. The differences between measurements 1 and 2, and 2 and 3 of each observer are presented in Table 1. For Observer A there was a small reduction from the first to second readings (mean change 1.25 mm; 95% CL: 0.1, 2.3 mm), but analysis of variance showed no overall difference between the three measurements of either observer. The range of differences between the maximum and minimum of the three measurements of each kidney was 1 to 8 mm for Observer A and 0 to 12 mm for Observer B (mean ± S.D. = 3.4 ± 1.6 mm, and 2.9 ± 2.0 mm respectively). For the first and second readings of each observer, the intraobserver standard errors of measurement for normal kidneys were Smeas = 1.7 mm and 1.6 mm for Observers A and B respectively.

Table 1 - Intraobserver variability: mean differences between readings of renal length for each observer

Kidneys    Observer   Readings   Mean (mm)   Sd (mm)   Smeas (mm)
Normal     A          1-2          1.25       2.33      1.65
Normal     A          2-3         -0.44       2.70      1.91
Normal     B          1-2         -0.08       2.32      1.64
Normal     B          2-3         -0.25       2.69      1.90
Abnormal   A          1-2         -1.11       2.42      1.71
Abnormal   A          2-3          1.00       2.77      1.96
Abnormal   B          1-2          0.22       3.89      2.75
Abnormal   B          2-3          0.72       3.04      2.15

Sd, Standard deviation of differences between individual measurements; Smeas, standard error of measurement; n = 48 for normal and n = 18 for abnormal kidneys.

Table 2 - Interobserver variability: mean differences between measured renal lengths, Observer A - Observer B

Kidneys    Reading    Mean (mm)   Sd (mm)   Smeas (mm)
Normal     1            0.96       3.95      2.79
Normal     2           -0.38       3.97      2.80
Normal     3           -0.21       4.81      3.40
Normal     Median       0.04       3.61      2.55
Normal     Maximum      0.38       4.01      2.84
Abnormal   1            0.39       6.66      4.71
Abnormal   2            1.72       8.52      6.03
Abnormal   3            1.44       7.05      4.98
Abnormal   Median       1.83       6.58      4.65
Abnormal   Maximum      0.56       7.29      5.15

See Table 1 footnote.

Fig. 1 - Interobserver variation: median measurements of 48 normal kidneys by both observers demonstrating scatter around the line of equality. The same data is used for Figs 2 and 3. [Plot not reproduced; horizontal axis: Observer B renal length (cm).]

Fig. 2 - Interobserver variation: histogram of measurement differences between median lengths of normal kidneys. [Plot not reproduced; horizontal axis: Difference, Observer A - Observer B (mm).]

Fig. 3 - Interobserver variation: scattergram of measurement differences between median lengths of normal kidneys plotted against the means of each pair of readings. Mean difference = 0.04 ± 3.6 mm. [Plot not reproduced; horizontal axis: Renal length (cm); reference lines at +2 S.D. and -2 S.D.]

(2) Interobserver variability. The differences between the first, second, third, maximum and median readings of the two observers are presented in Table 2. Differences between readings of the two observers were up to 12 mm for both the first and median measurements. The mean difference between observers improved from the first to third readings, but this was not accompanied by an improvement in variance. The maximum readings of the two observers showed similar variance to the individual measurements. As expected, the smallest interobserver variation in measured length was between the medians of the two observers (mean difference = 0.04 ± 3.6 mm). Interobserver standard errors of measurement estimated for the first and median readings were Smeas = 2.8 mm and 2.6 mm respectively.
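As a quick arithmetic check, the Smeas values reported for normal kidneys can be reproduced from the corresponding Sd values using the relation Smeas = Sd/√2 given in the Methods. The sketch below is illustrative only: the function name `standard_error_of_measurement` is ours, not the authors', and the Sd values are copied from Tables 1 and 2.

```python
import math


def standard_error_of_measurement(sd_of_differences_mm: float) -> float:
    """Smeas = Sd / sqrt(2), where Sd is the SD of paired differences."""
    return sd_of_differences_mm / math.sqrt(2)


# Sd values (mm) for normal kidneys, copied from Tables 1 and 2.
sd_normal_mm = {
    "intra, Observer A, readings 1-2": 2.33,
    "intra, Observer B, readings 1-2": 2.32,
    "inter, first readings": 3.95,
    "inter, median readings": 3.61,
}

# Round to two decimals, as in the tables.
smeas_normal_mm = {
    label: round(standard_error_of_measurement(sd), 2)
    for label, sd in sd_normal_mm.items()
}

for label, smeas in smeas_normal_mm.items():
    print(f"{label}: Smeas = {smeas:.2f} mm")
```

Rounded to one decimal place, these agree with the 1.7, 1.6, 2.8 and 2.6 mm quoted in the text.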

Abnormal Kidneys

The findings for intra- and interobserver error in the measurement of abnormal kidneys are included in Tables 1 and 2. Intraobserver error was similar to the sonographically normal kidneys (Smeas for the first two readings = 1.7 mm and 2.8 mm for Observers A and B respectively). Interobserver error in this series was greater for abnormal kidneys (Smeas = 4.7 mm for both the first and median readings).

DISCUSSION

Ultrasound is widely accepted as the standard by which renal growth should be assessed in childhood. Edell et al. [2] summarize the numerous normal ranges which compare renal length and renal volume with other parameters such as age, sex, height and weight. Much of the original work and some of these standards were derived from studies using static scanners or automated ultrasound equipment. Little has been published on the variability of measurement using these systems and there is only one comparable other study of observer variability in real-time sonography [3].

There are many potential sources of error in sonographic measurement, but if there is adequate quality control of the ultrasound system the major variable can be expected to be observer error [4]. It is important to establish the degree of observer error in order to determine what constitutes a significant change in size given the expectation of normal growth.

In this study we have utilized a test-retest design to assess both the intra- and interobserver error in the measurement of children's kidneys over a short period of time. Day to day biological variability has not been estimated.

Bland and Altman [5] demonstrated that the correlation curve (Fig. 1) and correlation coefficient are poor means of assessing the reliability of clinical measurement. They recommend the scattergram (Fig. 3) as the best graphic means of demonstration of measurement error and its relation to absolute size.
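The quantities behind a scattergram of this kind can be sketched in a few lines: for each kidney, take the difference between the two observers' readings and the mean of the pair, then summarise the differences by their mean and standard deviation. The readings below are invented purely for illustration (they are not the study data), and the function name `bland_altman_summary` is our own.

```python
from statistics import mean, stdev


def bland_altman_summary(readings_a_mm, readings_b_mm):
    """Return per-pair means, per-pair differences (A - B), the mean
    difference, and the mean +/- 2 SD limits for paired readings."""
    if len(readings_a_mm) != len(readings_b_mm):
        raise ValueError("readings must be paired")
    differences = [a - b for a, b in zip(readings_a_mm, readings_b_mm)]
    pair_means = [(a + b) / 2 for a, b in zip(readings_a_mm, readings_b_mm)]
    mean_difference = mean(differences)
    sd_difference = stdev(differences)  # sample SD (n - 1 denominator)
    return {
        "pair_means": pair_means,
        "differences": differences,
        "mean_difference": mean_difference,
        "sd_difference": sd_difference,
        "lower_limit": mean_difference - 2 * sd_difference,
        "upper_limit": mean_difference + 2 * sd_difference,
    }


# Hypothetical renal lengths (mm) from two observers; NOT data from the study.
observer_a_mm = [62.0, 71.0, 80.0, 95.0, 104.0]
observer_b_mm = [60.0, 74.0, 79.0, 99.0, 101.0]

summary = bland_altman_summary(observer_a_mm, observer_b_mm)
print(f"mean difference = {summary['mean_difference']:.2f} mm")
print(f"limits = {summary['lower_limit']:.2f} to {summary['upper_limit']:.2f} mm")
```

Plotting `differences` against `pair_means` gives the scattergram layout, which makes any dependence of error on kidney size visible in a way that a correlation plot does not.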
The standard error of measurement (Smeas) employed in the present study is the usual means of numerical estimation of measurement error [1, 6].

Although this study comprises a relatively small group of children examined by just two observers, we believe the findings can be applied more widely since the use of an agreed systematic method of measuring the kidneys should give observer error near the minimum to be expected in routine practice. The sonographic technique employed was chosen because it was the method by which the two observers agreed that they could most consistently identify both poles of the kidneys in single views with the equipment used. Different transducer types may require alternative imaging planes.

Using the described technique, we have shown that the standard deviation of the differences (Sd) between repeated measurements of normal kidneys by a single observer was of the order of 2.5 mm, while between two experienced observers it was approximately 4 mm. Theoretically, if a single measurement is made then there is a 95% probability that it will be within 1.96(Sd) of a second measurement and within 1.96(Smeas) of the 'true' value, where $S_{meas} = S_d/\sqrt{2}$ [1]. Thus based on our data, a single measurement of a normal kidney by one examiner can be expected to be within 4.6 mm of a second reading by the same person. Similarly, a reading by one observer will be within 7.8 mm of a reading by another observer and within 5.5 mm of the 'true' value of the two observers.

For abnormal kidneys in this series, intraobserver measurement error was similar, but interobserver error was greater than for normal kidneys. The 1.96(Sd) limit for single readings would be about 13.5 mm between observers. This difference is thought to be due to the technical difficulty in defining the margins of abnormal kidneys.

We noted a tendency for between-observer error to be greater for smaller normal kidneys (Fig. 3). While a larger sample might identify a clear relationship between error and renal length, it is more practicable for clinical purposes to assume error to be independent of the size.

Many sonographers are taught to record the largest of three consecutive measurements but we found that our maximum readings were no less variable between observers than the individual measurements. As would be expected, the interobserver variability of the median measurements was less than the other values examined.

Ideally every institution following renal growth sonographically should undertake a study such as this to estimate in-house measurement error. We recommend that a consistent imaging plane be adopted for the follow-up of each patient in order to reduce the potential for error due to technique. We suggest that if two separate measurements lie within 2 (Sd) of each other then there is no value to further readings and their mean can be taken as the appropriate measurement. Practically, our data indicate that the mean of two readings lying within 5 mm of each other can be recorded.
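The arithmetic in the preceding paragraphs can be sketched as follows. The first function applies the 1.96 multiplier to a standard deviation (Sd or Smeas); the second encodes the suggested recording rule, with the 5 mm tolerance taken from the text. The function names, and the choice to return `None` when the two readings disagree, are our own assumptions and not part of the authors' protocol.

```python
from typing import Optional


def limit_95_mm(sd_mm: float, z: float = 1.96) -> float:
    """95% limit: z times a standard deviation (Sd or Smeas), in mm."""
    return z * sd_mm


def recorded_length_mm(first_mm: float, second_mm: float,
                       tolerance_mm: float = 5.0) -> Optional[float]:
    """Mean of two readings if they agree within the tolerance.

    Returns None when the readings differ by more than the tolerance,
    signalling (our assumption) that a further reading is needed.
    """
    if abs(first_mm - second_mm) <= tolerance_mm:
        return (first_mm + second_mm) / 2
    return None


# Sd values (mm) for normal kidneys, copied from Table 1 (Observer A,
# readings 1-2) and Table 2 (first readings).
intra_limit = limit_95_mm(2.33)  # one observer, repeated reading
inter_limit = limit_95_mm(3.95)  # two observers
print(f"intraobserver limit = {intra_limit:.2f} mm")
print(f"interobserver limit = {inter_limit:.2f} mm")

# Hypothetical readings (mm), for illustration only.
agreeing = recorded_length_mm(78.0, 81.0)     # 3 mm apart: mean is recorded
disagreeing = recorded_length_mm(78.0, 86.0)  # 8 mm apart: no value recorded
print(agreeing, disagreeing)
```

With these particular Sd values the limits come out at roughly 4.6 mm and 7.7 mm; the 7.8 mm quoted in the text presumably reflects rounding or a slightly different choice of Sd.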
Such a method should be both repeatable and reproducible.

Using state of the art equipment and three observers but fewer patients, Schlesinger et al. [3] found intraobserver error broadly similar to ours with the operators allowed to select their own imaging plane for each patient. In contrast to our findings, these authors found less intraobserver variation in the measurement of the right than the left kidney. Using the means of measurement recording we recommend, these authors showed interobserver variation virtually identical to our median readings.

Population studies using sonography have shown renal length to increase by about 1.5 mm per month in children under 1 year of age and between 2 and 3 mm per year thereafter [7]. Renal length increases by about 5 mm for each 10 cm increase in height [8]. Our 2 (Sd) limits of 4.6 and 7.8 mm for one or two observers indicate that the measured length of sonographically normal kidneys may be static for between 2 and 3 years in children older than 1 year before it can be attributed to growth failure rather than to measurement error, with correspondingly wider limits for abnormal kidneys.

Measurement error should be considered in the evaluation of renal growth with time. This may simply be done by plotting an earlier recorded renal length with its associated 2 (Sd) error bars on a graph of normal values and extrapolating them along the predicted growth lines. If a new measurement lies within these extrapolated limits then it is compatible with normal growth.
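The check described in the final paragraph can be sketched numerically. This is a deliberately simplified illustration: it assumes a constant growth rate between examinations (a value within the 2 to 3 mm per year quoted above for children older than 1 year) in place of a real chart of normal values, and the function names and default values are ours.

```python
def years_of_growth_equivalent(limit_mm: float,
                               growth_mm_per_year: float) -> float:
    """Time for expected growth to equal a measurement-error limit."""
    return limit_mm / growth_mm_per_year


def compatible_with_normal_growth(earlier_mm: float, new_mm: float,
                                  years_elapsed: float,
                                  limit_mm: float = 4.6,
                                  growth_mm_per_year: float = 2.5) -> bool:
    """True if the new reading lies within the earlier reading's error
    bars after extrapolating them along an ASSUMED linear growth line."""
    predicted_mm = earlier_mm + growth_mm_per_year * years_elapsed
    return abs(new_mm - predicted_mm) <= limit_mm


# Hypothetical follow-up by one observer: 80 mm at baseline, 81 mm two
# years later (4 mm short of the assumed growth line, inside 4.6 mm).
within_limits = compatible_with_normal_growth(80.0, 81.0, years_elapsed=2.0)

# Same kidney measuring 80 mm again after four years (10 mm short).
outside_limits = compatible_with_normal_growth(80.0, 80.0, years_elapsed=4.0)

print(within_limits, outside_limits)
print(round(years_of_growth_equivalent(7.8, 2.5), 1))
```

In practice the predicted length would be read from published normal values for the child's age or height, and the wider two-observer limit would apply when the examinations were performed by different people.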


REFERENCES

1 Cameron N. The measurement of human growth. London: Croom Helm, 1984:104-106.
2 Edell SL, Kurtz AB, Rifkin MD. In: Goldberg BB & Kurtz AB, eds. Atlas of ultrasound measurements. Chicago: Year Book Medical Publishers, 1990:146-160.
3 Schlesinger AE, Hernandez RJ, Zerin JM, Marks TI, Ketsch RC. Interobserver and intraobserver variations in sonographic renal length measurements in children. American Journal of Roentgenology 1991;156:1029-1032.
4 Burns PN, Waldroup L, Pinkney MN. In: Goldberg BB & Kurtz AB, eds. Atlas of ultrasound measurements. Chicago: Year Book Medical Publishers, 1990:1-18.
5 Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;8476:307-310.
6 British Standards Institution. Precision of test methods I: Guide for the determination and reproducibility of a standard test method (BS5497, part 1). London: BSI, 1979.
7 Rosenbaum DM, Korngold E, Teele RL. Sonographic assessment of renal length in normal children. American Journal of Roentgenology 1984;142:467-469.
8 Dinkel E, Ertel M, Dittrich M, Peters H, Berres M, Schulte-Wissermann H. Kidney size in childhood: sonographical growth charts for kidney length and volume. Pediatric Radiology 1985;15:38-43.
