Adolescent Depression and the Child Behavior Checklist

JOSEPH M. REY, PH.D., F.R.A.N.Z.C.P., AND ALLEN MORRIS-YATES, B.A. (HONS.)

Abstract. Using receiver operating characteristic methods, the authors planned to assess the diagnostic accuracy of a measure of depression extracted from the Child Behavior Checklist in a group of 667 referred adolescents with DSM-III diagnoses. This depression scale was based on a depression factor found by Nurcombe et al. in Child Behavior Checklists from adolescent inpatients. Receiver operating characteristic analysis showed that Child Behavior Checklist-Nurcombe scores were able to discriminate between subjects with and without major depression with an accuracy comparable to that reported for the Dexamethasone Suppression Test (area under the receiver operating characteristic curve = 0.78). J. Am. Acad. Child Adolesc. Psychiatry, 1991, 30, 3:423-427. Key Words: depression scales, Child Behavior Checklist, receiver operating characteristics, adolescents.

The work group to revise classification and diagnostic criteria in child and adolescent psychiatry for DSM-IV has recently stated that "symptoms of depression that are of sufficient magnitude to meet RDC [Research Diagnostic Criteria]/DSM-III criteria occur almost entirely in association with other disorders (comorbidity), the only exception to this rule being adolescents with acute onset of depressive psychopathology and good premorbid function ... " (Shaffer et al., 1990, p. 834). This appraisal was reinforced by Nurcombe et al. (1989) and Seifer et al. (1990), who found no support for the existence of a "pure" depressive disorder in children but could identify a depressive syndrome in adolescents using multivariate analysis of parents' responses on the Child Behavior Checklist (CBCL) (Achenbach and Edelbrock, 1983).
Nurcombe and his colleagues (1989) reported that principal components and cluster analysis of CBCL ratings from a series of 216 adolescent inpatients revealed a depressive cluster. A comparison between the 23 patients identified by this cluster and 23 age- and sex-matched controls with low scores on this factor showed significant differences in Children's Depression Inventory (CDI) scores (Kovacs, 1981). The authors concluded that their findings were consistent with the concept of a categorically distinct "nuclear" depression and argued that, since the internal consistency of the items making up the factor was high, those items were promising as a scale to measure depression in its own right.

Accepted November 15, 1990. Dr. Rey is Director, Rivendell, Child Adolescent and Family Psychiatric Services, Royal Prince Alfred Hospital, Concord West N.S.W., Australia. Mr. Morris-Yates is research staff at the Clinical Research Unit for Anxiety Disorders, University of New South Wales at St. Vincent's Hospital. Jon Plapp helped with diagnosis and comments on successive drafts. Thomas Achenbach commented on an earlier version of the paper. Elzbieta Schrader helped with statistical analysis. This work was partly funded by the National Health and Medical Research Council of Australia. Reprint requests to Dr. Rey, Rivendell Adolescent Unit, Child Adolescent and Family Psychiatric Services, Royal Prince Alfred Hospital, Hospital Road, Concord West N.S.W. 2138, Australia. 0890-8567/91/3003-0423$03.00/0 © 1991 by the American Academy of Child and Adolescent Psychiatry. J. Am. Acad. Child Adolesc. Psychiatry, 30:3, May 1991

If the cluster of symptoms described by Nurcombe et al. (1989) can in fact identify adolescents with "nuclear" depression, scores on a scale made up of these items should accurately distinguish patients diagnosed as suffering from DSM-III major depression from those with other DSM-III disorders. For tests that do not produce a binary result (e.g., illness present/absent) but a range of scores, like most psychiatric rating scales, analysis of the receiver operating characteristic (ROC) of signal detection theory is the only technique that provides an overall index of diagnostic accuracy that is not influenced by decision biases and prior probabilities (Swets, 1988). Because of this, ROC analysis places the performance of a variety of tests on a common, easy-to-interpret scale. These methods have been used in medicine, particularly in radiology, for some time. Recently, Murphy et al. (1987) and Hsiao et al. (1989) have emphasized the relevance of ROC analysis for psychiatry. This technique appears to be an appropriate means of testing whether the above-mentioned cluster of symptoms can identify patients who suffer from major depression with a degree of accuracy sufficient to be clinically useful (Costello and Angold, 1988; Mossman and Somoza, 1989). Testing this hypothesis, and applying ROC techniques to this type of problem, are the objects of the present study.

Method

Subjects

The cohort used in this study consisted of 667 adolescents, aged 12 to 16 years, referred for assessment to the Rivendell Adolescent Unit, Sydney, Australia, between 1983 and 1986, and for whom DSM-III diagnoses were available at the time of analysis. All these youngsters and their parents had completed the CBCL and other questionnaires before the clinical assessment, following a standardized procedure described in detail elsewhere (Rey et al., 1989). The age and sex distribution of the subjects is presented in Table 1. Socioeconomic status (SES) ratings were made on the basis of the breadwinner's profession, according to Australian norms (Daniel, 1983). Twenty-two percent of the subjects were from the top three SES levels, 60% from the three middle levels, and 18% from the two lower SES levels.

TABLE 1. Age and Sex Distribution of the Subjects

             Girls           Boys
Age         N     %         N     %
12         39    14        66    17
13         74    26       122    32
14         76    27       106    27
15         52    19        65    17
16         39    14        28     7
Total     280             387

Diagnosis

The diagnostic process has been described in detail by Rey et al. (1989). In summary, two senior clinicians made independent DSM-III diagnoses using all the information available in the file, including detailed reports of psychiatric interviews with the child and the family, school reports, reports of psychological and educational testing, and parents' and children's item responses in the questionnaires. Once the independent DSM-III diagnoses were made, the cases in which there was disagreement were reviewed by the same two clinicians who, after further consideration of the available data, made a joint decision about which diagnostic criteria were met by the child and the consequent DSM-III diagnosis (consensus diagnosis). This procedure was carried out with the first 367 cases, which were the object of a reliability study (Rey et al., 1989), while the other 300 cases only had diagnoses made by consensus by the same two clinicians, following the same procedure but without prior independent diagnosis.

Chance-corrected agreement (kappa) values for the independent diagnoses were: overall, 0.59; major depression, 0.57; dysthymic disorder, 0.36; separation anxiety disorder, 0.80. Estimated reliabilities of the consensus diagnoses (kappa with Spearman-Brown correction for two raters; Guilford, 1954) were: overall, 0.74; major depression, 0.73; dysthymic disorder, 0.53; separation anxiety disorder, 0.89.

The number of subjects in each diagnostic group is presented in Table 3. Ten cases (1.4%) were excluded from analysis because more than 10% of item responses were missing. A case was included in the major depression or dysthymia categories irrespective of other concurrent diagnoses. Cases were included in the separation anxiety category even if they had other concurrent diagnoses, with the exception of major depression or dysthymia.

Measures

The CBCL (Achenbach and Edelbrock, 1983) is a parent-completed checklist designed to provide a standardized description of the behavior of children aged 4 to 16 years. It consists of a Social Competence scale and a Behavior Problem scale. The Behavior Problem scale consists of 118 items that are endorsed according to three response options: not true (rating = 0), somewhat/sometimes true (rating = 1), and very true (rating = 2). The depression factor identified by Nurcombe et al. (1989) consisted of the 22 CBCL items listed in Table 2. This scale will be referred to as CBCL-NUR.
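The consensus reliabilities quoted in the Diagnosis section follow from the independent-rater kappas through the Spearman-Brown formula for k raters, r_k = k r / (1 + (k - 1) r). The short Python sketch below is illustrative only: the function name `spearman_brown` is ours and is not part of the original analysis. With k = 2 it reproduces the reported figures to two decimals.

```python
def spearman_brown(r: float, k: int = 2) -> float:
    """Reliability of the pooled judgment of k raters, given single-rater reliability r."""
    return k * r / (1 + (k - 1) * r)


# Independent-rater kappas reported in the text.
independent_kappa = {
    "overall": 0.59,
    "major depression": 0.57,
    "dysthymic disorder": 0.36,
    "separation anxiety disorder": 0.80,
}

# Estimated reliability of the two-clinician consensus diagnosis.
consensus = {dx: round(spearman_brown(kappa), 2) for dx, kappa in independent_kappa.items()}
print(consensus)
```

The correction treats the two clinicians as parallel raters, which is the assumption behind describing the consensus figures as estimates rather than observed agreement.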

TABLE 2. CBCL-NUR Items and Item-Total Correlations

CBCL Item                                                            Item-Total Correlation
13    Confused or seems to be in a fog                                        0.45
14    Cries a lot                                                             0.37
18    Deliberately harms self or attempts suicide                             0.28
30    Fears going to school                                                   0.30
31    Fears he/she might think or do something bad                            0.49
32    Feels he/she has to be perfect                                          0.32
35    Feels worthless or inferior                                             0.49
42    Likes to be alone                                                       0.22
47    Nightmares                                                              0.33
50    Too fearful or anxious                                                  0.53
52    Feels too guilty                                                        0.44
54    Overtired                                                               0.46
56B   Headaches                                                               0.34
75    Shy or timid                                                            0.38
77    Sleeps more than most children during day and/or night                  0.37
80    Stares blankly                                                          0.35
91    Talks about killing self                                                0.40
100   Trouble sleeping                                                        0.40
102   Underactive, slow moving, or lacks energy                               0.41
103   Unhappy, sad, or depressed                                              0.60
111   Withdrawn, doesn't get involved with others                             0.45
112   Worrying                                                                0.58
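As a reading aid, the following Python sketch shows one way a CBCL-NUR score could be computed from a parent's item ratings. It assumes, without confirmation from the text, that the scale score is the unweighted sum of the 22 item ratings (each 0, 1, or 2), which would give a possible range of 0 to 44. The names `CBCL_NUR_ITEMS` and `cbcl_nur_score` are hypothetical, and missing-data handling is deliberately left out.

```python
# Item numbers of the 22 CBCL-NUR items, as listed in Table 2.
CBCL_NUR_ITEMS = (
    "13", "14", "18", "30", "31", "32", "35", "42", "47", "50", "52",
    "54", "56b", "75", "77", "80", "91", "100", "102", "103", "111", "112",
)


def cbcl_nur_score(ratings: dict) -> int:
    """Sum the 0/1/2 parent ratings over the CBCL-NUR items.

    Assumption: unweighted sum; every item must be present.
    """
    total = 0
    for item in CBCL_NUR_ITEMS:
        rating = ratings[item]  # raises KeyError if an item is missing
        if rating not in (0, 1, 2):
            raise ValueError(f"item {item}: rating must be 0, 1, or 2, got {rating!r}")
        total += rating
    return total


# Made-up ratings for illustration: all items 0 except four.
example = {item: 0 for item in CBCL_NUR_ITEMS}
example.update({"103": 2, "35": 2, "112": 1, "14": 1})
print(cbcl_nur_score(example))  # 6
```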

ROC Analysis

Though the results of a diagnostic test are usually presented as a dichotomy in which the tested individual is identified as being either well or ill, the actual test will almost always produce a range of outcomes, varying from indications of severe illness through to indications that no illness is present. Diagnoses are based on setting a cutoff point beyond which the individual is deemed to be ill. Using that decision criterion, the test will correctly classify a certain proportion of truly ill persons as ill (true positives) and a certain proportion of truly well persons as well (true negatives). Unless it is a perfect instrument (and the criterion against which it is compared is also completely free of error [Kraemer, 1988]), the test will also classify a proportion of truly ill persons as well (false negatives) and a proportion of truly well persons as ill (false positives).

The accuracy of a test has commonly been assessed through consideration of its sensitivity and specificity. The proportion of truly ill persons that the test identifies as ill defines its sensitivity, while the proportion of truly well persons that it identifies as well defines its specificity. For all tests, there is a trade-off between sensitivity and specificity, in which increased specificity is obtained at the expense of decreased sensitivity. Choice of the cutoff depends on which aspect of performance is most important. For screening tests, it is usual to sacrifice specificity for the sake of sensitivity, since false positives are usually of less consequence than false negatives. In epidemiological studies where the aim is to obtain an accurate estimate of prevalence, it is more usual to employ a cutoff point that minimizes both the false positives and false negatives. In either case, it is clear that there is no one index of a test's performance; instead, there is a pair of sensitivity and specificity values for each possible cutoff.

In ROC analysis, diagnostic performance can be measured by the area under a curve (AUC) defined by a series of points in a two-dimensional space. The X axis of this space represents the false-positive rate (1 - specificity) and the Y axis represents the true-positive rate (sensitivity). Each point in the space represents the unique true-positive and false-positive rate of a particular cutoff score (Fig. 1). A cutoff set above the test's highest possible score is represented by the point at the curve's origin in the lower left-hand corner of the graph. At this point, no ill individuals would be identified as ill (true-positive rate = 0.0) and no well individuals would be identified as ill either (false-positive rate = 0.0). A cutoff set below the test's lowest score is represented by the point at the curve's destination in the top right-hand corner of the graph. At this point, all ill individuals would be identified as ill (true-positive rate = 1.0), as would all well individuals (false-positive rate = 1.0). The diagonal joining the origin and destination points is known as the "random ROC" and represents a test that is no better than chance in its ability to discriminate between well and ill individuals. A test with this ROC will have an AUC of 0.5. The farther to the left a curve is, the larger the AUC and the higher the diagnostic accuracy of the scale. An AUC of 1.0 represents perfect accuracy. The ROC curve and the area under it together define the performance of a test throughout the whole range of cutoff scores for the test.

Given the assumption that under some monotonic transformation of the test scores the populations of well and ill individuals will form two overlapping gaussian (i.e., normal) distributions, it is possible to derive a concise, two-parameter description of the ROC curve. From the two parameters, it is then possible to derive the area under the estimated "binormal" ROC curve (Hanley, 1988; Mossman and Somoza, 1989). The first parameter, delta m (intercept-slope ratio), represents the difference between the population means measured in units of the well population standard deviation. The second parameter, s (slope of curve), is the ratio of the ill population standard deviation to that of the well population standard deviation. The ROC statistics were computed using Dorfman and Alf's RSCORE-II program (Dorfman and Alf, 1969; Swets and Pickett, 1982). The program gives maximum likelihood estimates of the three parameters, delta m, s, and the AUC. Additional explanation of ROC methods can be found in Kraemer (1988), Swets (1988), and Hsiao et al. (1989).

Results

Distribution of Depression Scores

The mean score on the CBCL-NUR scale for the referred sample in this study was 14.29. Other parameters of the distribution were: SD = 7.95; median = 14; mode = 15; skewness = 0.44; and range = 0 to 40.

Internal Consistency

Split-half reliability was 0.80, and alpha was 0.84. Item-total correlations are presented in Table 2.

Comparison between Diagnostic Groups

Three comparisons were carried out:
1. Between patients with major depression and patients with other diagnoses. This represents the ordinary situation that clinicians face when making a diagnosis. Scores on CBCL-NUR in the depressive group would be expected to be significantly higher than those in the other disorders group.
2. Between patients suffering from major depression and from dysthymia. This comparison was chosen because it tests the ability of the scale to differentiate between conditions with a substantial clinical overlap.
3. Between major depression and separation anxiety disorder. Since separation anxiety is one of the best defined and most reliable diagnoses in adolescents (Rey et al., 1989), and since there is a good deal of overlap between measures of anxiety and depression, this contrast is also important to test the performance of the CBCL-NUR scale.

Table 3 shows that depression scores are significantly different between the groups and in the expected direction, the differences being smaller between major depression and dysthymia.

FIG. 1. ROC curves derived directly from the observed data for the discrimination of adolescents with a DSM-III diagnosis of major depression (N = 23) versus adolescents with other DSM-III diagnoses (N = 634) (solid line), adolescents with a DSM-III diagnosis of separation anxiety (N = 57) (dashed line), and with dysthymia (N = 62) (dotted line). Y axis: sensitivity; X axis: 1 - specificity.
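The ROC construction described in the Method section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RSCORE-II maximum likelihood procedure used in the study: the function names are ours, the scores are invented, and a score at or above the cutoff is taken to mean "ill". The binormal area is computed as Φ(Δm · s / √(1 + s²)), which assumes that s is the slope of the ROC on normal-deviate axes and that delta m is the intercept-slope ratio, so the intercept is Δm · s. Under that convention the delta m and s values in Table 4 give areas that agree with the tabled AUCs to within rounding.

```python
from statistics import NormalDist


def roc_points(ill, well):
    """(false-positive rate, true-positive rate) at every cutoff; score >= cutoff means 'ill'."""
    points = [(0.0, 0.0)]  # cutoff above the highest score: nobody is called ill
    for cutoff in sorted(set(ill) | set(well), reverse=True):
        tpr = sum(x >= cutoff for x in ill) / len(ill)     # sensitivity
        fpr = sum(x >= cutoff for x in well) / len(well)   # 1 - specificity
        points.append((fpr, tpr))
    return points  # the lowest cutoff calls everybody ill, i.e. (1.0, 1.0)


def trapezoid_auc(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def binormal_auc(delta_m, s):
    """Area under a binormal ROC, assuming intercept = delta_m * s and slope = s."""
    return NormalDist().cdf(delta_m * s / (1 + s * s) ** 0.5)


# Invented scores, purely to exercise the functions (not study data).
ill_scores = [25, 22, 18, 15, 12]
well_scores = [20, 14, 10, 8, 6, 3]
pts = roc_points(ill_scores, well_scores)
print("empirical AUC:", round(trapezoid_auc(pts), 3))

# Delta m and s as reported in Table 4 for the three comparisons.
table4 = {
    "other diagnoses": (1.075, 1.048),
    "separation anxiety": (0.987, 1.015),
    "dysthymia": (0.642, 1.033),
}
for label, (dm, s) in table4.items():
    print(label, round(binormal_auc(dm, s), 3))
```

The trapezoidal area equals the proportion of ill-well pairs in which the ill subject scores higher (ties counted as one half), which is why the AUC can be read as an overall index that does not depend on any single cutoff.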

ROC Analysis

ROC curves for the three comparisons, reflecting the sensitivity and specificity of the scale at each score between 0 and 40, are presented in Figure 1. A visual examination of the figure shows that scores on the CBCL-NUR scale seem to differentiate most accurately between major depression and other disorders. However, the scale appears to discriminate almost as accurately between major depression and separation anxiety. The scale's ability to distinguish major depression from dysthymia is not so clear.

The area under the ROC curve provides a more precise measure of the diagnostic accuracy of the instrument. An area of 0.5 means that the scale is not more accurate than chance, while an area of 1.0 would mean that the test has perfect diagnostic accuracy. Statistics of the ROC curves for the comparisons carried out are presented in Table 4. They confirm the finding that CBCL-NUR is most accurate when discriminating between patients with and without major depression and least accurate when discriminating major depression from dysthymia. However, even for that comparison, the scale is better than chance.

Sensitivities and specificities were not high. For example, for the comparison between major depression and all other disorders, a CBCL-NUR score of 15 had a sensitivity of 0.83 (which represents a false-negative rate of 17%) and a specificity of 0.55 (producing 45% false positives), while comparable figures for a score of 20 were 0.65 and 0.75, and for a score of 25, 0.48 and 0.91. For the reasons presented in the Method section, an optimal cutoff point that can be used in a variety of settings cannot be given. This will need to be determined for each case, taking into consideration the specific conditions and requirements of the clinical situation in which the test is to be applied (Hsiao et al., 1989). This information can be extracted from the data provided in Table 4.

TABLE 3. Comparison of Depression Scores on CBCL-NUR among Diagnostic Groups (two-tailed t-test)

Group                         N      Mean      SD      t-test            p
1. Major depression           23     22.52    7.77
2. All other disorders       634     13.99    8.41     (1-2) 5.12      0.000
3. Dysthymia                  62     17.50    8.26     (1-3) 2.46      0.016
4. Separation anxiety         57     14.70    7.87     (1-4) 3.94      0.000

TABLE 4. ROC Curve Statistics for the Discrimination of Adolescents with DSM-III Diagnosis of Major Depression from Those with Other DSM-III Diagnoses, with Separation Anxiety, and with Dysthymia

                             Other Diagnoses    Separation Anxiety    Dysthymia
Delta m                           1.075               0.987             0.642
s                                 1.048               1.015             1.033
Area under curve (AUC)            0.782               0.759             0.678
SE of area                        0.047               0.058             0.064

Discussion

The results in this study confirm that the scale obtained from the CBCL by Nurcombe et al. (1989) does reflect a depressive syndrome: first, because scale scores in the major depression group are significantly higher than in the other diagnostic groups, and, second, because scores on this scale are able to discriminate adolescents with major depression from individuals suffering from other disorders with a substantial degree of accuracy. It is particularly encouraging that CBCL-NUR scores are significantly different between diagnostic groups, even between major depression and dysthymia, which is more than can be said of other widely used instruments, such as the CDI, which have produced conflicting results in this area (Costello and Angold, 1988; Kashani et al., 1990). The internal consistency of the scale (alpha = 0.84) is also adequate.

The area under the ROC curve for the comparison between adolescents with and without major depression (0.782) is very similar to the area under the ROC curve found by Mossman and Somoza (1989) for the composite of seven studies using the Dexamethasone Suppression Test (DST) (0.792). This suggests that the CBCL-NUR is about as accurate as the DST in differentiating patients with major depression from those with other disorders. It is not possible to compare the performance of CBCL-NUR with other depression scales because data of this kind for other scales are lacking.

The size of the sample, the fact that data collection and diagnosis were not specifically designed for this study (therefore avoiding possible study-related biases), that only 3.5% of patients had a diagnosis of major depression (suggesting that clinicians making diagnoses applied DSM-III criteria in a very conservative manner), and the acceptable reliability of the diagnoses indicate that these findings are likely to be reliable. It cannot be forgotten, however, that reliability of clinical diagnosis sets an upper limit on the performance of any diagnostic test.

This study gives validity to CBCL-NUR as a scale to measure depression. However, CDI scores, as used by Nurcombe et al. (1989), are not appropriate to serve as the main validating standard for CBCL-NUR, since the two scales are likely to be significantly correlated. The findings of Nurcombe et al. (1989) are very useful and in some ways similar to those reported by Hepperlin et al. (1990).
The latter also tried to extract a measure of depression from a widely used multipurpose, multiinformant instrument like the CBCL, moving away from the proliferation of separate scales to measure depression that now plagues child and adult psychiatry (Endicott et al., 1981; Costello and Angold, 1988; Kazdin, 1990) and interferes with the progress of research, as Endicott et al. (1981) have pointed out. Hepperlin et al. (1990) tested whether a series of CBCL or YSR (Youth Self-Report) (Achenbach and Edelbrock, 1987) items could predict CDI scores. They found 15 items from the YSR (YSR-CDI) that predicted CDI scores with a reliability of 0.74, similar to the test-retest agreement of child self-report instruments (Achenbach et al., 1987). The correlation between the same items from the CBCL (CBCL-CDI) and the CDI was much lower (r = 0.23). It is of note that eight items from the CBCL-CDI are included in CBCL-NUR.

It has repeatedly been stated in the literature (e.g., Angold, 1988; Kazdin, 1990) that parents are often unaware of depressive feelings in their children. Although Achenbach and Edelbrock (1983) failed to find a depression factor in the CBCL of adolescents, they did obtain such a factor when using the YSR (Achenbach and Edelbrock, 1987). This suggests that self-report measures might be more efficient in identifying depression than parent-based measures. It remains to be seen whether parent or child reports or a combination of both are more accurate than CBCL-NUR in identifying adolescents with major depression. That work is in progress.

References

Achenbach, T. M. & Edelbrock, C. S. (1983), Manual for the Child Behavior Checklist and Revised Child Behavior Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T. M. & Edelbrock, C. S. (1987), Manual for the Youth Self-Report and Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T. M., McConaughy, S. H. & Howell, C. T. (1987), Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychol. Bull., 101:213-232.
Angold, A. (1988), Childhood and adolescent depression, II: research in clinical populations. Br. J. Psychiatry, 153:476-492.
Costello, E. J. & Angold, A. (1988), Scales to assess child and adolescent depression: checklists, screens and nets. J. Am. Acad. Child Adolesc. Psychiatry, 27:726-737.
Daniel, A. (1983), Power, Privilege and Prestige: Occupation in Australia. Melbourne: Longman Cheshire.
Dorfman, D. D. & Alf, E. (1969), Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating method data. Journal of Mathematical Psychology, 6:487-496.
Endicott, J., Cohen, J., Nee, J., Fleiss, J. & Sarantakos, S. (1981), Hamilton Depression Rating Scale extracted from regular and changed versions of the Schedule for Affective Disorders and Schizophrenia. Arch. Gen. Psychiatry, 38:98-103.
Guilford, J. P. (1954), Psychometric Methods. New York: McGraw Hill.
Hanley, J. A. (1988), The robustness of the "binormal" assumptions used in fitting ROC curves. Med. Decis. Making, 8:197-203.
Hepperlin, C. M., Stewart, G. W. & Rey, J. M. (1990), Extraction of depression scores in adolescents from a general purpose behaviour checklist. J. Affective Disord., 18:105-112.
Hsiao, J. K., Bartko, J. J. & Potter, W. Z. (1989), Diagnosing diagnoses: receiver operating characteristic methods and psychiatry. Arch. Gen. Psychiatry, 46:664-667.
Kashani, J. H., Sherman, D. D., Parker, D. R. & Reid, J. C. (1990), Utility of the Beck Depression Inventory with clinic-referred adolescents. J. Am. Acad. Child Adolesc. Psychiatry, 29:278-282.
Kazdin, A. E. (1990), Childhood depression. J. Child Psychol. Psychiatry, 31:121-160.
Kovacs, M. (1981), Rating scales to assess depression in school aged children. Acta Paedopsychiatrica, 46:305-315.
Kraemer, H. C. (1988), Assessment of 2 x 2 associations: generalizations of signal detection theory. American Statistician, 42:37-49.
Mossman, D. & Somoza, E. (1989), Maximizing diagnostic information from the dexamethasone suppression test. Arch. Gen. Psychiatry, 46:653-660.
Murphy, J. M., Berwick, D. M., Weinstein, M. C., Borus, J. F., Budman, S. H. & Klerman, G. L. (1987), Performance of screening and diagnostic tests: application of receiver operating characteristic analysis. Arch. Gen. Psychiatry, 44:550-555.
Nurcombe, B., Seifer, R., Scioli, A., Tramontana, M. G., Grapentine, W. G. & Beauchesne, H. C. (1989), Is major depressive disorder in adolescence a distinct diagnostic entity? J. Am. Acad. Child Adolesc. Psychiatry, 28:333-342.
Rey, J. M., Plapp, J. M. & Stewart, G. W. (1989), Reliability of psychiatric diagnosis in referred adolescents. J. Child Psychol. Psychiatry, 30:879-888.
Seifer, R., Nurcombe, B., Scioli, A. & Grapentine, W. L. (1990), Is major depressive disorder in childhood a distinct diagnostic entity? J. Am. Acad. Child Adolesc. Psychiatry, 28:935-941.
Shaffer, D., Campbell, M., Cantwell, D. et al. (1990), Child and adolescent psychiatric disorders in DSM-IV: issues facing the work group. J. Am. Acad. Child Adolesc. Psychiatry, 28:830-835.
Swets, J. A. (1988), Measuring the accuracy of diagnostic systems. Science, 240:1285-1293.
Swets, J. A. & Pickett, R. M. (1982), Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. New York: Academic Press.
