The reliability and validity of the Tokyo Autistic Behaviour Scale.

The Japanese Journal of Psychiatry and Neurology, Vol. 44, No. 1, 1990

The Reliability and Validity of the Tokyo Autistic Behavior Scale

Hiroshi Kurita, M.D. and Yuko Miyake, Ph.D.*

Division of Developmental Disorders, National Institute of Mental Health, NCNP, Ichikawa
*Department of Social Psychiatry, Psychiatric Research Institute of Tokyo, Tokyo

Abstract: The Tokyo Autistic Behavior Scale (TABS), consisting of 39 items provisionally grouped in four areas (interpersonal-social relationship, language-communication, habit-mannerism and others), is an instrument used by a child’s caretaker to rate the child’s autistic behaviors on a 3-point scale. Test-retest reliability was satisfactory (i.e., an r for a total score was .94). Among six DSM-III diagnostic groups, infantile autism showed a significantly higher total TABS score than the other five groups, and a taxonomic validity coefficient was .54. An r between total scores of the TABS and the Childhood Autism Rating Scale-Tokyo Version was .59. The area scores showed a lower validity than the total score. The TABS appears to be a useful instrument to assess autistic behavior.

Key Words: autism, diagnosis, rating scale

Jpn J Psychiatr Neurol 44: 25-32, 1990

INTRODUCTION

Assessing the behavior of autistic children is indispensable before initiating treatment. For this purpose, there are two ways of gathering information. One is to observe the behavior of a child directly [14, 16]. The other is to rely on ratings of the child’s behavior made by his or her caretaker, based on everyday and historical information [13]. The former may yield information of better quality than the latter, even though observable behavior is limited in space and time. The latter can evaluate behavior not directly seen in an observation situation, such as behaviors in early infancy, but tends to involve more problems in reliability and validity than the former. It is therefore necessary to integrate both methods in order to grasp the whole clinical picture of autistic children.

For any rating scale, regardless of whether it is of the former or the latter type, the demonstration of reliability and validity is quite important. By reviewing studies on five major English rating scales for autistic behavior, i.e., the Behavior Observation Scale for Autism (BOS) [2], the Autism Behavior Checklist (ABC) [7], Rimland’s Form E-2 [13], the Behavior Rating Instrument for Autistic Children (BRIAC) [14] and the Childhood Autism Rating Scale (CARS) [16], Parks [12] noted that all of them lacked studies on test-retest reliability and on discriminant and/or content validity. Although several reliability and/or validity studies on the ABC [21], BOS [3] and CARS [5, 19] were published after Parks’ review, much still seems to remain to be done in reliability and validity studies on these scales. Two Japanese autism rating scales not based on direct observation (CLAC [10] and NSAT [11]) also have yet to undergo sufficient reliability and validity studies.

This report is our attempt to demonstrate the validity and reliability, including test-retest reliability, of the Tokyo Autistic Behavior Scale (TABS), which we developed for use by a caretaker of the child.

Received for publication on July 25, 1989. Mailing address: Hiroshi Kurita, M.D., Division of Developmental Disorders, National Institute of Mental Health, NCNP, 1-7-3, Konodai, Ichikawa 272, Japan.

SUBJECTS AND METHODS

Outline of the Tokyo Autistic Behavior Scale

As shown in the English translation in the Appendix, the Tokyo Autistic Behavior Scale (TABS) consists of 39 items provisionally grouped in four areas: interpersonal-social relationship, language-communication, habit-mannerism and others. (A copy of the original Japanese TABS may be requested from the first author.) Each item is rated on a 3-point scale (i.e., “true,” “sometimes in the past” and “false”) by a caretaker of the child. Items were selected from our item pool, which was created from a literature review and our clinical experience with autistic children. The TABS was designed to be used along with the Tokyo Child Development Schedule (TCDS) [8] to cover the whole clinical picture of autistic and other developmentally disabled children. Because the TABS has a relatively small number of items and we encouraged raters to check all items, the omission of an item in an individual case was rare in all of our samples. Nevertheless, we used only complete data in the following analyses.

Validity Study

Since the TABS is rated on a 3-point scale involving a grade “sometimes in the past” that is not logically a midpoint between “true” and “false,” there are three possible ways to determine an item score. The first is to use the three grades themselves as scores (i.e., 3, 2 and 1 for true, sometimes in the past and false, respectively). The other two are to create two 2-point scales, “currently present” versus “currently absent” and “present or having been present” versus “having never been present,” by collapsing the original grades 2 and 1, and 3 and 2, into one category, respectively.

Based on the three calculations of an item score, the validity of the TABS was tested in two ways. First, DSM-III [1] diagnoses on 102 children (mean age = 4.5 ± 1.7; 78 males and 24 females), who had attended the Nerima Welfare Center for the Mentally and Physically Handicapped, were taken as a criterion variable. Their mean IQ was 68.9 ± 22.9 (for 7 children untestable on the Japanese version of the Stanford-Binet and its equivalent for infants, the MCC Baby Test [6], DQs on Tsumori’s Mental Development Scale were used in place of IQs because of the good correlation of DQ with IQ [15]). These children were divided into six diagnostic groups according to the DSM-III: infantile autism including its residual state; other pervasive developmental disorders (i.e., childhood onset pervasive developmental disorder and atypical pervasive developmental disorder) including their residual states; mental retardation without a concomitant diagnosis of pervasive developmental disorder; specific developmental disorders; attention deficit disorders; and others. Based on this classification, the taxonomic validity was tested [20] for the total and area scores of the TABS. (Taxonomic validity is a type of validity that takes group membership as a criterion measure and indicates the ability of a test to differentiate among groups. Its index is computed as the square root of the ratio of the between-group variance to the total variance, based on the analysis of variance of the test score on the groups.)
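The three item-scoring schemes and the taxonomic validity index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the function names `rescore` and `taxonomic_validity`, the scheme labels, and the 0/1 coding of the two 2-point scales are hypothetical, and the "ratio of between-group variance to total variance" is read here as the ratio of between-group to total sums of squares.

```python
from typing import Dict, List, Sequence

# TABS grades as defined in the text: 3 = true, 2 = sometimes in the past, 1 = false.


def rescore(grades: Sequence[int], scheme: str) -> List[int]:
    """Convert raw item grades (1, 2, 3) into item scores under one scheme.

    The scheme names and the 0/1 coding are assumptions made for illustration.
    """
    if any(g not in (1, 2, 3) for g in grades):
        raise ValueError("grades must be 1, 2 or 3")
    if scheme == "three_point":   # use the grades themselves
        return list(grades)
    if scheme == "current":       # collapse grades 2 and 1: currently present vs. absent
        return [1 if g == 3 else 0 for g in grades]
    if scheme == "ever":          # collapse grades 3 and 2: ever present vs. never present
        return [1 if g >= 2 else 0 for g in grades]
    raise ValueError(f"unknown scheme: {scheme}")


def taxonomic_validity(groups: Dict[str, Sequence[float]]) -> float:
    """Square root of (between-group sum of squares / total sum of squares)."""
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return (ss_between / ss_total) ** 0.5


# Toy data (not from the study): one child's grades on four items,
# and total scores for two made-up diagnostic groups.
print(rescore([3, 2, 1, 3], "current"))  # [1, 0, 0, 1]
toy_groups = {"A": [10.0, 12.0, 14.0], "B": [20.0, 22.0, 24.0]}
print(round(taxonomic_validity(toy_groups), 3))
```

A total score under any scheme is then simply the sum of the rescored items, and the index can be computed separately for each scheme to compare them, as is done in the Results.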
With this same sample, Cronbach’s alpha, a coefficient of internal consistency, was calculated based on items and areas.

Second, the total score of the Childhood Autism Rating Scale-Tokyo Version (CARS-TV) [9] was taken as a criterion variable of concurrent validity for the total and area scores of the TABS. The CARS-TV is a Japanese translation of the Childhood Autism Rating Scale (CARS) [16], a 15-item rating scale for autism. It was administered to children at the Nerima Welfare Center for the Mentally and Physically Handicapped by an experienced psychologist, based on behavior observation for an average of one hour. Of the 102 children, 52 children (mean age = 4.1 ± 1.6; 42 males and 10 females), to whom both the CARS-TV and the TABS were administered with an interval of less than 1 month, served as the subjects.

We analyzed all of the data with a computer program, “High Quality Analysis Libraries for Business and Academic Users” [18].

Reliability Study

The subjects for the test-retest reliability study were 38 autistic and/or mentally retarded children (mean age = 5.2 ± 0.9; 32 boys and 6 girls) who had attended a day care unit for such children affiliated with the Department of Psychiatry, Tokyo University Hospital. At the time of the initial rating, the TABS was first introduced to the 38 mothers, and they were not told of the next rating. After an average of 14 ± 5.5 days, the mothers were requested to rate again without referring to the results of the first rating.

The test-retest reliability of the TABS was evaluated in two steps. First, the statistic kappa, a coefficient of classification agreement beyond the amount expected by chance, was calculated for each of the 39 items from a fourfold or ninefold table made from the two ratings on the 2-point or 3-point scale. This is a modified use of kappa, because it originally indicates the true agreement not between test and retest ratings but between two raters on nominal or ordinal scales. However, to assess the genuine agreement between the two ratings, it seems more appropriate to use this extension than to employ Cramer’s V, which cannot eliminate the influence of chance agreement. Next, Pearson’s product-moment correlation coefficient (r) was calculated for the total score based on the two ratings.

RESULTS

Validity

Taxonomic and Discriminant Validity: Table 1 shows the mean total TABS scores, based on the 3-point scale, of the six DSM-III groups. The analysis of variance on these data showed a significant group effect for the DSM-III diagnoses [F(5, 96) = 7.74, p < .001], and a taxonomic validity index, as introduced in the method section, was .54. The values of this index for the total TABS scores on the two 2-point scales created by collapsing the original grades 2 and 1, and 3 and 2, were .52 and .49, respectively. As shown in Table 2, Scheffe’s test succeeding the analysis of variance showed that the total TABS score on the original 3-point scale was significantly higher in infantile autism than in the other five diagnostic groups.

Table 1: Mean Total TABS Scores in Six DSM-III Diagnostic Groups

Diagnosis(a)   Case Number   Mean Total TABS Score ± SD
IA             18            74.7 ± 8.9
OPDD           33            65.3 ± 7.4
MR             20            62.3 ± 9.4
SDD            13            61.0 ± 6.6
ADD            13            63.3 ± 11.4
Others          5            51.8 ± 3.5

a: IA, infantile autism including residual state; OPDD, other pervasive developmental disorders (i.e., childhood onset pervasive developmental disorder and atypical pervasive developmental disorder) including residual states; MR, mental retardation without an additional diagnosis of pervasive developmental disorder; SDD, specific developmental disorders; ADD, attention deficit disorders; Others, those with some mild developmental problems not meeting DSM-III criteria for the other five groups.
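As a consistency check, the reported taxonomic validity index can be recovered from the reported F ratio alone. The worked equation below assumes that the index is the square root of the ratio of the between-group to the total sum of squares; the symbols η, SS_b, SS_w, df_b and df_w are introduced here for illustration and do not appear in the original text.

```latex
\eta
  = \sqrt{\frac{SS_b}{SS_b + SS_w}}
  = \sqrt{\frac{df_b \, F}{df_b \, F + df_w}}
  = \sqrt{\frac{5 \times 7.74}{5 \times 7.74 + 96}}
  = \sqrt{\frac{38.7}{134.7}}
  \approx .54
```

The second step uses F = (SS_b / df_b) / (SS_w / df_w), so that SS_b / SS_w = df_b F / df_w. Under this reading, the result agrees with the index of .54 reported for the total score on the 3-point scale.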



This total TABS score did not differ significantly among the other five groups. The two total TABS scores based on the 2-point scales did not completely discriminate infantile autism from the other five groups. The analysis of variance on three area scores (interpersonal-social relationship, habit-mannerism and others) based on the 3-point scale also showed a significant diagnostic group effect, with taxonomic validity indexes of .39, .51 and .45, respectively. In Scheffe’s test, however, only the habit-mannerism area score significantly differentiated infantile autism from the other five diagnostic groups.

Table 2: Multiple Comparison (Scheffe’s Test) of Total TABS Score among Six DSM-III Groups

Pair          Difference in Mean Total TABS Score   F(5, 96)
IA-OPDD        9.39                                 2.70***
IA-MR         12.37                                 3.81**
IA-SDD        13.67                                 3.70**
IA-ADD        11.36                                 2.56***
IA-others     22.87                                 5.38**
OPDD-MR        2.97                                 0.29
OPDD-SDD       4.27                                 0.45
OPDD-ADD       1.97                                 0.10
OPDD-others   13.47                                 2.07
MR-SDD         1.30                                 0.04
MR-ADD        -1.01                                 0.02
MR-others     10.50                                 1.16
SDD-ADD       -2.31                                 0.09
SDD-others     9.20                                 0.80
ADD-others    11.51                                 1.26

See footnote a of Table 1 for abbreviations.
** p < 0.01, *** p < 0.05.

Concurrent Validity: In the 52 children to whom both the CARS-TV and the TABS were administered with an interval of less than 1 month, the total TABS score on the 3-point scale showed significant correlations with the item 15 (general impression of autism scale) score (r = .61, p < .001) and the total CARS-TV score (r = .59, p < .001). The other two total TABS scores on the two 2-point scales made by collapsing the original grades 2 and 1, and 3 and 2, showed weaker correlations with item 15 and the total CARS-TV scores, with rs of .56 and .56, and .56 and .53, respectively (all significant at p < .001). The total TABS score on the 3-point scale showed significant negative correlations with IQs (r = -.51, p < .001) and mental ages (r = -.33, p < .05), and no significant correlation with the ages of the children. The interpersonal-social relationship, language-communication, habit-mannerism and other area scores on the 3-point scale showed weaker correlations with the total CARS-TV score than the total TABS score, with rs of .50, .32, .30 and .45, respectively (all significant at p < .001).

Reliability

Table 3: Kappa for Each of the 39 Items in the TABS(a)

Item Number   Interpersonal-Social   Language-Communication(b)   Habit-Mannerism   Others
1             0.40                   0.70                        0.78              0.71
2             0.43                   0.51                        0.74              0.56
3             0.72                   0.61                        0.47              0.67
4             0.72                   0.88                        0.60              0.84
5             0.57                   0.85                        0.60              0.80
6             0.75                   0.79                        0.75              0.78
7             0.68                   0.59                        0.47              0.59
8             0.72                   0.79                        -                 0.63
9             0.56                   0.68                        -                 0.78
10            0.55                   -                           -                 0.62
11            -                      0.81                        -                 0.85
M ± SD        0.61 ± 0.12            0.72 ± 0.12                 0.63 ± 0.12       0.71 ± 0.10

a: All values are significant at p < .005.
b: For item 10 of the language-communication area, kappa could not be obtained, because grade 2 was not used in the first rating, whereas all three grades were used in the second rating.
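The item-level kappa used for test-retest agreement can be sketched as follows. This is a minimal illustration, assuming the unweighted form of kappa (the text does not say whether a weighted variant was used); the function name `cohen_kappa` and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(first: Sequence[int], second: Sequence[int]) -> float:
    """Unweighted kappa between two ratings of the same subjects.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected by chance from the marginal frequencies.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Proportion of subjects given the same grade on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal counts of each grade.
    count_first, count_second = Counter(first), Counter(second)
    categories = set(first) | set(second)
    expected = sum(count_first[c] * count_second[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Made-up first and second ratings of one item for eight children (grades 1-3).
first_rating = [3, 3, 1, 1, 2, 3, 1, 1]
second_rating = [3, 3, 1, 1, 2, 1, 1, 3]
print(round(cohen_kappa(first_rating, second_rating), 3))  # 0.579
```

Because this sketch takes the union of the grades used on the two occasions, it returns a value even when one grade is missing from one of the ratings. That differs from the situation described for item 10 of the language-communication area, where kappa could not be obtained; the sketch should therefore not be taken as a description of the program the authors used.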

Since the validity of the total TABS score on the 2-point scales was lower than that on the 3-point scale, the area and total scores of the TABS were calculated based on the 3-point scale in the reliability studies.

Test-Retest Reliability: Table 3 shows the values of kappa for 38 of the 39 items. They ranged from .40 to .88, with an overall mean of .67 ± .12. Since the first rating on item 10 of the language-communication area did not yield a grade 2 rating, while the second rating yielded all three grades, kappa could not be calculated for the item. On this item, however, the two ratings completely agreed in 34 of the 38 cases. The total TABS score showed a high correlation between the two ratings (r = .94).

Internal Consistency: Cronbach’s coefficient alpha based on the 39 items was .78.

DISCUSSION

Since autistic features change over the course of development, it is important for a clinician making a diagnosis of autism to learn, by means of a scale like the TABS, about behaviors in early infancy that are not directly observable in an evaluation situation. This study has clarified several issues concerning the reliability and validity of the TABS.

As to the organization of the 3-point scale, it may appear unusual to place “sometimes in the past” between “true” and “false.” However, it is common that, even if a typical autistic behavior disappears, its remnant still somewhat modifies the behavior of the child. Therefore, “sometimes in the past” is by no means the same as either “false” or “true,” and can be placed between the two, if not exactly at a midpoint. This seems to be supported by the fact that the total score based on the 3-point scale discriminated infantile autism from the other five groups more efficiently than the total scores on the 2-point scales.

The total TABS score on the 3-point scale showed modest correlations with the two concurrent validity measures (i.e., item 15 and the total CARS-TV scores). Cronbach’s alpha was also smaller in the TABS than in the CARS [9]. The differences of the TABS from the CARS-TV in the items themselves, the organization of the ordinal scale and the way of administration seem to have contributed to these modest values. The degrees of the correlations, however, seem acceptable as evidence of the concurrent validity of the TABS, given the brevity of the TABS and the fact that it was used not by a professional but by the mother of the child. The negative correlation of the total TABS score with IQ may be further evidence of concurrent validity, because autistic symptoms, such as those described in the TABS, are usually less prominent in autistic children with high IQs than in those with low IQs. A similar relationship was observed in the CARS [9].

Along with the validity of the TABS, the satisfactory test-retest reliability based on the 3-point scale supports the usefulness of the TABS. In view of the fact that the total score showed a higher validity than the area scores in several respects and that the present grouping into four areas is tentative, the total score seems better than the area scores as an index of autism.

In sum, the current version of the TABS as a whole appears to be a useful instrument to assess the autistic behavior of a child. When used in combination with rating scales based on direct behavior observation, such as the CARS, the TABS may become more helpful for a clinician in grasping the whole clinical picture of autistic conditions, though the TABS can never be a substitute for an experienced clinician’s diagnosis of autism.


ACKNOWLEDGMENTS

This study was supported in part by a Grant-in-Aid for General Scientific Research (No. 63480263) from the Ministry of Education, Science and Culture of Japan. The authors thank Mrs. Kaoru Katsuno for her assistance in data collection.

REFERENCES

1. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders, 3rd ed. American Psychiatric Association, Washington, D.C., 1980.
2. Freeman, B.J., Ritvo, E.R., Guthrie, D., Schroth, P. and Ball, J.: Behavior observation scale for autism: Initial methodology, data analysis, and preliminary findings on 89 children. J Am Acad Child Psychiatry 17: 576-588, 1978.
3. Freeman, B.J., Ritvo, E.R. and Schroth, P.C.: Behavior assessment of the syndrome of autism: Behavior observation system. J Am Acad Child Psychiatry 23: 588-594, 1984.
4. Freeman, B.J., Ritvo, E.R., Yokota, A. and Ritvo, A.: A scale for rating symptoms of patients with the syndrome of autism in real life settings. J Am Acad Child Psychiatry 25: 130-136, 1986.
5. Garfin, D.G. and McCallon, D.: Validity and reliability of the Childhood Autism Rating Scale with autistic adolescents. J Autism Dev Disord 18: 367-378, 1988.
6. Koga, Y., Niwa, Y., Omura, M., Takashima, M., Yamauchi, S. and Okada, Y. (eds.): The Baby Tests for Mother-Child Counseling. Dobun-shoin, Tokyo, 1967 (in Japanese).
7. Krug, D.A., Arick, J. and Almond, P.: Behavior checklist for identifying severely handicapped individuals with high levels of autistic behavior. J Child Psychol Psychiatry 21: 221-229, 1980.
8. Kurita, H., Uchiyama, T. and Takesada, M.: Tokyo Child Development Schedule. I. Test-retest reliability and concurrent validity. Folia Psychiatr Neurol Jpn 39: 129-138, 1985.
9. Kurita, H., Miyake, Y. and Katsuno, K.: Reliability and validity of the Childhood Autism Rating Scale-Tokyo Version (CARS-TV). J Autism Dev Disord 19: 389-396, 1989.
10. Makita, K. and Umezu, K.: An objective evaluation technique for autistic children: An introduction of CLAC scheme. Acta Paedopsychiatr 39: 237-253, 1973.
11. Nakatsuka, Z.: Changes in autistic syndrome with chronological age: Examination using the Nakatsuka Scales of Autistic Tendencies (NSAT). Jpn J Child Adolesc Psychiatry 29: 117-126, 1988 (in Japanese with English abstract).
12. Parks, S.L.: The assessment of autistic children: A selective review of available instruments. J Autism Dev Disord 13: 255-267, 1983.
13. Rimland, B.: The differentiation of childhood psychoses: An analysis of checklists for 2,218 psychotic children. J Autism Child Schizophr 1: 161-174, 1971.
14. Ruttenberg, B.A., Dratman, M.L., Fraknoi, J. and Wenar, C.: An instrument for evaluating autistic children. J Am Acad Child Psychiatry 5: 453-478, 1966.
15. Shimizu, Y., Senda, S., Someya, R., Ohta, M. and Kawasaki, Y.: Correlation between DQ and IQ in autistic children of preschool age. Jpn J Psychiatr Treatment 2: 61-67, 1987 (in Japanese).
16. Schopler, E., Reichler, R.J., DeVellis, R.F. and Daly, K.: Toward objective classification of childhood autism: Childhood autism rating scale (CARS). J Autism Dev Disord 10: 91-103, 1980.
17. Schopler, E., Reichler, R.J. and Renner, B.R.: The childhood autism rating scale (CARS) for diagnostic screening and classification of autism. Irvington, New York, 1986.
18. Takagi, H., Yanai, H., Hattori, Y., Ichikawa, M., Sato, S. and Marui, E.: High Quality Analysis Libraries for Business and Academic Users. Gendaisugaku-sha, Kyoto, 1987 (in Japanese).
19. Teal, M.B. and Wiebe, M.J.: A validity analysis of selected instruments used to assess autism. J Autism Dev Disord 16: 485-494, 1986.
20. Thorndike, R.L.: Group membership as a criterion variable. In: Thorndike, R.L. (Ed.), Applied Psychometrics. Houghton Mifflin, Boston, pp 220-222, 1982.
21. Volkmar, F.R., Cicchetti, D.V., Dykens, E., Sparrow, S.S., Leckman, J.F. and Cohen, D.J.: An evaluation of the Autism Behavior Checklist. J Autism Dev Disord 18: 81-97, 1988.

APPENDIX

Tokyo Autistic Behavior Scale

The following are questions on behavior and symptoms that must be taken into consideration. Please answer whether your child now has these problems. If your child now shows a problem behavior or symptom, please check “True” in the answering section. If he/she formerly exhibited such behavior but currently does not, please check “Sometimes in the past.” If he/she has never exhibited such behavior, please check “False.”

Grade (for each item): 3. True; 2. Sometimes in the Past; 1. False

Area and Item

I. Interpersonal-Social Relationship

1. Has a lack of anxiety with strangers.
2. Has a lack of eye-to-eye contact with other people.
3. Keeps his/her nose close to things or people as if smelling them.
4. Seldom (or never) has social relations with other children or adults, and frequently plays alone.
5. Slaps other children or thrusts them away.
6. Bites other people.
7. Suffers no anxiety to be alone when his/her parents are not at his/her side.
8. Cries when his/her mother leaves even for a short time.
9. Throws himself/herself down and cries whenever frustrated.
10. Seldom (or never) displays emotions such as joy and sadness.

II. Language-Communication

1. Spoke a few words at about the age of one, but after then spoke few or no words.
2. Cannot understand even simple directions.
3. Cannot use meaningful words.
4. Parrots what he/she is told.
5. Speaks, what he/she once heard or was told, at an unrelated place or time.
6. Pulls adults’ hands or arms close to things which he/she wants.
7. Indicates things that he/she wants, not with his/her finger but with the whole hand.
8. Waves his/her hand, but in an unusual manner, showing the back of his/her hand to other party.
9. Answers the names of articles fairly correctly (when asked “What is this?”), but cannot otherwise speak.
10. Memorizes well names of things and people, telephone numbers, etc.
11. Can use nouns and verbs, but hardly uses other words, including adjectives.

III. Habit-Mannerism

1. Flutters his/her hands or fingers, or moves and stares at fingers.
2. Loves and retains items that are not toys (e.g., strings and sticks).
3. Does not seem to understand how to play with a toy, and swings or throws it.
4. Repeats similar movements of his/her hands, feet, or a part of the body; or always assumes the same poses.
5. Always does things in an identical sequence and gets frustrated (cries) if the sequence is changed.
6. Not satisfied unless he/she takes the same route to a familiar place.
7. Hates changes in furniture arrangement or parents’ attire, hair-style, spectacles, etc.

IV. Others

1. Restless and rushes out immediately when not held by his/her hand.
2. Easily distracted and rarely can continue to gaze at an object.
3. Sensitive to slight sounds or does not react to loud sounds.
4. Licks anything or puts it into his/her mouth.
5. Bites a part of his/her body or clothes.
6. Sometimes hits his/her head or knocks it against the wall.
7. Sometimes cannot easily stop crying, and there is no way to stop him/her from crying.
8. Expresses a great fear of a harmless person or thing (e.g., a TV commercial).
9. Follows an unbalanced diet.
10. Does not become dizzy and can walk without trouble even immediately after being swung.
11. Has trouble because the child does not fear to climb up to height.
