Assessment http://asm.sagepub.com/

Psychometric Properties of the Parent and Teacher ADHD Rating Scale (ADHD-RS): Measurement Invariance Across Gender, Age, and Informant Guido Makransky and Niels Bilenberg Assessment published online 22 May 2014 DOI: 10.1177/1073191114535242 The online version of this article can be found at: http://asm.sagepub.com/content/early/2014/05/20/1073191114535242

Published by: http://www.sagepublications.com


Downloaded from asm.sagepub.com at SETON HALL UNIV on September 12, 2014


Article

Psychometric Properties of the Parent and Teacher ADHD Rating Scale (ADHD-RS): Measurement Invariance Across Gender, Age, and Informant

Assessment 1–12 © The Author(s) 2014 Reprints and permissions: sagepub.com/journalsPermissions.nav DOI: 10.1177/1073191114535242 asm.sagepub.com

Guido Makransky1 and Niels Bilenberg1

Abstract Attention deficit/hyperactivity disorder (ADHD) is one of the most common psychiatric disorders in childhood and adolescence. Rating the severity of psychopathology and symptom load is essential in daily clinical practice and in research. The parent and teacher ADHD-Rating Scale (ADHD-RS) includes inattention and hyperactivity/impulsivity subscales and is one of the most frequently used scales in treatment evaluation of children with ADHD. An extended version, mADHD-RS, also includes an oppositional defiant disorder subscale. The partial credit Rasch model, which is based on item response theory, was used to test the psychometric properties of this scale in a sample of 566 Danish school children between 6 and 16 years of age. The results indicated that parents and teachers had different frames of reference when rating symptoms in the mADHD-RS. There was support for the unidimensionality of the three subscales when parent and teacher ratings were analyzed independently. Nonetheless, evidence for differential item functioning was found across gender and age for specific items within each of the subscales. The findings expand existing psychometric information about the mADHD-RS and support its use as a valid and reliable measure of symptom severity when used in age- and gender-stratified materials. Keywords ADHD-RS, Rasch model, validation, invariance, differential item functioning

Introduction Attention deficit/hyperactivity disorder (ADHD) according to the Diagnostic and Statistical Manual (DSM-IV and DSM-5; American Psychiatric Association, 1994, 2013) and the corresponding ICD-10 diagnosis Hyperkinetic Disorder (HKD; World Health Organization, 1992) is one of the most common psychiatric disorders seen in childhood and adolescence. ADHD or HKD is defined by symptoms within three areas (latent traits): inattention, hyperactivity, and impulsivity. A large proportion of children with ADHD/HKD present comorbidity. Oppositional defiant disorder (ODD) is the most common, but sleeping disorders, depression, anxiety disorders, developmental coordination disorder (DCD), learning disabilities, and language impairment are also frequently present. The assessment of ADHD psychopathology is based on a description of the child, as no biological markers specific for the disorder are currently known. This means that assessment through parental interviews, descriptions of the child’s development and cognitive style, and observation of the child in multiple settings is essential for a correct diagnostic formulation. In addition, assessing symptom load and severity in a valid and reliable way is crucial in both clinical and research

settings, mainly to measure treatment outcomes. Consequently, instruments that provide an objective estimate of the severity of the construct of ADHD in a given person at a given time are needed. The latent traits that characterize ADHD are measured by use of a number of items/variables, each of which represents a component of the construct. Once a number of items have been combined, they must be tested to assess how each item contributes to the total score and to determine whether some items collect redundant information. It is also important to know how the scale operates for individuals from different demographic groups and for those scoring at the low, the mid, and the high ends of the trait, and whether a higher score actually reflects more psychopathology. A series of psychometric scales based on different informants have been developed to collect information about child behavior and psychopathology (Collett, Ohan, &

University of Southern Denmark, Odense, Denmark

Corresponding Author: Guido Makransky, Department of Psychology, University of Southern Denmark, Campusvej 55, Odense M, DK-5230, Denmark. Email: [email protected]


Myers, 2003). These scales are useful in the diagnostic process and in treatment evaluation of ADHD patients. Some of the most widely used examples are the Vanderbilt ADHD Parent Rating Scale (Wolraich et al., 2003), the Swanson, Nolan, and Pelham Scale IV (Swanson et al., 2001), and the ADHD Rating Scale-IV (ADHD-RS-IV; DuPaul, Power, Anastopoulos, & Reid, 1998), the latter being the most frequently used. The ADHD-RS was conceived as a 14-item rating scale containing the DSM-III criteria for ADHD rated by parents and teachers (DuPaul, 1991). As the diagnostic criteria changed with DSM-III-R and later DSM-IV, the scale was revised to an 18-item scale with nine hyperactivity/impulsivity criteria and nine inattention criteria. The revised version was named ADHD-RS-IV (DuPaul, 1998; DuPaul et al., 1998). Barkley, Gwenyth, and Arthur (1999) modified the ADHD-RS-IV to a 26-item version by adding eight specific ODD items to the original 18 items. This modified version fulfills the needs consistent with ICD-10 classification, where ODD in combination with HKD constitutes a specific diagnostic category, hyperkinetic conduct disorder (World Health Organization, 1992). In DSM-oriented settings the 18 items are unchanged in DSM-5 (American Psychiatric Association, 2013), and the modified version adds valuable information about comorbid ODD. The modified ADHD-RS (mADHD-RS) is the focus of the current study. To date, the psychometric properties of the ADHD-RS and the mADHD-RS have been evaluated using classical test theory (CTT; Lord & Novick, 1968) methods. These studies have generally supported the construct and predictive validity of the measures, including findings that confirm a two- and three-factor model for the ADHD-RS and the mADHD-RS, respectively (e.g., Magnusson, Smari, Gretarsdottir, & Prandardottir, 1999; Szomlaiski et al., 2009; Zhang, Faries, Vowles, & Michelson, 2005).
The subscales have been shown to discriminate patients with ADHD from children within the general population and from clinical controls. Interrater agreement between parents and teachers, nevertheless, has only been moderate, which is not necessarily due to rater reliability, but may instead reflect real differences in behavior across different settings. Gender and age trends have also indicated more symptoms in boys than in girls and higher scores in younger children. Although existing research has investigated the reliability between informants, as well as differences between gender and age groups when using the mADHD-RS, little is currently known about how the scales function across these groups. This can be investigated by assessing the measurement invariance (MI) of a scale across different demographic or informant groups. MI refers to the statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, MI

can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or age groups, or when parents or teachers are used as informants. In standard Child and Adolescent Mental Health Services (CAMHS), the clinician uses the mADHD-RS total score to assess severity, and if the patient is treated, to assess changes over time (remission/relapse of symptoms). Therefore, more information is needed about the dimensionality of the scale and whether the scale can be used as a single unidimensional indicator of pathological severity, or if it is necessary to interpret the results based on the three subscales of inattention, hyperactivity/impulsivity, and ODD. It is also essential for the clinician to be familiar with how a score actually reflects symptom load and severity and how to interpret scores and changes in scores from different informant reports or across gender and age groups. Several researchers have suggested that future research use methods such as item response theory (IRT) to investigate the validity of scales (Embretson & Reise, 2000; Hays, Morales, & Reise, 2000). These methods provide a more thorough assessment of the measurement properties of a scale and test for specific properties such as MI and the unidimensionality of a scale (Embretson & Reise, 2000). There is existing literature that has examined the psychometric properties of other ADHD scales using the IRT approach (e.g., Gomez, 2008a, 2008b, 2012, 2013; Gomez, Vance, & Gomez, 2010). These studies have shown that virtually all items were effective in discriminating children with and without ADHD both in parent and teacher ratings in the Disruptive Behavior Rating Scale (Barkley & Murphy, 1998). However, to our knowledge, there is no research that has investigated the ADHD-RS-IV or the mADHD-RS from an IRT perspective. Furthermore, little is known about the MI of the scale across raters or across demographic groups.
The Rasch model (Rasch, 1960), also known as the 1-PL model within the framework of IRT, describes the association between a person’s level of an underlying trait and the probability of a specific item response on a measure. This association places the individual’s level of the underlying trait and the item difficulty on the same metric. Observed data are tested against the assumptions of the model, and if these are met, the raw score of a scale can be said to reflect the severity of the underlying trait on an interval scale of measurement (Tennant & Conaghan, 2007). An interval level of measurement is essential when measuring change based on the effects of clinical interventions, as is often the case with ADHD rating scales (e.g., Sonuga-Barke et al., 2013; Swanson et al., 2001). An extension of the Rasch model to items with more than two response options (polytomous items), the partial credit model (PCM; Masters, 1982), is applied in this study. This model was selected because when fit with the PCM is


obtained, raw scores represent a sufficient statistic (e.g., Rasch, 1960). That is to say, the person total score contains all information available within the specified context about the individual, and the item total score contains all information with respect to the item, with regard to the relevant latent trait. This is important in most applied settings where raw total scores are used to support diagnostic assignment and in clinical decision making. Using the PCM also results in a more comprehensive evaluation of the validity of a scale because fit to the PCM requires that the data fulfill a number of rigorous conditions. Therefore, assessing the validity of mADHD-RS with the PCM is an important further step in establishing the validity of these scales. Furthermore, assessing the validity of the mADHD-RS scale from a Rasch or IRT perspective is important because the mADHD-RS includes the additional ODD/CD scale that is not included in other ADHD measures. The main objectives of the article are to use the PCM to investigate the following key research questions: Research Question 1: Does the mADHD-RS make up a single unidimensional scale or is it a multidimensional scale with three individual subscales (inattention, hyperactivity/impulsivity, and ODD)? Research Question 2: Do the scale(s) function similarly when teachers or parents are used as informants? Research Question 3: Do the items in the scale(s) function similarly across gender and age groups?
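As an illustration of the partial credit model described above, the following minimal sketch computes the category probabilities for one polytomous item given a person location and a set of item thresholds. The function name and the threshold values are hypothetical and are not estimates from this study; the formula is the standard PCM expression (Masters, 1982), in which the probability of category x is proportional to the exponential of the summed differences between the person location and the first x thresholds.

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Return P(X = 0), ..., P(X = m) under the partial credit model.

    theta      -- person location on the latent trait
    thresholds -- item threshold parameters delta_1..delta_m
    """
    # Cumulative sums of (theta - delta_k); the empty sum for category 0 is 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds for one item rated 0-3 (not taken from the study).
example_thresholds = [-1.0, 0.5, 2.0]
probs = pcm_category_probabilities(0.0, example_thresholds)
print([round(p, 3) for p in probs])
```

A person with a higher location receives more probability mass in the higher categories, which is the sense in which a higher raw score reflects more of the latent trait when the data fit the model.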

Method Sample and Procedures The participants in the study consisted of 566 children, 296 (52%) boys and 270 (48%) girls, ranging from 6 to 16 years of age (mean = 10.98). Children were recruited from representative schools from inner city, suburban, and rural areas. The sample was representative of the Danish school child population in terms of family, social status, and IQ (Szomlaiski et al., 2009). Participants mirror the Danish child population, including the whole spectrum of children within this age, with the exception of severely intellectually disabled individuals. Data were only included in the analysis when complete ratings on all items from both parents and teachers were available.

Measures The ADHD-RS as modified by Barkley (Barkley et al., 1999) is a 26-item questionnaire including the 18 original ADHD-RS-IV items supplemented with 8 conduct problem items. The questionnaire is used in two settings (identical versions), with parents and teachers, respectively, as informants. All items are rated on a 4-point Likert-type scale

(0-3), where 0 represents never or rarely, 1 is sometimes, 2 is often, and 3 is very often. A total score (range 0 to 78) and three subscores can be calculated from the item scores: the inattentive scale (range 0 to 27), the hyperactive/impulsive scale (range 0 to 27), and the conduct scale (range 0 to 24). The Danish version of the questionnaire was used in this study. The questionnaire was translated from the original U.S. version and back-translated into English. If there was a discrepancy in item phrasing after back-translation, the Danish item was rephrased in order to capture the diagnostic essence of the given symptom, all according to standard procedures. Existing evidence of validity and reliability of the Danish version of the measure using CTT is documented in Szomlaiski et al. (2009).
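The scoring rule just described can be sketched as follows. This is an illustrative sketch only: the function and dictionary names are hypothetical, and the item-to-subscale assignment (Items 1 to 9, 10 to 18, and 19 to 26) follows the item list given in the note to Table 2. The sketch requires complete ratings, in line with the inclusion rule used in this study.

```python
# Item numbers per subscale, following the item list in the note to Table 2.
SUBSCALES = {
    "inattention": range(1, 10),
    "hyperactivity_impulsivity": range(10, 19),
    "odd": range(19, 27),
}


def score_madhd_rs(ratings):
    """Compute subscale and total scores from a dict {item number: rating 0-3}."""
    if set(ratings) != set(range(1, 27)):
        raise ValueError("ratings must cover exactly items 1 to 26")
    if any(r not in (0, 1, 2, 3) for r in ratings.values()):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    scores = {
        name: sum(ratings[item] for item in items)
        for name, items in SUBSCALES.items()
    }
    # The total is the sum of the three subscale scores.
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical respondent who rates every item "sometimes" (1).
print(score_madhd_rs({item: 1 for item in range(1, 27)}))
```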

Statistical Analysis The research questions presented above were investigated by assessing if the scale(s) in the mADHD-RS fit the partial credit Rasch model (PCM) with the RUMM2030 program (Andrich, Sheridan, & Luo, 2010). There are four fundamental evaluation criteria for the PCM, namely, unidimensionality, item fit, item invariance, and general fit of the data to the model (Smith, Wright, Selby, & Velikova, 2007). These are described below. Unidimensionality. A fundamental assumption when using a scale for clinical purposes is that the items in the scale measure only one underlying trait, which is known as unidimensionality. The unidimensionality assumption can be tested in the PCM by testing for local independence of the items (Wright, 1996). Local dependence (LD) between items occurs when items are redundant or linked in some way, such that the response on one item will determine the response on another. LD can be assessed by examining the residual correlation matrix. Item pairs with residual correlations above 0.2 are typically labeled as being locally dependent. Unidimensionality can also be assessed with a formal test proposed by Smith (2002). This test uses the first residual factor in a principal components analysis (of residuals) to determine two groups of items: those with positive and those with negative loadings. Each set of items is then used to make an independent trait estimate for each person in the sample. Given that the items form a unidimensional scale, it is expected that there should not be much difference between the person estimates from the two item subsets. An independent samples t test is used to determine whether there is a significant difference between the two estimates. If the percentage of significant t tests does not exceed the 5% expected value, then the conclusion can be made that the scale is unidimensional. Item Fit. Item fit is investigated in this study in order to determine whether all the symptoms addressed in the mADHD-RS are equally important for assessing the severity


of the diagnosis. Fit is achieved when the individual items measure the latent trait similarly to the other items in the scale. Over-fit is obtained when an item discriminates (between individuals who are high and those who are low on the latent trait) more than what is expected by the model. Similarly, under-fit is obtained when the item does not discriminate as well as expected. Items with significant chi-square statistics at the .05 level of significance (with a Bonferroni correction) are classified as not fitting the PCM in this study. Item Invariance. Item invariance requires that item estimation is independent of the subgroups of individuals completing the measure (Bond & Fox, 2001). Items not demonstrating invariance are commonly referred to as exhibiting differential item functioning (DIF; Makransky & Glas, 2013). For example, DIF occurs when different subgroups within the sample (e.g., boys vs. girls) have different scores on specific items, despite equal levels of the latent ADHD trait (e.g., inattention). General Model Fit. The chi-squared statistic is commonly used to assess the general fit of the data to the model including the property of invariance across the trait. A significant chi square indicates that the hierarchical ordering of the items varies across the trait, which violates the requirements of the PCM (Pallant & Tennant, 2007). A significance value of .05 with a Bonferroni adjustment to account for the number of hypotheses tested was used in this study. In addition to these four criteria that assess the construct validity of the measure, reliability is reported with Cronbach’s alpha in addition to a Person Separation Index (PSI). Similar interpretations can be used when using the two measures (Tennant & Conaghan, 2007); however, the PSI takes an IRT perspective where standard error may vary at different points of the latent construct depending on the item information that is available at that point.
Therefore, the PSI is directly related to the targeting of the items and is important in clinical settings because poorly targeted measures often result in floor or ceiling effects. Targeting can be displayed by reporting a confidence interval (CI) based on the standard error (SE) of the trait estimate conditionally at each point on the latent trait range.
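The two unidimensionality checks described in this section can be sketched in a few lines. This is a simplified illustration, not the RUMM2030 implementation: the function names and all input values are hypothetical, and the sketch assumes that the residual correlations and the two sets of person estimates (with their standard errors) have already been produced by a Rasch analysis. It also assumes that each person-level t statistic is the difference between the two estimates divided by the square root of the summed squared standard errors, with 1.96 as the critical value.

```python
import math


def flag_local_dependence(residual_corr, threshold=0.2):
    """Return item pairs whose residual correlation exceeds the threshold.

    residual_corr -- dict mapping (item_a, item_b) to a residual correlation
    """
    return sorted(pair for pair, r in residual_corr.items() if r > threshold)


def proportion_significant_t(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of persons whose two subset-based estimates differ significantly."""
    if not (len(est_a) == len(se_a) == len(est_b) == len(se_b)):
        raise ValueError("all inputs must have one entry per person")
    n_significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        # Assumed form of the person-level t statistic (see lead-in).
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            n_significant += 1
    return n_significant / len(est_a)


# Hypothetical residual correlations for three item pairs.
pairs = flag_local_dependence({(11, 12): 0.31, (16, 17): 0.25, (10, 13): 0.05})
print(pairs)

# Hypothetical estimates for four persons from the two item subsets.
share = proportion_significant_t(
    [0.0, 1.0, 2.0, 3.0], [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 2.0, 0.0], [0.5, 0.5, 0.5, 0.5],
)
print(share <= 0.05)  # True would be read as support for unidimensionality
```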

Results The means and standard deviations as well as person separation and alpha reliability for the scales investigated in this study are presented in Table 1. Total ADHD-RS mean scores and mean subscores are consistent with Danish norm scores (Poulsen, Jørgensen, Dalsgaard, & Bilenberg, 2009). Also, scores reflect the same age and gender effects seen in all previous community samples. The results of fit to the PCM are presented in Table 2. Table 2 reports the

Table 1.  Means, Standard Deviations, and Reliability for Each Scale in the mADHD-RS.

Scale | Mean | SD | Reliability: PSI | Reliability: Alpha
Full scale parents | 10.68 | 9.10 | 0.84 | 0.92
Full scale teachers | 7.57 | 10.84 | 0.85 | 0.95
Inattention parents | 4.25 | 3.91 | 0.70 | 0.86
Inattention teachers | 3.90 | 5.37 | 0.86 | 0.94
Hyperactivity/impulsivity parents | 3.70 | 3.59 | 0.62 | 0.82
Hyperactivity/impulsivity teachers | 2.28 | 4.26 | 0.72 | 0.93
ODD problems parents | 2.73 | 3.16 | 0.61 | 0.85
ODD problems teachers | 1.39 | 3.17 | 0.70 | 0.93

Note. mADHD-RS = modified Attention Deficit/Hyperactivity Disorder-Rating Scale; PSI = Person Separation Index; ODD = oppositional defiant disorder.

chi-squared probability of the general fit of the scale to the model in the first column. The second column reports the number of significant t tests based on Smith’s (2002) unidimensionality test where fewer than 5% of the t tests should be significant for the scale to be unidimensional. The third column reports the combination of items with significant LD residuals. The fourth column reports the items that did not fit the PCM. The final two columns report the items that displayed DIF by gender and age within each scale.

Analysis of the ADHD-RS as a Single Unidimensional Scale The first set of analyses investigated the possibility of considering the entire 26-item mADHD-RS as a unidimensional scale. The results of the PCM analysis show lack of fit for the parent and the teacher scale. There is clear evidence of multidimensionality for the parent and teacher scales with 13.07% and 16.25% significant t tests, respectively, which fall outside the nominal level of 5% (see Table 2). Furthermore, LD was identified for a large number of the item pairs, and there were many items that did not fit the scale. Therefore, the remaining analyses in this study will investigate the mADHD-RS as a multidimensional scale that measures the three scales of inattention, hyperactivity/impulsivity, and ODD. This formulation is consistent with the way the scale was devised.

Analysis Combining Parent and Teacher Ratings The next set of analyses compared the teacher and parent scales. Table 1 indicates that the mean ratings were higher for parents than teachers for all three scales, meaning that teachers were more lenient in their ratings of the children than parents. Similarly the SD was lower for parents than for teachers in all three scales. Although the mADHD-RS is typically


Table 2.  Results of Fit to the PCM in the Original Analysis for Each Scale.

Scale | General fit: χ2 probability | Unidimensionality: t tests | Unidimensionality: LD | Item fit: χ2 < .05 | DIF: Gender | DIF: Age
Full scale Par. | p < .0005* | 13.07% | nr | nr | nr | nr
Full scale Teach. | p < .0005* | 16.25% | nr | nr | nr | nr
Inatt. Par. | p = .14 | 4.95% | none | none | none | Items 1, 3
Inatt. Teach. | p = .02 | 2.83% | none | Item 2 | none | Items 7, 8
Hyp/imp Par. | p < .0005* | 2.30% | none | Item 11 | Items 11, 15 | none
Hyp/imp Teach. | p < .0005* | 1.94% | 11 and 12; 16 and 17 | Items 12, 16 | Items 10, 12, 15 | none
ODD prob Par. | p = .06 | 2.47% | none | none | none | Item 25
ODD prob Teach. | p = .06 | 1.24% | 20 and 21 | none | Item 25 | none

Note. nr = not reported because too many items did not fit the model. Inattention items: 1 = Fails to give close attention, 2 = Difficulty sustaining attention, 3 = Does not listen, 4 = Does not follow instructions, 5 = Difficulty organizing tasks, 6 = Avoids tasks, 7 = Loses necessary things, 8 = Easily distracted, 9 = Is forgetful. Hyp/imp items: 10 = Fidgets or squirms, 11 = Leaves seat, 12 = Runs about or climbs excessively, 13 = Difficulty playing quietly, 14 = On the go, 15 = Talks excessively, 16 = Blurts out answers, 17 = Difficulty awaiting turn, 18 = Interrupts. ODD items: 19 = Loses temper, 20 = Argues with adults, 21 = Disobeys parents and teachers, 22 = Deliberately annoys people, 23 = Blames others, 24 = Easily annoyed, 25 = Is angry and resentful, 26 = Is spiteful or vindictive.

used independently for parents and teachers, the analysis was designed to investigate if the ratings from the different frames of reference could be combined to fit a single PCM. Fit to the model would mean that both frames of reference could be combined to create a single total score scale. The results indicated that parent and teacher ratings combined did not fit the PCM for any of the scales. There was also no support for the unidimensionality assumption because 17.84%, 11.48%, and 10.78% of the t tests were significant for the inattention, hyperactivity/impulsivity, and ODD scales, respectively. Furthermore, there were many items that did not fit the model, and item pairs with LD. This is an indication that the frame of reference used by parents and teachers is different. The results also indicated that the PSI and alpha reliability in the mADHD-RS were consistently higher for teachers than for parents (see Table 1). Therefore, based on these results and because the scales are used independently in practice, the remaining analyses investigated the ratings from parents and teachers independently.

Results for Individual Scales Figure 1 displays the item parameters of the individual scales in relation to the distribution of respondents. The left frames in the figure represent the inattention, hyperactivity/impulsivity, and ODD scales when parents are used as informants, and the right frames represent the three scales when teachers are used as informants. It is clear from this figure that the items target a wide range of the latent trait. It is also clear that most respondents score low on the scales, which is expected because the sample represents a general school population. Figure 2 displays the targeting of the scale in terms of measurement precision as a function of the

estimate of the latent trait value reported in the PCM, and the total score for the individual scales. A 95% confidence interval around the estimated score is displayed in the graph to show the measurement precision of the estimates conditionally at each trait level. The graphs illustrate that there is good targeting for each scale and confidence intervals only become large for very high and very low scores.

Parent Inattention Scale In general, the nine-item parent inattention scale fit the PCM (χ2[63] = 75.03, p = .14). There was support for the unidimensionality of the scale (4.95% significant tests; Smith, 2002), and there was no redundancy in the items (no residuals over 0.2, indicating no LD). Furthermore, there were no items with significant chi-squared statistics, indicating that all the items were good measures of the latent inattention construct. In relation to invariance, no items displayed DIF by gender; however, Item 1 (Fails to give close attention to details or makes careless mistakes in schoolwork) and Item 3 (Does not seem to listen when spoken to directly) displayed DIF by age. Cronbach’s alpha of the scale was 0.86 and the PSI was 0.70, indicating acceptable reliability for the scale. In general, the results support the construct validity and reliability of the parent inattention scale.

Teacher Inattention Scale The nine-item teacher inattention scale also had good general fit to the PCM (χ2[36] = 55.12, p = .02). There was support for the unidimensionality of the scale (2.83% significant tests), and there was no redundancy in the items within the scale. Furthermore, all items fit the PCM with the exception of Item 2 (Has difficulty sustaining attention in


Figure 1.  Item parameter estimates in relation to the distribution of respondents.

tasks or play activities). This item had a negative fit residual, indicating that the item discriminated slightly better than the PCM predicted (χ2[5] = 16.75, p = .005). Nonetheless, the item was retained because it did not display LD with any other items, the fit of the scale did not improve when eliminating it, and the general fit of the model was good. In relation to invariance, no items displayed DIF by gender; however, Item 7 (Loses things necessary for tasks or activities) and Item 8 (Is easily distracted) displayed

DIF by age. Cronbach’s alpha of the scale was 0.94 and the PSI was 0.86 indicating good reliability for the scale. In general, the results support the construct validity and reliability of the teacher inattention scale.

Parent Hyperactivity/Impulsivity Scale The nine-item hyperactivity/impulsivity scale did not fit the PCM (χ2[45] = 99.35, p < .0005). However, there was


[Figure 2 panels: Inatt., Hyp/imp, and ODD prob. for Parents and Teachers. Axis ticks run from -5 to 5 and from 0 to 27 (0 to 24 for ODD prob.).]

Figure 2.  A display of the relationship between the estimate of the latent trait value reported in the PCM and the total score, with a 95% confidence interval around the estimated score.

support for the unidimensionality of the scale (2.30% significant tests), and there was no redundancy in the items within the scale. Furthermore, all items with the exception of Item 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) fit the PCM. This item had a negative fit residual, showing that the item discriminated better than the PCM predicted (χ2[5] = 20.27, p = .001). In relation to invariance, no items displayed DIF by age; however, Item 11 and Item 15 (Talks excessively) displayed DIF by gender. Specifically, parents of boys were more likely to endorse Item 11 and parents of girls were more likely to endorse Item 15, despite children having the same level of the latent trait of hyperactivity/impulsivity. Cronbach’s alpha of the scale was 0.82 indicating good

reliability; however, the PSI was 0.62 indicating low reliability for the scale. A more detailed description of the dissimilar results is provided in the Discussion section. It is important to investigate sources of misfit when acceptable fit to the PCM is not obtained. Although little evidence was uncovered to determine the source of misfit, a number of follow-up analyses were conducted to assess if a revised version of the scale could fit the PCM. These included the elimination of Items 11 and 15 and analyses pooling different combinations of items. Although the fit of the model was improved slightly, acceptable fit was not obtained. Nevertheless, a follow-up analysis for the boys and girls samples independently indicated that the scale fit the PCM for each of these samples. Therefore, the results


support the construct validity of the gender stratified parent hyperactivity/impulsivity scale.

Teacher Hyperactivity/Impulsivity Scale The nine-item hyperactivity/impulsivity scale did not fit the PCM (χ²[36] = 73.11, p < .0005). There was support for the unidimensionality of the scale (1.94% significant tests). However, positive residuals over 0.2 indicating LD or redundancy were found between Item 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) and Item 12 (Runs about or climbs excessively in situations in which remaining seated is expected), and between Item 16 (Blurts out answers before questions have been completed) and Item 17 (Has difficulty awaiting turn). Furthermore, Items 12 (χ²[4] = 14.28, p = .006) and 16 (χ²[4] = 16.54, p = .002) had negative fit residuals, showing that the items discriminated better than the PCM predicted, which could be a direct consequence of the LD reported above. All other items fit the PCM. In relation to invariance, no items displayed DIF by age; conversely, Item 10 (Fidgets with hands and feet or squirms in seat), Item 12, and Item 15 (Talks excessively) displayed DIF by gender. Specifically, teachers of boys were more likely to endorse Items 10 and 12, and teachers of girls were more likely to endorse Item 15, despite children having the same level of the latent trait of hyperactivity/impulsivity. Cronbach’s alpha of the scale was 0.93 and the PSI was 0.72, indicating acceptable reliability for the scale.

Since direct evidence was uncovered indicating the sources of misfit to the PCM, follow-up analyses were performed to investigate whether acceptable fit to the PCM could be obtained. Response dependence between items is an indication that items that are treated as independent have redundancy that should be modeled; alternatively, one item of the pair can be eliminated. Item dependence was dealt with in this study by combining Items 11 and 12 and Items 16 and 17 into two composite items. The follow-up analysis after combining these items indicated that the scale fit the PCM (χ²[35] = 47.44, p = .08). Therefore, the results support the construct validity and reliability of the revised teacher hyperactivity/impulsivity scale.
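The combination of locally dependent items into composite items can be illustrated with a minimal sketch. It assumes that each item is scored 0 to 3 (consistent with the 0-27 raw score range of a nine-item subscale) and that a composite item is simply the sum of the two item scores; the function name, the data layout, and the example ratings are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def combine_dependent_items(
    responses: Dict[int, List[int]],
    pairs: Sequence[Tuple[int, int]],
) -> Dict[str, List[int]]:
    """Sum each locally dependent item pair into one composite item.

    `responses` maps an item number to the scores given to each child
    (assumed 0-3). Items not named in `pairs` are passed through unchanged.
    """
    paired = {item for pair in pairs for item in pair}
    combined: Dict[str, List[int]] = {}
    # Keep every item that is not part of a dependent pair.
    for item, scores in responses.items():
        if item not in paired:
            combined[str(item)] = list(scores)
    # Replace each dependent pair with a single composite item (range 0-6).
    for first, second in pairs:
        combined[f"{first}+{second}"] = [
            a + b for a, b in zip(responses[first], responses[second])
        ]
    return combined


# Hypothetical teacher ratings for three children on four items.
ratings = {10: [0, 1, 3], 11: [0, 2, 3], 12: [0, 1, 3], 15: [1, 0, 2]}
revised = combine_dependent_items(ratings, pairs=[(11, 12)])
```

Because the composite is a sum, each child's raw total score is unchanged; only the number of items entering the model and their score ranges differ when the item parameters are reestimated.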

Parent ODD Scale The eight-item parent ODD scale had generally good fit to the PCM (χ²[40] = 55.28, p = .06). There was support for the unidimensionality of the scale (2.74% significant tests), and there was no redundancy in the items within the scale. There were also no items with significant chi-squared statistics, indicating that all the items were good measures of the latent ODD construct. In relation to invariance, no items displayed DIF by gender; however, Item 25 (Is angry and resentful) displayed DIF by age. Cronbach’s alpha of the scale was 0.85, indicating good reliability; however, the PSI was 0.61, indicating low reliability for the scale. In general, the results support the construct validity of the parent ODD scale.
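For readers less familiar with the PCM referred to throughout these results, the following minimal sketch shows how category probabilities are obtained under the partial credit model (Masters, 1982). The function name, the trait value, and the threshold values are hypothetical illustrations and are not estimates from this study.

```python
import math
from typing import List


def pcm_category_probabilities(theta: float, thresholds: List[float]) -> List[float]:
    """Return P(X = 0), ..., P(X = m) for one item under the partial credit model.

    `theta` is the person's latent trait level and `thresholds` holds the m
    step (threshold) parameters of an item with m + 1 response categories.
    """
    # Cumulative sums of (theta - delta_k); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# A four-category item (scores 0-3) with hypothetical, symmetric thresholds.
probs = pcm_category_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.0])
```

Item fit statistics such as the chi-squared values reported above compare observed responses with the scores expected from probabilities of this kind.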

Teacher ODD Scale The eight-item teacher ODD scale also had generally good fit to the PCM (χ²[24] = 35.30, p = .06). There was support for the unidimensionality of the scale (1.24% significant tests). However, a positive residual over 0.2 indicating LD or redundancy between Item 20 (Argues with adults) and Item 21 (Actively defies or refuses to comply with adults’ requests or rules) was detected. In relation to invariance, no items displayed DIF by age; however, Item 25 (Is angry and resentful) displayed DIF by gender. Specifically, teachers were more likely to endorse the item in girls compared with boys. Cronbach’s alpha of the scale was 0.93 and the PSI was 0.70, indicating acceptable reliability for the scale. Although the results support the construct validity and reliability of the teacher ODD scale, a follow-up analysis was conducted to deal with the LD between Items 20 and 21. The results of the new analysis indicated excellent fit to the PCM (χ²[21] = 25.66, p = .22).
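The screening for LD reported in these results can be sketched as follows, under the assumption that the 0.2 cutoff is applied to correlations between item residuals from a fitted PCM. The residual values, item selection, and function names below are hypothetical; in practice the residuals would come from the Rasch software used for the analysis.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists of residuals."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no evidence of dependence
    return cov / math.sqrt(var_x * var_y)


def flag_local_dependence(
    residuals: Dict[int, List[float]], cutoff: float = 0.2
) -> List[Tuple[int, int, float]]:
    """Return item pairs whose residual correlation exceeds `cutoff`."""
    flagged = []
    for first, second in combinations(sorted(residuals), 2):
        r = pearson(residuals[first], residuals[second])
        if r > cutoff:
            flagged.append((first, second, r))
    return flagged


# Hypothetical residuals for five children on three teacher ODD items.
example = {
    20: [1.0, -1.0, 1.0, -1.0, 0.0],
    21: [1.0, -1.0, 0.5, -0.5, 0.0],
    25: [1.0, 1.0, -1.0, -1.0, 0.0],
}
flagged_pairs = flag_local_dependence(example)
```

In this made-up example only Items 20 and 21 are flagged, mirroring the kind of pairwise redundancy described above.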

Discussion ADHD/HKD symptom levels span a continuum from normal to severely disordered children, all within the school population. The DSM and ICD diagnoses are constructs based on clinical case reports and empirical data. Rating the severity of psychopathology is essential in daily clinical practice and in research. Therefore, reliable measures are needed that reflect, in the most valid way, the patient’s clinical picture and the change in symptom load when patients are assessed on two or more occasions within an intervention.

The PCM, within the framework of IRT, was used to test the psychometric properties of the mADHD-RS in this study. More precisely, we investigated whether the mADHD-RS makes up a single unidimensional scale, as it is sometimes used by clinicians, or whether it is a multidimensional scale with three individual subscales (inattention, hyperactivity/impulsivity, and ODD). Furthermore, we investigated the MI of the scales. Specifically, we investigated whether the scale(s) function similarly when teachers or parents are used as informants, and whether the items in the scale(s) function similarly across gender and age groups.

The results of the study showed clear evidence of multidimensionality for the parent and teacher mADHD-RS. Therefore, the mADHD-RS total score should not be used as a single severity measure of ADHD psychopathology, as it does not measure a single unidimensional clinical representation (diagnosis or trait). The current analyses support the DSM (fourth and fifth editions) classification systems
where ADHD patients are characterized either as primarily inattentive or primarily hyperactive/impulsive, or as a combined type with severe deficits in both areas. When assessing symptom load, clinicians need to address the two problem areas, inattention and hyperactivity/impulsivity, separately. ODD symptoms also form a separate category and should be rated and interpreted independently.

An analysis of whether the parent and teacher ratings for each child could be combined to get a global scale of ADHD indicated that the scales could not be combined because parents and teachers had very different frames of reference in applying the mADHD-RS. Teachers had lower average ratings, showing leniency in their ratings. Teachers were also better at differentiating between children with low and high levels of the ADHD trait, which was evident in the larger SD for all scales compared with the parent raters. Additionally, Cronbach’s alpha and the PSI were higher for teacher ratings compared with the parent ratings. The results indicate that teachers are more lenient but better at differentiating between children with low and high levels of the ADHD trait. This may be the case because teachers interact with large groups of age-matched children of both genders, and therefore have a more explicit “reference,” whereas parents typically only have a couple of siblings to compare when rating the indexed child.

In general, the results of the study supported the validity of the mADHD-RS when used independently for parents and teachers. The results indicated that the parent and teacher inattention and ODD scales had good fit to the PCM. Conversely, the parent and teacher hyperactivity/impulsivity scales in their current format did not have good fit to the PCM.
However, acceptable fit was achieved with slight adjustments including the combination of item pairs that displayed LD, and stratifying by gender for the teacher and parent hyperactivity/impulsivity scales, respectively.

The results of the study showed some redundancy in the form of LD between items in the teacher hyperactivity/impulsivity and ODD scales. LD was identified between Item 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) and Item 12 (Runs about or climbs excessively in situations in which remaining seated is expected) and between Item 16 (Blurts out answers before questions have been completed) and Item 17 (Has difficulty awaiting turn) in the teacher hyperactivity/impulsivity scale. It is clear from the content of these items that they overlap considerably. Items 11 and 12 assess very similar behaviors and differ only slightly in that teachers are asked if children leave their seat (Item 11) or run and climb (Item 12) in “situations in which remaining seated is expected.” Similarly, Items 16 and 17 both assess children’s ability to wait until it is appropriate to act. LD was also identified between Item 20 (Argues with adults) and Item 21 (Actively defies or refuses to comply with adults’ requests or rules) in the teacher ODD scale. Again it is clear that these two items are quite similar in that they both assess students’ failure to comply with adults by arguing or refusing to comply with their rules.

There are several options when deciding on how to improve the scale based on redundant items. The first is to eliminate one item from each pair. This solution has complications with the mADHD-RS because the scale was developed to assess the diagnosis in the DSM-IV and DSM-5, and the elimination of the items would decrease the content validity of the scale. A second option is to revise the items with the objective of differentiating the behavior that is assessed in each one to limit redundancy. Although this option sounds appealing, revising a widely used questionnaire often comes with a number of practical challenges, and it often takes a considerable period of time before data can be collected to assess how well new items are functioning. A final option is to model the LD between the items by combining them into single combination items (e.g., Kreiner & Christensen, 2011). Although future research could assess one of the other options if the results from this study are replicated, LD was successfully dealt with in this study by combining the sets of items with LD into single items and reestimating item parameters.

There were also several items that did not fit the PCM. This was true for Item 2 (Difficulty sustaining attention) in the teacher inattention scale, Item 11 (Leaves seat) in the parent hyperactivity/impulsivity scale, and Items 12 (Runs about or climbs excessively) and 16 (Blurts out answers) in the teacher hyperactivity/impulsivity scale. All items had negative fit residuals, showing that the items discriminated better than the PCM predicted. One cause for higher levels of discrimination is redundancy between items, which typically leads to inflated discrimination estimates. This was true for Items 12 and 16 in the teacher hyperactivity/impulsivity scale, which showed LD as discussed above.
Therefore, lack of fit for these items was likely a direct consequence of LD. The lack of fit for the other two items was not seen as problematic as the discrimination parameters were higher than expected by the model, and there was no LD identified.

In relation to MI across gender, DIF was identified in Items 11 (Leaves seat in classroom or in other situations in which remaining seated is expected) and 15 (Talks excessively) in the parent hyperactivity/impulsivity scale; Items 10 (Fidgets with hands and feet or squirms in seat), 12 (Runs about or climbs excessively in situations in which remaining seated is expected), and 15 (Talks excessively) in the teacher hyperactivity/impulsivity scale; and Item 25 (Is angry and resentful) in the teacher ODD scale. Parents of boys were more likely to endorse Item 11 and parents of girls were more likely to endorse Item 15, despite children having the same level of the latent trait of hyperactivity/impulsivity. Similarly, teachers of boys were more likely to endorse Items 10 and 12, and teachers of girls were more likely to endorse Items 15 and 25, despite
children having the same level of the latent traits of hyperactivity/impulsivity and ODD, respectively. These results are congruent with previous findings (Biederman, Faraone, & Monuteaux, 2002) that boys and girls with high levels of ADHD have different ways of expressing this behavior. The results indicated that boys tend to act up by squirming and fidgeting and leaving their seats, whereas girls tend to talk excessively, or act angry and resentful.

Lack of MI across age in the form of DIF was also detected for Items 1 (Fails to give close attention to details or makes careless mistakes in schoolwork) and 3 (Does not seem to listen when spoken to directly) in the parent inattention scale. Specifically, for Item 1, parents of the oldest group of children (14 to 16 years of age) were more likely to endorse the item compared with the parents of the younger children (6 to 9 years of age) despite children having the same level of the latent trait of inattention (the middle-aged group scored between the two). This finding is in line with clinical observations because parents tend to have higher expectations of older children. For Item 3, parents of younger children (6 to 9 years) were more likely to endorse the item compared with the parents of the older two groups of children (10 to 16 years). This result is also consistent with clinical observations that children between the ages of 6 and 9 years have a shorter attention span than their older peers. Furthermore, Items 7 (Loses things necessary for tasks or activities) and 8 (Is easily distracted) exhibited DIF by age in the teacher inattention scale. For Item 7, teachers of the middle-aged group (10 to 13 years) were more likely to endorse the item compared with the younger group (6 to 9 years) despite children having the same level of the latent trait of inattention (the oldest group scored between the two).
This result may be explained by higher expectations from teachers of the middle-aged children, and by an acceptance that preadolescent or pubertal children become more forgetful. For Item 8, teachers of younger children (6 to 9 years) were more likely to endorse the item, followed by the middle group (10 to 13 years), and the oldest group (14 to 16 years) despite having the same trait level, which is consistent with clinical observations that younger children are more easily distracted in school activities. Finally, there was DIF by age for Item 25 (Is angry and resentful) in the parent ODD scale. Specifically, parents of the middle-aged group (10 to 13 years) were more likely to endorse the item, followed by the younger group (6 to 9 years), and finally the oldest group (14 to 16 years) despite having the same trait level.

There are several options available for dealing with lack of MI in the form of DIF. There are statistical procedures that can account for the DIF in items by using group-specific item parameters (e.g., Kreiner & Christensen, 2011; Makransky & Glas, 2013). These procedures correct for the differences at the item level by estimating ability with different item parameters for each demographic group. This produces scores that have different trait-level values depending on the demographic group the child belongs to (e.g., gender, age). The use of such a scoring scheme would have the advantage that children would be compared to a general standard because the model would account for the real differences that exist between demographic groups. However, the method has the disadvantage that it introduces added complexity and the total scores can no longer be used on their own as summary scores. A simpler practice is to use gender- and age-specific norms and cutoff scores when making clinical decisions about individuals with the ADHD-RS. This procedure is more appropriate in the current setting where total scores are typically used for making clinical decisions. Such reference scores seem to be culturally different with rather large cross-cultural differences. Therefore, nation-specific standardizations of ADHD-RS (mADHD-RS) should be used. When assessing a specific child one should transform raw scores to t scores (z scores), which indicate how far from the age- and gender-stratified mean the child has scored (Szomlaiski et al., 2009). Treatment effect can then be measured as reduction in t score, and normalization can be defined as a t score below 60 (mean score for a normal child plus 1 SD).

One advantage of assessing the validity of measurement scales within the framework of IRT is that instead of assessing reliability in general for the entire scale, IRT modeling provides a standard error band around the ability estimate at each point on the latent trait continuum. Good test targeting is obtained when the items within a scale accurately assess the sample of respondents who are administered the scale. Clinically, this information can be useful in assessing floor and ceiling effects. An instrument is needed that can reliably measure the latent trait span from normal to severely disordered children, since measurement of treatment outcomes goes from subclinical symptom levels all the way to normalization. The results of this study indicated that the mADHD-RS subscales have good targeting for the general sample of school children between 6 and 16 years of age, with the exception of students with raw scores below 2 and those with scores higher than 3 SD above the mean. The difference between Cronbach’s alpha and the PSI highlights the difference between assessing reliability with CTT and the PCM. While Cronbach’s alpha does not take targeting into account, the PSI does. Cronbach’s alpha was considerably higher than the PSI in all of the scales in the mADHD-RS. This was the case because the sample used in this study was made up of children from the general population, which resulted in highly positively skewed distributions, such that between 38% and 82% of the sample had raw scores below 3 on the different mADHD-RS scales. However, acceptable precision was obtained at the midrange of the scale (range 9-18), which is important because clinicians typically want to treat referred children so that they go from scores in the upper third (range 18-27) to the lower third (range 0-9) of the scales.
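The contrast between the two reliability indices can be made concrete with a minimal sketch. It assumes the common definition of the PSI as the proportion of variance in the person estimates that is not attributable to estimation error; the article does not spell out the formula, and all data and function names below are hypothetical.

```python
from statistics import mean, variance
from typing import List


def cronbach_alpha(item_scores: List[List[int]]) -> float:
    """Cronbach's alpha, where item_scores[i][p] is person p's score on item i."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def person_separation_index(
    estimates: List[float], standard_errors: List[float]
) -> float:
    """PSI: share of variance in person estimates not due to estimation error."""
    observed = variance(estimates)
    error = mean([se ** 2 for se in standard_errors])
    return (observed - error) / observed


# Two perfectly consistent hypothetical items rated for four children.
alpha = cronbach_alpha([[0, 1, 2, 3], [0, 1, 2, 3]])

# Hypothetical person estimates (logits) with well-targeted standard errors ...
psi_targeted = person_separation_index([-2.0, -1.0, 1.0, 2.0], [1.0] * 4)
# ... and with the larger standard errors expected near the floor of a scale.
psi_floor = person_separation_index([-2.0, -1.0, 1.0, 2.0], [1.5] * 4)
```

Because the standard errors enter the PSI directly, imprecise estimates for the many children scoring near the floor pull the PSI down, whereas alpha depends only on the raw score variances.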

Our results are important for clinicians assessing and treating children and youth with ADHD, because they underline the fact that core ADHD symptoms can be rated validly by use of multi-informant rating scales. There is limited information available regarding the MI of the ADHD-RS or other similar instruments; however, this is a fundamental assumption when making decisions about results obtained from different rating sources, or when comparing different demographic groups. Our results provide evidence that the ADHD-RS total score within each of the subscales is a sufficient and valid measure of the severity of ADHD symptoms when teacher and parent ratings are used separately. Furthermore, when compared with age- and gender-stratified norm distributions it is possible to calibrate the severity of symptoms and measure outcomes. Nevertheless, it is essential to remember that symptoms are only one component of a clinical diagnosis; suffering or functional impairment is often of primary concern for patients and relatives. Consequently, global functional impairment must always be assessed as part of baseline evaluation and when measuring treatment response.
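The comparison with age- and gender-stratified norms described in the Discussion can be sketched as a simple score transformation. The sketch assumes the conventional t score metric (mean 50, SD 10), under which the mean plus 1 SD corresponds to the cutoff of 60 mentioned above; the norm values, group labels, and function names are hypothetical and are not the published Danish norms.

```python
from typing import Dict, Tuple

# Hypothetical norm table: (gender, age band) -> (mean raw score, SD).
NORMS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("boy", "6-9"): (6.0, 5.0),
    ("girl", "6-9"): (4.0, 4.0),
}


def t_score(raw: float, gender: str, age_band: str,
            norms: Dict[Tuple[str, str], Tuple[float, float]] = NORMS) -> float:
    """Convert a raw subscale score to a t score using stratified norms."""
    norm_mean, norm_sd = norms[(gender, age_band)]
    z = (raw - norm_mean) / norm_sd  # distance from the stratified mean in SDs
    return 50.0 + 10.0 * z


def is_normalized(t: float, cutoff: float = 60.0) -> bool:
    """Normalization defined as a t score below the cutoff (mean plus 1 SD)."""
    return t < cutoff


# Hypothetical 8-year-old boy rated before and after an intervention.
baseline = t_score(21, "boy", "6-9")
follow_up = t_score(9, "boy", "6-9")
reduction = baseline - follow_up
```

Under these made-up norms the same raw score would map to a different t score for a girl of the same age, which is the point of using stratified reference distributions.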

Future Research Although the mADHD-RS functioned well in a general sample of school children, it would be beneficial to investigate the generalizability of the results to other populations in other cultural contexts. Furthermore, this study and other existing studies that have assessed the validity of different ADHD scales from an IRT or Rasch perspective have used general population samples (e.g., Gomez, 2008a, 2008b, 2012, 2013; Gomez et al., 2010). Although ADHD/HKD represents latent traits that span a continuum from normal to severely disordered children, it would be beneficial to investigate the validity of the scales within a clinical population. The use of the scales in clinical samples may illuminate specific characteristics of the scales that do not appear when the scales are used in general samples. Future research could assess the validity of these scales by using the PCM to provide more detailed information about how the scales function in these populations. In general, the results showed that very slight adjustments were needed to obtain good fit to the PCM; however, future research should be conducted to investigate whether the results of the current analyses can be replicated. Depending on these results, changes may be required to improve the validity and reliability of the mADHD-RS. Consistency of the results may even influence future revisions of the classification systems if redundancies in the diagnostic criteria are evident. Specific criteria may be excluded from the diagnostic assessment if one or more symptoms overlap so that the presence of one criterion is trivial in the presence of another.

Declaration of Conflicting Interests The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding The authors received no financial support for the research, authorship, and/or publication of this article.

References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.; DSM-IV). Washington, DC: Author.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.; DSM-5). Washington, DC: Author.
Andrich, D., Sheridan, B., & Luo, G. (2010). Rasch models for measurement: RUMM2030. Perth, Australia: RUMM Laboratory.
Barkley, R. A., Edwards, G. H., & Robin, A. L. (1999). Defiant teens: A clinician’s manual for assessment and family intervention. New York, NY: Guilford Press.
Barkley, R. A., & Murphy, K. R. (1998). Attention-deficit hyperactivity disorder: A clinical workbook (2nd ed.). New York, NY: Guilford Press.
Biederman, J., Faraone, S. V., & Monuteaux, M. C. (2002). Differential effect of environmental adversity by gender: Rutter’s index of adversity in a group of boys and girls with and without ADHD. American Journal of Psychiatry, 159(1), 36-42.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Erlbaum.
Collett, B. R., Ohan, J. L., & Myers, K. M. (2003). Ten-year review of rating scales. V: Scales assessing attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 42, 1015-1037.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
DuPaul, G. J. (1991). Parent and teacher ratings of ADHD symptoms: Psychometric properties in a community-based sample. Journal of Child Psychology, 20, 245-253.
DuPaul, G. J. (1998). Parent ratings of attention-deficit/hyperactivity disorder symptoms: Factor structure and normative data. Journal of Psychopathology and Behavioral Assessment, 20, 83-102.
DuPaul, G. J., Power, T. J., Anastopoulos, A., & Reid, R. (1998). ADHD rating scale-IV. New York, NY: Guilford Press.
Gomez, R. (2008a). Parent ratings of the ADHD items of the Disruptive Behavior Rating Scale: Analyses of their IRT properties based on the generalized partial credit model. Personality and Individual Differences, 45, 181-186.
Gomez, R. (2008b). Item response theory analyses of the parent and teacher ratings of the DSM-IV ADHD Rating Scale. Journal of Abnormal Child Psychology, 36, 865-885.
Gomez, R. (2012). Item response theory analyses of adolescent self-ratings of the ADHD symptoms in the Disruptive Behavior Rating Scale. Personality and Individual Differences, 53, 963-968.
Gomez, R. (2013). DSM-IV ADHD symptoms self-ratings by adolescents: Test of invariance across gender. Journal of Attention Disorders, 17(1), 3-10.
Gomez, R., Vance, A., & Gomez, A. (2010). Item response theory analyses of parent and teacher ratings of the ADHD symptoms for recoded dichotomous scores. Journal of Attention Disorders, 15, 269-285.
Hays, R. D., Morales, L. S., & Reise, S. P. (2000). Item response theory and health outcomes measurement in the 21st century. Medical Care, 38(9 Suppl.), II28-II42.
Kreiner, S., & Christensen, K. B. (2011). Item screening in graphical loglinear Rasch models. Psychometrika, 76, 228-256.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Oxford, England: Addison-Wesley.
Magnusson, P., Smari, J., Gretarsdottir, H., & Prandardottir, H. (1999). Attention-deficit/hyperactivity symptoms in Icelandic schoolchildren: Assessment with the attention deficit/hyperactivity rating scale-IV. Scandinavian Journal of Psychology, 40, 301-306.
Makransky, G., & Glas, C. A. W. (2013). Modeling differential item functioning with group-specific item parameters: A computerized adaptive testing application. Measurement, 46, 3228-3237. Retrieved from http://dx.doi.org/10.1016/j.measurement.2013.06.020
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Pallant, J. F., & Tennant, A. (2007). An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). British Journal of Clinical Psychology, 46, 1-18.
Poulsen, L., Jørgensen, S. L., Dalsgaard, S., & Bilenberg, N. (2009). Danish standardization of the ADHD rating scale. Ugeskr Laeger, 171, 1500-1504.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Sonuga-Barke, E. J., Brandeis, D., Cortese, S., Daley, D., Ferrin, M., Holtmann, M., . . . Sergeant, J. (2013). Nonpharmacological interventions for ADHD: Systematic review and meta-analyses of randomized controlled trials of dietary and psychological treatments. American Journal of Psychiatry, 170, 275-289.
Smith, A. B., Wright, P., Selby, P., & Velikova, G. (2007). Measuring social difficulties in routine patient-centred assessment: A Rasch analysis of the social difficulties inventory. Quality of Life Research, 16, 823-831.
Smith, E. V. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3, 205-231.
Swanson, J. M., Kraemer, H. C., Hinshaw, S. P., Arnold, L. E., Conners, C. K., & Abikoff, H. B. (2001). Clinical relevance of the primary findings of the MTA: Success rates based on severity of ADHD and ODD symptoms at the end of treatment. Journal of the American Academy of Child & Adolescent Psychiatry, 40, 168-179.
Szomlaiski, N., Dyrborg, J., Rasmussen, H., Schumann, T., Koch, S. V., & Bilenberg, N. (2009). Validity and clinical feasibility of the ADHD rating scale (ADHD-RS): A Danish nationwide multicenter study. Acta Paediatrica, 98, 397-402.
Tennant, A., & Conaghan, P. G. (2007). The Rasch measurement model in rheumatology: What is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis & Rheumatism, 57, 1358-1362.
World Health Organization. (1992). The ICD-10 classification of mental and behavioural disorders: Clinical descriptions and diagnostic guidelines. Geneva, Switzerland: Author.
Wolraich, M. L., Lambert, W., Doffing, M. A., Bickman, L., Simmons, T., & Worley, K. (2003). Psychometric properties of the Vanderbilt ADHD diagnostic parent rating scale in a referred population. Journal of Pediatric Psychology, 28, 559-567.
Wright, B. D. (1996). Local dependency, correlations and principal components. Rasch Measurement Transactions, 10, 509-511.
Zhang, S., Faries, D. E., Vowles, M., & Michelson, D. (2005). ADHD rating scale IV: Psychometric properties from a multinational study as a clinician-administered instrument. International Journal of Methods in Psychiatric Research, 14, 186-201.
