Published for the British Institute of Learning Disabilities

Journal of Applied Research in Intellectual Disabilities 2015, 28, 561–571

The Behaviour Problems Inventory-Short Form: Reliability and Factorial Validity in Adults with Intellectual Disabilities

Andrea N. Mascitelli*, Johannes Rojahn*, Vias C. Nicolaides*, Linda Moore†, Richard P. Hastings– and Ceri Christian-Jones‡

*George Mason University, Fairfax, VA, USA; †Chrestomathy Inc., Eden Prairie, MN, USA; –University of Warwick, Coventry, UK; ‡Bangor University, Bangor, UK

Accepted for publication 16 October 2013

Background: The Behaviour Problems Inventory-Short Form (BPI-S) is a spin-off of the BPI-01 that was empirically developed from a large BPI-01 data set. In this study, the reliability and factorial validity of the BPI-S were investigated for the first time on newly collected data from adults with intellectual disabilities.

Methods: The sample consisted of 232 adults with intellectual disabilities who represented all levels of intellectual functioning. They were recruited at several day programs in the USA (n = 148) and the UK (n = 84).

Results: We found acceptable reliability in terms of internal consistency, inter-rater agreement and test–retest reliability. Confirmatory factor analysis validated the three-subscale structure of the BPI-S.

Conclusions: We corroborated the factor structure underlying the three subscales and found the BPI-S to have adequate to good psychometric properties in a newly collected sample of adults with intellectual disabilities.

Keywords: aggression, assessment, behavior problems inventory, intellectual disability, psychometric, self-injurious behavior

© 2015 John Wiley & Sons Ltd
10.1111/jar.12152

Introduction

Problem behaviour is a broad term synonymous with maladaptive behaviour and challenging behaviour. Emerson & Bromley (1995) defined problematic behaviour as culturally abnormal and of such intensity, frequency or duration that the physical safety of the person or others is threatened and as a consequence may limit the use of or deny access to ordinary community facilities. These problem behaviours typically include aggression towards others and/or property and self-injurious behaviour (SIB), but also stereotyped and oppositional behaviour. It has also been found repeatedly that SIB, stereotyped behaviour and aggressive/destructive behaviour correlate highly with one another (Rojahn et al. 2008).

Prevalence estimates of problem behaviours among individuals with intellectual disabilities are unreliable due to the differences in definitions, sampling methods, reference populations and other methodological factors across different surveys (Rojahn & Esbensen 2002). Emerson (2001) estimated the prevalence at around 14%, while Cooper et al. (2007) found that the rates ranged from 0.1 to 22.5% depending on whether formal diagnostic criteria or more subjective clinical criteria were used. Estimates from one of the largest studies in the US suggest that 8.5% of people with intellectual disabilities presented severe SIB and 12.1% had severe aggressive behaviours (Rojahn et al. 1993).

As far as the assessment of behaviour problems is concerned, one can distinguish between direct and indirect measures. Direct assessment involves direct observation and recording of the target behaviour. Indirect assessment instruments gather information about the target behaviour from others (third-party informants) who are familiar with the individual and the behaviour of interest through interviews, checklists or behaviour rating scales. Behaviour rating scales represent indirect assessment and have the advantages that they are relatively inexpensive in terms of time and human resources and that they apply standard criteria across individuals. Some behaviour rating instruments measure a wide variety of behaviours, such as the Aberrant Behaviour Checklist (ABC; Aman & Singh 1986) and the Developmental Behaviour Checklist (DBC; Einfeld & Tonge 2002). Others capture select domains, such as the Repetitive Behaviour Scale-Revised (RBS-R; Bodfish et al. 1999) or the Children’s Scale of Hostility and Aggression: Reactive/Proactive (C-SHARP; Farmer & Aman 2009) that assesses aggressive behaviour. As more instruments are developed and compared to one another, their strengths and weaknesses are revealed, allowing practitioners and scientists to make informed choices.

The Behaviour Problems Inventory-01 is another informant-based, select domain, behaviour rating instrument for individuals with intellectual disabilities (BPI-01; Rojahn et al. 2001). The BPI has evolved since its initial publication in German (Rojahn 1986). It was originally developed by a priori assignment of items selected from a literature review and other existing instruments into two subscales: SIB and stereotypic behaviour. Later, items related to aggressive behaviour were added by Mulick et al. (1988). Finally, the Stereotyped Behaviour scale replaced the original five stereotypy items (Rojahn et al. 1997, 2000), and the Aggressive/Destructive Behaviour subscale was added to create the three BPI-01 subscales, SIB (14 items), Stereotyped Behaviour (24 items) and Aggressive/Destructive Behaviour (11 items). The BPI-01 consists of a total of 51 items, which are rated by frequency (0 = never, to 4 = hourly) and severity (0 = no problem, 3 = severe problem).

Various researchers have analysed the psychometric properties of the BPI-01 and have found that the reliability and validity in their samples were acceptable to very good (Rojahn et al. 1989; Sturmey et al. 1993, 1995; Sturmey 2001). In 2001, Rojahn et al.
conducted a study on 432 teens and adults with intellectual disabilities using the most current version of the BPI-01. They analysed the inter-rater agreement and retest reliability and conducted a confirmatory factor analysis (CFA). The CFA corroborated the three-factor structure based on the three subscales; all but three items were significant at the 0.05 level. Intraclass correlation coefficients (ICCs) were calculated to determine reliabilities among raters and across time. Regarding inter-rater agreement, the mean ICCs for the items in the SIB, Stereotyped Behaviour and Aggressive/Destructive Behaviour subscales were 0.75 (range of 0.17–1.00), 0.69 (range of 0.02–1.00) and 0.58 (range of 0.09–1.00), respectively. Overall, the full scale yielded an inter-rater ICC of 0.92; the SIB subscale had a global ICC of 0.96, the Stereotyped Behaviour subscale an ICC of 0.90 and the Aggressive/Destructive Behaviour subscale an ICC of 0.59. Mean retest reliability coefficients for the items in the SIB, Stereotyped Behaviour and Aggressive/Destructive Behaviour subscales were 0.65 (range of 0.18–0.96), 0.51 (range of 0.01–0.90) and 0.54 (range of 0.17–0.73), respectively. The retest reliabilities for the subscales of SIB, Stereotyped Behaviour and Aggressive/Destructive Behaviour were 0.71, 0.76 and 0.64, respectively. The full scale’s internal consistency was high, with an α-coefficient of 0.83. The α-coefficient was 0.61 for the SIB subscale, 0.79 for the Stereotyped Behaviour subscale and 0.82 for the Aggressive/Destructive Behaviour subscale. Several subsequent studies have provided evidence for the convergent, divergent (Rojahn et al. 2003) and concurrent validity (Gonzalez et al. 2008; Hill et al. 2008) of the BPI-01.

After several years of use in determining instrument validity, treatment effectiveness, and in assessing individuals, the authors of the BPI-01 decided to retrospectively and empirically develop a shortened version based on a BPI-01 data set consisting of 1122 cases from several different regions. The Behaviour Problems Inventory-Short Form (BPI-S; Rojahn et al. 2012a) is a BPI-01 spin-off consisting of the same three constructs with fewer items: SIB (eight items), Stereotyped Behaviour (12 items) and Aggressive/Destructive Behaviour (10 items). Previously, internal consistency, construct validity and confirmatory and discriminant validity have been established for the BPI-S through retrospective data analysis (Rojahn et al. 2012b). In that study, α-coefficients of the BPI-01 and BPI-S were calculated to determine internal consistencies. The coefficients for the BPI-01 and BPI-S SIB subscales were 0.74 (frequency), 0.73 (severity), and 0.70 (frequency), 0.68 (severity), respectively. Stereotyped Behaviour subscale coefficients for the BPI-01 and BPI-S frequency and severity scales were 0.92 (frequency), 0.90 (severity), 0.88 (frequency) and 0.86 (severity), respectively. The Aggressive/Destructive Behaviour subscale coefficients for the BPI-01 and BPI-S were 0.89 or 0.90 across both frequency and severity. Furthermore, the factor structure of the BPI-S was compared to the BPI-01, and they were found to be very similar with fit indices suggesting that the proposed three-factor model fits the data well (Rojahn et al. 2012b).

The aim of this study was to assess the psychometric properties, including the internal consistency, the inter-rater and retest reliabilities and factorial validity of the BPI-S with newly collected data on adults with intellectual disabilities from two organizations.

Method

Participants and procedures

The sample consisted of 232 adults with intellectual disabilities from a day-program organization with three geographical locations in a large urban area of Minnesota, USA (n = 148) that accepts referrals for individuals with behaviour problems and a residential service organization in Wales, UK (n = 84). The age of the adults of the aggregated sample ranged from 16 to 71 years old (M = 36.5, SD = 11.9). Most of the participants were male (n = 157, 67.7%) and Caucasian (n = 196, 84.5%). The ethnicities of the remaining 36 participants were African American (n = 16, 6.9%), Asian/Pacific Islander (n = 9, 3.9%), American Indian (n = 4, 1.7%), African British (n = 4, 1.7%), Hispanic (n = 1, 0.4%) and ‘other’ (n = 2, 0.9%). The participants represented all levels of intellectual functioning including mild (n = 54, 23.3%), moderate (n = 74, 31.9%), severe (n = 66, 28.4%) and profound intellectual disability (n = 33, 14.2%). Five participants (2.2%) demonstrated average to below average intellectual functioning. Seventy-five (32.3%) had a diagnosis of autism spectrum disorder (i.e. autistic disorder, Asperger’s disorder or pervasive developmental disorder – not otherwise specified). Most of these participants had verbal communication skills (n = 148, 63.8%), and a large proportion had a diagnosis of a seizure disorder (n = 94, 40.5%). Table 1 shows the prevalence of demographic variables across the two locations and the aggregated sample.

In Minnesota, the BPI-S was completed twice for each participant by two different sets of raters (primary raters and secondary raters) for a total of four administrations per participant. The group of primary raters consisted of 15 staff members who completed the BPI-S for the same participants on two different occasions, while the secondary raters were 15 additional staff members who also administered the BPI-S to their participants on two different occasions. The primary and secondary raters were consistent across the two administrations. The BPI-S was administered once to the participants in Wales by a single set of raters. Senior program staff members who were well acquainted with their clients and their behaviours completed the BPI-S.

Instrument

The BPI-S is a shortened version of the BPI-01, an informant-based assessment instrument developed to measure the occurrence, the frequency and the severity of challenging behaviours (SIB, Aggressive/Destructive Behaviour and Stereotyped Behaviour) in adults and children with intellectual disabilities. The number of items in the BPI-01 was reduced to form the BPI-S by combining items that were highly correlated, by removing items with low prevalence or poor performance, and by examining the consequent changes in Cronbach’s α (Rojahn et al. 2012b). In the BPI-S, eight items make up the SIB subscale, 10 items the Aggressive/Destructive Behaviour subscale and 12 items the Stereotyped Behaviour subscale. For each item, there is a 5-point scale to score frequency (0 = never, 1 = monthly, 2 = weekly, 3 = daily, 4 = hourly) and a 4-point scale to score severity (0 = not a problem, 1 = slight problem, 2 = moderate problem, 3 = severe problem). This is similar to the BPI-01, with the exception that rating anchors for the severity scales were added to the SIB and Aggressive/Destructive Behaviour subscales for the BPI-S.

Table 1 Demographic information of the sample, total and by individual samples

                              Total (N = 232)   UK (n = 84)       USA (n = 148)
Characteristic                N       %         n       %         n       %
Sex
  Male                        157     67.7      54      64.3      103     69.6
  Female                      75      32.3      30      35.7      45      30.4
Age groups
  16–27                       64      27.6      14      16.7      50      33.8
  28–37                       59      25.4      16      19        43      29
  38–45                       56      24.2      33      39.3      23      15.6
  46–71                       53      22.8      21      25        32      21.6
ASD¹ diagnosis
  No                          157     67.7      71      84.5      86      58.1
  Yes                         75      32.3      13      15.5      62      41.8
Ethnicity
  Caucasian                   196     84.5      77      91.7      119     80.4
  African American            16      6.9       –       –         16      10.8
  Asian/Pacific Islander      9       3.9       3       3.6       6       4.1
  American Indian             4       1.7       –       –         4       2.7
  African British             4       1.7       4       4.8       –       –
  Hispanic                    1       0.4       –       –         1       0.7
  Other                       2       0.9       –       –         2       1.4
Level of MR
  No MR²                      5       2.2       5       6.0       –       –
  Mild                        54      23.3      25      29.8      29      29.6
  Moderate                    74      31.9      31      36.9      43      29.1
  Severe                      66      28.4      23      27.4      43      29.1
  Profound                    33      14.2      –       –         33      22.3
Seizure disorder diagnosis
  No                          138     59.5      45      53.6      93      62.8
  Yes                         94      40.5      39      46.4      55      37.2
Verbal communication
  No                          84      36.2      24      28.6      60      40.5
  Yes                         148     63.8      60      71.4      88      59.5

¹ Autism spectrum disorder (i.e. autistic disorder, Asperger’s disorder, or pervasive developmental disorder – not otherwise specified).
² Average to below average intellectual functioning.

Data analysis

The data were analysed using the SPSS software PASW Statistics v. 18.0 (SPSS Inc., Chicago, IL) except for the CFA that was conducted with the MPLUS version 4.1 software (Muthen & Muthen, Los Angeles, CA). Different data sets were created for the various analyses performed. For the total sample (N = 232), the BPI-S ratings from the UK sample (n = 84) were merged with data gathered during the first assessment period by the primary raters from the US sample (n = 148). The data were arranged in either a racked or stacked orientation, depending on the statistical procedure being performed and the assumptions associated with each.¹ The data from each administration were either racked horizontally to yield a sample size of 232 (equal to the number of participants) or stacked vertically yielding 676 individual completed BPI-S questionnaires. Subscale scores were computed for each BPI-S subscale: SIB, Aggressive/Destructive Behaviour and Stereotyped Behaviour. To calculate subscale scores, the frequency scores and severity scores² were summed separately.

¹ Stacking data is a common tool in psychometric analysis since tests of model fit do not require independence of observations as is required in hypothesis testing because the consistency among test items is being evaluated rather than the consistency among participants (Wright 2003). Therefore, we used stacked data for estimates of internal consistency and also for a confirmatory factor analysis.

² Contrary to the BPI-01, the Stereotyped Behaviour subscale of the BPI-S does not have severity scores.
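The racked and stacked layouts and the separate summation of frequency and severity ratings can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the names `SUBSCALE_ITEMS`, `subscale_scores` and `stack`, the dictionary layout and the participant labels are hypothetical, and only the item ranges (1–8, 9–18, 19–30) and the absence of Stereotyped Behaviour severity scores are taken from the text.

```python
# Item ranges follow the subscale assignment described in the text;
# every name in this block is a hypothetical illustration.
SUBSCALE_ITEMS = {
    "SIB": range(1, 9),            # items 1-8
    "AD": range(9, 19),            # items 9-18 (Aggressive/Destructive)
    "Stereotyped": range(19, 31),  # items 19-30
}


def subscale_scores(frequency, severity):
    """Sum frequency and severity ratings separately for each subscale.

    frequency: dict mapping item number (1-30) to a 0-4 rating
    severity:  dict mapping item number (1-18) to a 0-3 rating
    """
    scores = {}
    for name, items in SUBSCALE_ITEMS.items():
        scores[f"{name}_frequency"] = sum(frequency[i] for i in items)
        if name != "Stereotyped":  # the BPI-S has no Stereotyped severity score
            scores[f"{name}_severity"] = sum(severity[i] for i in items)
    return scores


def stack(racked):
    """Turn a racked layout (one entry per participant, holding all of that
    participant's administrations) into a stacked layout (one entry per
    completed questionnaire)."""
    return [
        (participant, label, questionnaire)
        for participant, administrations in racked.items()
        for label, questionnaire in administrations.items()
    ]


if __name__ == "__main__":
    frequency = {i: 1 for i in range(1, 31)}
    severity = {i: 2 for i in range(1, 19)}
    print(subscale_scores(frequency, severity))

    racked = {
        "US-001": {"primary_t1": "q1", "primary_t2": "q2",
                   "secondary_t1": "q3", "secondary_t2": "q4"},
        "UK-001": {"single": "q5"},
    }
    print(len(stack(racked)))  # one row per completed questionnaire
```

With four administrations per Minnesota participant and one per Welsh participant, stacking in this way gives 4 × 148 + 84 = 676 rows, the stacked sample size reported above.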

There were several instances of missing data. Two participants from the Minnesota sample were missing all values for one of their four administration times. These participants were not included in inter-rater and test–retest reliability analyses. There were also data missing in 21 participants from the Minnesota sample, for a total of 59 missing ratings. The last observed rating for each participant was carried forward and imputed into the missing ratings in these cases. In the Welsh sample, frequency and severity ratings for item 18 (‘Bullying’) were missing for 19 participants, and further ratings were missing at random (eight participants, 18 total missing ratings). These missing data were treated as data entry errors in which the transcriber skipped entering several values of 0, the most common rating. A rating of 0 was imputed into each of these missing cases to correct the error.
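The two imputation rules described above can be sketched as follows. This is only an illustration: the text does not state the order in which ratings were carried forward (across administrations or across items), so the sketch assumes an ordered sequence of ratings for one participant, with `None` marking a missing value. The function names are hypothetical.

```python
def carry_forward(ratings):
    """Last observation carried forward: replace each missing rating (None)
    with the most recent observed rating. Leading gaps stay missing because
    there is nothing to carry forward."""
    filled, last_seen = [], None
    for rating in ratings:
        if rating is not None:
            last_seen = rating
        filled.append(last_seen)
    return filled


def fill_with_zero(ratings):
    """Replace each missing rating (None) with 0, the rule applied to the
    Welsh data, where gaps were treated as skipped entries of 0."""
    return [0 if rating is None else rating for rating in ratings]


if __name__ == "__main__":
    print(carry_forward([2, None, 3, None]))
    print(fill_with_zero([None, 1, None]))
```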

Results

Table 2 represents item endorsement across the racked aggregated sample and individual samples. Prevalence rates were computed by transforming the ratings into dichotomous values. Aggressive/Destructive Behaviour items had the highest rates of endorsement overall, with the aggregated sample demonstrating 20.7% endorsement of item #12 (‘Biting others’) to 60.8% of item #17 (‘Destroying things’). The individual sample prevalence of Aggressive/Destructive Behaviour ranged from 9.5% (#12) of the Welsh sample to 75.7% (#17) of the Minnesota sample. Stereotyped Behaviour had the second highest prevalence rates, ranging from 17.7% endorsement of item #29 (‘Clapping hands’) to 44.0% of item #24 (‘Yelling/screaming’) in the aggregated sample. Finally, SIB items demonstrated the lowest prevalence rates ranging from 3% endorsement of item #6 (‘Inserting objects in nose, ears, anus, etc.’) to 32.8% of item #2 (‘Head hitting’) in the aggregated sample. Overall, the Welsh sample demonstrated lower prevalence of the behaviours measured by the BPI-S than the Minnesota sample.

The means and standard deviations of the frequency and severity subscales are shown in Table 2 as well. Overall, and consistent with the pattern of data on individual items, the Welsh sample had lower means and standard deviations than the Minnesota sample. Taking a closer look at the Minnesota sample, Table 3 displays the means and standard deviations of the frequency and severity subscales across the four test administrations. During the second test administration, secondary raters rated participants most highly on SIB frequency and severity and Stereotyped Behaviour frequency, while during the first administration, primary raters rated Aggressive/Destructive Behaviour frequency and severity highest.
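The dichotomisation behind the endorsement rates can be sketched in a few lines. The cut-off is an assumption: the text says only that ratings were transformed into dichotomous values, and the sketch treats any frequency rating above 0 (‘never’) as endorsement. The function name `endorsement` is hypothetical.

```python
def endorsement(frequency_ratings):
    """Return (n endorsed, % endorsed) for one item.

    Assumes an item counts as endorsed when its frequency rating is
    greater than 0 ('never'); the exact cut-off is not stated in the text.
    """
    endorsed = sum(1 for rating in frequency_ratings if rating > 0)
    percent = round(100.0 * endorsed / len(frequency_ratings), 1)
    return endorsed, percent


if __name__ == "__main__":
    # 141 endorsed ratings out of 232, as reported for item #17 in Table 2
    ratings = [1] * 141 + [0] * 91
    print(endorsement(ratings))
```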

Reliability

Internal consistency

Cronbach’s alpha (α) coefficients (Cronbach 1951) were calculated to determine the internal consistency of the BPI-S subscales for the stacked sample and for the two individual samples. For the stacked sample (N = 676), the α-coefficient for the entire scale was 0.91 (across the three frequency subscales), while α for the frequency and severity scales separately was 0.89 and 0.83, respectively. The resulting α-coefficient for the SIB scale was 0.85, with separate frequency and severity scale α-coefficients of 0.73 and 0.70, respectively. The frequency and severity scales of the Aggressive/Destructive Behaviour domain demonstrated internal consistency coefficients of 0.78 and 0.86, respectively, with a total subscale α-coefficient of 0.89. The Stereotyped Behaviour subscale, which only consists of frequency, had an α-coefficient of 0.86. Table 4 displays the α-coefficients for the stacked sample and the individual administrations.

Overall, the Welsh sample (n = 84) demonstrated relatively weaker internal consistency. The SIB frequency and severity scales demonstrated values of 0.44 and 0.45, while the Minnesota sample yielded mean α-coefficients of 0.75 and 0.72 for the same scales. The Welsh sample also had weaker internal consistency of Stereotyped Behaviour items (0.75 compared to an average α of 0.86 for the Minnesota sample), but stronger internal consistency of the Aggressive/Destructive Behaviour subscale (0.80 and 0.89 for frequency and severity scales compared to average α-coefficients of 0.78 and 0.85 for the Minnesota sample).
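For readers who want to see what the α-coefficient measures, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), applied to a small made-up rating matrix. It is not the SPSS routine used in the study, and the function name and data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of questionnaires.

    rows: one list of item ratings per completed questionnaire
          (all lists the same length, at least two items).
    """
    k = len(rows[0])
    # Sample variance of each item across questionnaires
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up frequency ratings for three items on four questionnaires
    ratings = [
        [0, 1, 0],
        [2, 2, 1],
        [3, 4, 3],
        [1, 1, 2],
    ]
    print(round(cronbach_alpha(ratings), 2))
```

Because alpha depends on the ratio of item variances to total-score variance, a subscale with rarely endorsed items and little score variance can show low alpha even when the items are conceptually related.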

Inter-rater reliability

Inter-rater reliability was determined for the Minnesota sample only (n = 147), as their participants were the only ones to be rated by multiple raters. The degree of agreement for subscale scores was expressed by ICCs computed under a two-way random effects model, in which single measure coefficients have been reported and interpreted (Shrout & Fleiss 1979). The results are shown in Table 5. Values are reported for the frequencies and severities of each subscale. Overall, primary raters agreed more on the frequency scales than the severity scales. The ICCs for the total subscales demonstrated moderate to substantial between-rater agreement and ranged from 0.46 (Aggressive/Destructive Behaviour) to 0.66 (Stereotyped Behaviour), although the SIB frequency subscale alone yielded the highest reliability estimate of 0.74 (Nunnally & Bernstein 1994).
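A single-measure ICC under a two-way random effects model can be computed from the mean squares of a two-way ANOVA. The sketch below assumes the absolute-agreement form, usually written ICC(2,1); the text does not say whether the absolute-agreement or the consistency definition was selected in SPSS, so that choice is an assumption. The function name and the ratings are illustrative.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one list per rated person, holding one score per rater.
    """
    n = len(scores)        # number of rated persons
    k = len(scores[0])     # number of raters
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-persons mean square
    ms_cols = ss_cols / (k - 1)               # between-raters mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative ratings: 6 persons (rows) scored by 4 raters (columns)
    ratings = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(ratings), 2))
```

Under the absolute-agreement form, systematic differences between raters (one rater scoring everyone higher) lower the coefficient, which a consistency-type ICC would ignore.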

Test–retest

The lag between first and second BPI-S administration was an average of 42 days, ranging from 31 to 57 days.

Table 2 Descriptive scale statistics (M, SD) and endorsement of Behaviour Problems Inventory-Short Form items (n, %) across the samples

                                Total (N = 232)         UK (n = 84)             USA¹,² (n = 148)
                                M     SD    n     %     M     SD    n     %     M²    SD²   n¹    %¹
SIB subscale
  Frequency                     3.4   4.1               2.6   3.1               4.5   5.2
  Severity                      2.4   2.6               2.0   2.4               3.0   3.2
  1. Self-biting                            68    29.3              19    22.6              49    33.1
  2. Head hitting                           76    32.8              19    22.6              57    38.5
  3. Body hitting                           56    24.1              16    19.0              40    27.0
  4. Self-scratching                        58    25.0              23    27.4              35    23.6
  5. Pica                                   37    15.9              7     8.3               30    20.3
  6. Inserting objects                      7     3.0               4     4.8               3     2.0
  7. Hair pulling                           31    13.4              6     7.1               25    16.9
  8. Teeth grinding                         28    12.1              10    11.9              18    12.2
A/D Behaviour subscale
  Frequency                     7.7   6.2               5.3   5.4               8.3   6.1
  Severity                      6.9   6.4               5.4   6.5               7.4   5.8
  9. Hitting others                         137   59.1              36    42.9              101   68.2
  10. Kicking others                        88    37.9              25    29.8              63    42.6
  11. Pushing others                        135   58.2              26    31.0              109   73.6
  12. Biting others                         48    20.7              8     9.5               40    27.0
  13. Grabbing/pulling others               122   52.6              24    28.6              98    66.2
  14. Scratching others                     80    34.5              16    19.0              64    43.2
  15. Pinching others                       57    24.6              13    15.5              44    29.7
  16. Verbal abuse to others                109   47.0              51    60.7              58    39.2
  17. Destroying things                     141   60.8              29    34.5              112   75.7
  18. Bullying                              78    33.6              26    31.0              52    35.1
Stereotyped Behaviour subscale
  Frequency                     11.0  11.0              7.5   7.9               13.7  11.9
  19. Rocking                               101   43.5              28    33.3              73    49.3
  20. Sniffing                              56    24.1              15    17.9              41    27.7
  21. Waving/shaking arms                   62    26.7              15    17.9              47    31.8
  22. Manipulating objects                  65    28.0              12    14.3              53    35.8
  23. Hand/finger movements                 70    30.2              18    21.4              52    35.1
  24. Yelling/screaming                     102   44.0              32    38.1              70    47.3
  25. Pacing/jumping/bouncing               87    37.5              30    35.7              57    38.5
  26. Rubbing self                          61    26.3              16    19.0              45    30.4
  27. Gazing at hands/objects               69    29.7              16    19.0              53    35.8
  28. Bizarre body posture                  51    22.0              11    13.1              40    27.0
  29. Clapping hands                        41    17.7              12    14.3              29    19.6
  30. Grimacing                             73    31.5              17    20.2              56    37.8

SIB, self-injurious behaviour; A/D, Aggressive/Destructive.
¹ USA endorsement statistics calculated from data obtained during the first administration by the primary raters only.
² USA descriptive statistics across all four administrations.

Test–retest reliability of the subscales was analysed for the Minnesota sample only (n = 146 for primary raters, n = 148 for secondary raters) by computing Pearson’s r correlation coefficients. Table 6 shows the test–retest reliability values for this sample for the two sets of raters. Correlations were calculated across the frequency and severities of each subscale. The strongest correlation was found among the primary raters with the Stereotyped Behaviour subscale [r(146) = 0.91, P < 0.01], and the weakest correlation was among the secondary raters with the Aggressive/Destructive Behaviour subscale [r(148) = 0.66, P < 0.01]. Overall, primary raters demonstrated stronger test–retest reliability than secondary raters, with respective ranges of total subscale scores of 0.79 (Aggressive/Destructive Behaviour) to 0.91 (Stereotyped Behaviour) and 0.66 (Aggressive/Destructive Behaviour) to 0.84 (SIB).

Table 3 Descriptive scale statistics across the five Behaviour Problems Inventory-Short Form measures in the USA sample

                              Primary raters                        Secondary raters
                              Time 1 (n = 147)  Time 2 (n = 147)    Time 1 (n = 148)  Time 2 (n = 148)
                              M      SD         M      SD           M      SD         M      SD
SIB (eight items)
  Frequency                   3.9    4.5        4.2    5.0          4.6    4.9        5.3    6.5
  Severity                    2.6    2.7        2.7    3.0          3.1    3.2        3.6    3.9
Aggressive/Destructive Behaviour (10 items)
  Frequency                   9.1    6.1        8.8    6.2          7.5    5.4        7.9    6.8
  Severity                    7.8    6.2        7.8    6.0          7.0    5.1        6.9    5.8
Stereotyped Behaviour (12 items)
  Frequency                   13.0   11.9       12.3   11.6         13.7   11.6       15.6   12.6

SIB, self-injurious behaviour.

Table 4 Internal consistencies of the BPI-S total scale and subscales (Cronbach’s α)

                              Aggregate (stacked)   UK          USA (n = 592)
                              (n = 676) α           (n = 84) α  Mean¹ α
BPI-S scale (48 items)
  Frequency (30 items)        0.89                  0.84        0.89
  Severity (18 items)         0.83                  0.86        0.82
SIB (16 items)
  Frequency (eight items)     0.73                  0.44        0.75
  Severity (eight items)      0.70                  0.45        0.72
Aggressive/Destructive Behaviour (20 items)
  Frequency (10 items)        0.78                  0.80        0.78
  Severity (10 items)         0.86                  0.89        0.85
Stereotyped Behaviour (12 items)
  Frequency                   0.86                  0.75        0.86

BPI-S, Behaviour Problems Inventory-Short Form; SIB, self-injurious behaviour.
¹ Mean α across all four administrations.

Table 5 Inter-rater reliability between primary rater and secondary rater for Minnesota sample (n = 148)

                              Time 1 (n = 147)   Time 2 (n = 147)
                              ICC                ICC
SIB
  Frequency                   0.74               0.68
  Severity                    0.58               0.61
Aggressive/Destructive Behaviour
  Frequency                   0.58               0.58
  Severity                    0.38               0.50
Stereotyped Behaviour
  Frequency                   0.66               0.55

ICC, intraclass correlation coefficient; SIB, self-injurious behaviour.

Factor structure

Confirmatory factor analysis

A CFA was conducted on the stacked sample (N = 676). Excluding severity ratings from the factor analysis eliminated the chance of the construct of severity appearing as a fourth factor within the sample. Frequency ratings alone depict the latent factors more accurately and are adequate indicators, as the frequency and severity scores of the current sample are strongly correlated, with correlations ranging between r(232) = 0.82, P < 0.001 and r(232) = 0.92, P < 0.001. These findings were similar to the findings of Rojahn et al. (2001).

Two analyses were conducted to determine which factor model best fit the sample. The expected model was that the items assigned to each subscale would form three oblique factors [SIB (items 1–8), Aggressive/Destructive Behaviour (items 9–18) and Stereotyped Behaviour (items 19–30)]. In comparison, a one-factor solution in which all items were hypothesized to measure a single factor was also analysed. The resulting chi-square test of model fit, root mean square error of approximation (RMSEA), standardized root mean square residual (SRMR), comparative fit index (CFI) and Tucker-Lewis index (TLI) values indicate that the three-factor solution, χ²(402) = 2018.8, P < 0.001; CFI = 0.74, TLI = 0.72, SRMR = 0.08, RMSEA = 0.08 (90% CI = 0.075–0.081), is a better fit than the one-factor solution, χ²(405) = 2922.7, P < 0.001; CFI = 0.60, TLI = 0.57, SRMR = 0.09, RMSEA = 0.1 (90% CI = 0.094–0.100).

Table 7 includes factor loadings for each item according to the three-factor solution. The items hypothesized to fit under the Stereotyped Behaviour factor had the strongest loadings overall, ranging from 0.47 to 0.72 (M = 0.57). The Aggressive/Destructive (A/D) Behaviour and SIB factors had one or two items that did not load well: items #16 (‘Verbal abuse to others’), #18 (‘Bullying’) and #8 (‘Teeth grinding’) had low factor loadings of 0.08, 0.26 and 0.23, respectively. Overall, the A/D factor had item loadings ranging from 0.08 to 0.72 (M = 0.52) and the SIB factor had item loadings of 0.23–0.72 (M = 0.28). The proportion of item variance accounted for by the factor, R², or effect size, is also reported in Table 7. R² values for the items in the Stereotyped Behaviour subscale ranged from 0.22 to 0.52 (M = 0.33), the Aggressive/Destructive Behaviour item effect sizes ranged from 0.01 to 0.52 (M = 0.31) and the SIB subscale items ranged from 0.05 to 0.52 (M = 0.19).

Table 6 Test–retest reliability (Pearson correlations) between time 1 and time 2 for the USA sample

                              Primary rater    Secondary rater
                              (n = 146)        (n = 148)
SIB
  Frequency                   0.90**           0.84**
  Severity                    0.83**           0.87**
Aggressive/Destructive Behaviour
  Frequency                   0.83**           0.69**
  Severity                    0.76**           0.65**
Stereotyped Behaviour
  Frequency                   0.91**           0.78**

** P < 0.01. SIB, self-injurious behaviour.
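As a plausibility check on the RMSEA values reported for the two CFA models, the point estimate can be approximated from the chi-square statistic, its degrees of freedom and the sample size. The sketch below uses the common textbook formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))); software packages differ in whether they divide by N or N − 1, so small discrepancies from the Mplus output are to be expected. The function name is hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square.

    Uses the (N - 1) convention; some programs use N instead,
    which changes the result only slightly at N = 676.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    # Chi-square values, df and stacked sample size as reported in the text
    three_factor = rmsea(2018.8, 402, 676)
    one_factor = rmsea(2922.7, 405, 676)
    print(f"three-factor RMSEA: {three_factor:.3f}")
    print(f"one-factor RMSEA:   {one_factor:.3f}")
```

Both estimates fall inside the 90% confidence intervals given above (0.075–0.081 and 0.094–0.100), with the three-factor model showing the smaller value.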

Item-total correlations

Item-total correlations were calculated by correlating the frequency score of each item to the average frequency score of the remaining items within the subscale. The last three columns of Table 7 depict these correlation coefficients. The coefficients in bold denote the correlations between the item and the assigned subscale score, while the coefficients in regular font indicate cross-subscale correlations. The correlation coefficients suggest that all of the items correlate best with their assigned factors.
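The corrected item-total (item-rest) correlation described above can be sketched as a Pearson correlation between one item and the mean of the remaining items in the same subscale. This is an illustration on made-up ratings, not the SPSS computation; the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_rest_correlation(rows, item):
    """Correlate one item with the mean of the remaining items.

    rows: one list of frequency ratings per questionnaire, restricted
          to the items of a single subscale
    item: zero-based position of the item within those lists
    """
    item_scores = [row[item] for row in rows]
    rest_means = [(sum(row) - row[item]) / (len(row) - 1) for row in rows]
    return pearson(item_scores, rest_means)


if __name__ == "__main__":
    # Made-up ratings for a three-item subscale on five questionnaires
    ratings = [
        [0, 1, 0],
        [1, 1, 2],
        [2, 3, 2],
        [4, 3, 4],
        [3, 2, 1],
    ]
    for position in range(3):
        print(position, round(item_rest_correlation(ratings, position), 2))
```

Leaving the item out of the comparison score avoids inflating the correlation with the item's own variance, which matters most for short subscales such as the eight-item SIB scale.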

Discussion

The Rojahn et al. (2012a) paper presented the heretofore only psychometric analysis of the BPI-S, which, however, was based on a retrospectively constructed version of the instrument derived from archival BPI-01 data (Rojahn et al. 2012b). The importance of the present study is that it represents the first reliability and factor analyses on a newly collected sample of adults with intellectual disabilities. The results were consistent with the findings reported in the retrospective analysis of the BPI-S (Rojahn et al. 2012a) indicating that the current samples by and large have decent psychometric properties, with some variance between the Welsh and the Minnesota locations. The results suggested that the BPI-S is a reliable measurement with good to excellent internal consistency (>0.9 = excellent, >0.8 = good, >0.7 = acceptable, >0.6 = questionable, >0.5 = poor and 0.80 = nearly perfect, >0.75/80 = excellent, 0.60–0.74/0.79 = good, 0.40–0.59 = fair,
