


Article

Wechsler Adult Intelligence Scale–IV Dyads for Estimating Global Intelligence

Assessment 2015, Vol. 22(4) 441–448 © The Author(s) 2014 Reprints and permissions: sagepub.com/journalsPermissions.nav DOI: 10.1177/1073191114551551 asm.sagepub.com

Todd A. Girard¹, Bradley N. Axelrod², Ronak Patel¹, and John R. Crawford³

Abstract

All possible two-subtest combinations of the core Wechsler Adult Intelligence Scale–IV (WAIS-IV) subtests were evaluated as potentially viable short forms for estimating full-scale IQ (FSIQ). Validity of the dyads was evaluated relative to FSIQ in a large clinical sample (N = 482) referred for neuropsychological assessment. Sample validity measures included correlations, mean discrepancies, and levels of agreement between dyad estimates and FSIQ scores. In addition, reliability and validity coefficients were derived from WAIS-IV standardization data. The Coding + Information dyad had the strongest combination of reliability and validity data. However, several other dyads yielded comparable psychometric performance, albeit with some variability in their particular strengths. We also observed heterogeneity between validity coefficients from the clinical and standardization-based estimates for several dyads. Thus, readers are encouraged to consider the individual psychometric attributes, their own clinical or research goals, and client or sample characteristics when selecting among the dyadic short forms.

Keywords: WAIS-IV, two-subtest short forms, intelligence assessment

The Wechsler Intelligence Scales (WIS) have long been a core component of cognitive assessment across various settings and populations (Kaufman, 1990; Piotrowski, 1999; L. A. Rabin, Barr, & Burton, 2005; Thompson, LoBello, Atkinson, Chisholm, & Ryan, 2004; Watkins, Campbell, Nieberding, & Hallmark, 1995; Wolber, Reynolds, Ehrmantraut, & Nelson, 1997). Short forms and brief cognitive tools date back about a century, but their continued use and development appear particularly relevant in the current climate of mental health care delivery, which demands efficiency in psychological assessment (Eisman et al., 2000; Piotrowski, 1999).
Moreover, long administration times for comprehensive assessment can strain some individuals' tolerance and motivation and unduly challenge those with certain psychological, sensory, motor, or other physical constraints; such demands may be unnecessary when only a rough gauge or screen of global intelligence is required (Crawford, Allum, & Kinion, 2008). For example, a quick gauge of global intellectual functioning may be sought under time constraints, to accommodate idiosyncratic testing factors (e.g., poor stamina), or for reevaluation or monitoring of clients at follow-ups to an initial, more comprehensive assessment (Christensen, Girard, & Bagby, 2007; Sattler & Ryan, 2009). Similarly, short forms are used to quickly characterize the global intelligence of research samples (e.g., Girard, Christensen, & Rizvi, 2010). Both time savings and such client characteristics were key reasons reported for use of short-form and brief intelligence tests in an international survey by Thompson et al. (2004). They also reported that the most commonly used short forms were those derived from selected WIS subtests. In addressing the need for shortened tests, the Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999) was designed to provide brief (two- or four-subtest) alternatives to the lengthy WAIS-III (Wechsler Adult Intelligence Scale–III; Wechsler, 1997) for assessing global intelligence, as measured by the full-scale intelligence quotient (FSIQ). Whereas the WASI is an independent brief instrument, several short forms can readily be derived by using selected subtests directly from the WAIS. Past research has yielded mixed support for WASI estimates of WAIS-III scores and for the correspondence between WASI and WAIS-III short-form scores. For example, Axelrod (2002) reported a higher, although comparable, validity coefficient and prediction accuracy for the WAIS-III dyad of Vocabulary + Matrix Reasoning than for the two-subtest WASI composed of Vocabulary + Matrix Reasoning. However, the WAIS-III dyad tended to overestimate FSIQ to a greater degree than the WASI dyad.

¹ Ryerson University, Toronto, Ontario, Canada
² John D. Dingell Department of Veterans Affairs Medical Center, Detroit, MI, USA
³ University of Aberdeen, Aberdeen, UK

Corresponding Author: Todd A. Girard, Department of Psychology, Ryerson University, 350 Victoria Street, Toronto, Ontario, M5B 2K3, Canada. Email: [email protected]

Downloaded from asm.sagepub.com at CAMBRIDGE UNIV LIBRARY on August 9, 2015

Since the study by Axelrod (2002), Pearson Assessments has developed newer editions of both tests: the WAIS-IV (Wechsler, 2008) and the WASI-II (Wechsler, 2011). However, the comparability of the WASI-II and the WAIS-IV has not yet been demonstrated independently of the publisher, and assessment of WAIS-IV short forms is nascent. Use of WAIS-IV short forms offers a practical advantage: only the WAIS-IV is required, whether one is interested in a full assessment or a quicker estimate of IQ (depending on the client or research sample), without also having to purchase the WASI-II. If further testing is subsequently desired, the remaining subtests can be administered. Although the WASI-II can similarly be substituted for the corresponding subtests in such a follow-up full assessment (i.e., by administering the remaining subtests), WAIS-IV short forms provide greater flexibility to select different subtest combinations depending on client- or research-specific goals. Therefore, the current report focuses on evaluation of two-subtest short forms of WAIS-IV FSIQ.

The WAIS-IV is the most recent full-scale WIS (Wechsler, 2008). Updates to the WAIS-IV include changes at the item, subtest, scoring, and conceptual levels. For instance, scales were modified to increase developmental appropriateness, enhance psychometric attributes, and reduce bias (e.g., increased item range; decreased emphasis on motor dexterity and speed demands on visual–spatial tasks; decreased emphasis on auditory processing for working memory; new items; and modified instructions, discontinue rules, and scoring rules). At the conceptual level, the WAIS-IV further integrates the Cattell–Horn–Carroll theoretical framework, with increased measurement of fluid intelligence.
At the subtest level, these changes are instantiated by the removal of two subtests (Object Assembly and Picture Arrangement) and the addition of three new subtests (Visual Puzzles, Figure Weights, and Cancellation). Of the latter, Visual Puzzles is now one of the 10 core subtests, contributing added measurement of fluid reasoning to the calculation of FSIQ. The WAIS-IV expands on its predecessors in several important ways, and its publisher also claims a 15% reduction in administration time. Nonetheless, administration of the 10 core subtests required for FSIQ is still estimated to take 60 to 90 minutes in normative samples, and often longer in some clinical samples. These modifications warrant psychometric evaluation of short-form estimates of intelligence based on the new WAIS-IV.

Prior studies have examined WIS dyads using different versions of the WIS. Silverstein's (1982) dyad of Vocabulary + Block Design is an example of a popular short form originally derived from the WAIS-R. This dyad was also recommended by Sattler and Ryan (1998, 2009) for both the WAIS-III and WAIS-IV. Merchan-Naranjo et al. (2012) recently reported that, of five dyads assessed, only Information + Block Design performed adequately in individuals with Asperger syndrome. This finding underscores the need to also assess the utility of short forms in clinical populations. Sattler and Ryan (2009) report on the validity and reliability of a set of top 10 dyads, plus some deemed particularly suitable for specific clinical issues (e.g., hearing impairment). These coefficients were derived from the WAIS-IV standardization data; dyadic short-form estimates of FSIQ have not yet been empirically evaluated in clinical samples. Of note, some dyads yielded validity and/or reliability measures comparable to those of longer short forms (triads to pentads). The dyads with the highest validity coefficients paired Vocabulary with Block Design, Visual Puzzles, or Figure Weights. All three dyads yielded correlations with FSIQ of r′ = .87 to .88 (Sattler & Ryan, 2009), where r′ denotes r corrected for redundant error variance (Girard & Christensen, 2008). Umfleet, Ryan, Gontkovsky, and Morris (2012) subsequently evaluated short forms for the Verbal Comprehension and Perceptual Reasoning indices.

Despite the short-form literature's focus on the correlation coefficient as one useful index of validity, it is important to note that a high correlation does not imply high agreement, and that the strength of a correlation depends on the range of the data (Bland & Altman, 1986; Spinks et al., 2009). For example, a short form with a perfect correlation may nonetheless consistently overestimate the full-scale score. Moreover, short-form validity should be assessed as a multifaceted construct (Boone, 1990; Silverstein, 1990; Spinks et al., 2009; Thompson, Howard, & Anderson, 1986). The purpose of the present study was to evaluate all possible dyad combinations of the 10 core WAIS-IV subtests.
Thus, we provide an assessment of the reliability and several measures of validity of WAIS-IV dyads based on data from the standardization sample and a large clinical sample. While we encourage readers to weigh each of these validity measures according to their own priorities, we also derive an unweighted composite measure of psychometric performance to aid evaluation of the dyads. As Cyr and Brooker (1984) note, it may be difficult for many users to mentally derive a composite summary of short-form performance across measures, even when comparing just two indices. They ranked short forms using a metric of "psychometric effectiveness" calculated as the average of the reliability and validity coefficients. Here we extend this approach by providing Rc, a composite measure of psychometric performance incorporating the multiple indices assessed. Silverstein (1990) not only acknowledged the advantage of this integrative approach but also noted that separate reliability and validity indices may be of interest in their own right in different situations. Thus, we provide an overall summary score (Rc) as well as the data for each psychometric index separately, to maximize the information available to readers and aid their comprehension.
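The earlier point that a high correlation need not mean high agreement (Bland & Altman, 1986) can be made concrete with a minimal Python sketch. The FSIQ values below are hypothetical, invented purely for illustration: one "short form" adds 5 points to every score, and the other stretches scores away from 100. Both correlate perfectly with FSIQ, yet neither agrees with it; the first shows a constant bias, and the second shows wide Bland–Altman limits of agreement (mean difference ± 1.96 SD of the differences).

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation computed from deviation scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def limits_of_agreement(short_form, full_scale):
    """Bland-Altman bias and 95% limits of agreement (short form minus full scale)."""
    diffs = [s - f for s, f in zip(short_form, full_scale)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical FSIQ scores, for illustration only (not study data).
fsiq = [80, 90, 100, 110, 120]
shifted = [f + 5 for f in fsiq]                    # constant overestimation
stretched = [100 + 1.5 * (f - 100) for f in fsiq]  # exaggerated spread around 100

for label, dq in [("shifted", shifted), ("stretched", stretched)]:
    r = pearson_r(dq, fsiq)
    bias, lower, upper = limits_of_agreement(dq, fsiq)
    print(f"{label}: r = {r:.3f}, bias = {bias:+.2f}, 95% LoA = [{lower:.2f}, {upper:.2f}]")
```

Both cases print r = 1.000, but the shifted form is biased by +5 points and the stretched form's limits of agreement span roughly ±15.5 points, which is why agreement indices are examined alongside correlations.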

Method

Clinical Sample

Our sample was composed of 482 persons administered at least the 10 core WAIS-IV subtests through a medical neuropsychology consultation clinic within an urban veterans' medical center, with referrals from neurology, psychiatry, and primary care. The sample was mostly male (93.6%) and primarily of Caucasian (63.5%) and African American (32.8%) self-identified ethnicity. Participants averaged 51.1 years of age (SD = 17.9) and 12.8 years of education (SD = 2.0), and obtained a mean FSIQ at the high end of the low average range (M = 88.9, SD = 14.2). The FSIQ data were normally distributed (Mdn = 89.0; skew = 0.14, SE = 0.11; kurtosis = 0.02, SE = 0.22), with scores ranging from 51 to 144.
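The reported standard errors for skewness and kurtosis are consistent with the conventional large-sample formulas that many statistical packages report. The sketch below assumes those formulas (the text does not state how the SEs were obtained) and uses the reported n, skew, and kurtosis to form z ratios, a common rough check on departure from normality.

```python
import math


def se_skewness(n):
    """Conventional standard error of the sample skewness statistic."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Conventional standard error of the sample excess-kurtosis statistic."""
    return 2 * se_skewness(n) * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))


n = 482                  # clinical sample size reported above
skew, kurt = 0.14, 0.02  # reported FSIQ skewness and kurtosis

se_s, se_k = se_skewness(n), se_kurtosis(n)
# z ratios; |z| < 1.96 is a common, rough criterion for no significant departure
z_s, z_k = skew / se_s, kurt / se_k
print(f"SE(skew) = {se_s:.2f}, z = {z_s:.2f}")
print(f"SE(kurtosis) = {se_k:.2f}, z = {z_k:.2f}")
```

With n = 482 this reproduces SE ≈ 0.11 and 0.22, and both z ratios fall well below 1.96, in line with the description of the FSIQ data as normally distributed.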

Procedures

WAIS-IV subtests had been administered and scored according to standardized procedures (Wechsler, 2008) as part of clinical evaluations. The archival data were deidentified and approved for research use in accordance with policy at the host medical facility.

Analyses

In total, we calculated deviation quotients (DQs) for all 45 dyad combinations of the 10 core WAIS-IV subtests. DQs are derived via linear scaling of the sum of the constituent subtest scaled scores to obtain scores sharing the WIS full-scale mean of 100 and SD of 15 (FSIQ itself being a DQ; Tellegen & Briggs, 1967). Composite reliability (rxx) and correlation coefficients (rstd) between the short-form dyad DQs and FSIQ were calculated using the WAIS-IV standardization data as per methods outlined by Crawford and colleagues (Crawford, Anderson, Rankin, & MacDonald, 2010; Crawford et al., 2008), which apply long-standing methods in the short-form literature (Levy, 1967; Mosier, 1943; Tellegen & Briggs, 1967). Consistent with prior reports (e.g., Girard, Axelrod, & Wilkins, 2010; Sattler & Ryan, 2009), internal consistency reliabilities from the standardization sample were used for this purpose, with the exception of Symbol Search and Coding, for which only test–retest reliabilities are available (Wechsler, 2008). Correlations (r) were also computed for each dyad based on the neuropsychological sample data. All correlations were corrected (r′std; r′) for redundant error variance using Levy's (1967) formula because the dyad subtests are embedded within the full-scale administration (Girard & Christensen, 2008).
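The DQ scaling and the Mosier composite reliability can be sketched for the two-subtest case as follows. This is a minimal illustration assuming equally weighted subtests on the scaled-score metric (M = 10, SD = 3); the subtest reliabilities and intercorrelation used below are hypothetical placeholders, not WAIS-IV normative values, and Levy's correction for redundant error variance is not shown.

```python
import math

SCALED_MEAN, SCALED_SD = 10.0, 3.0  # subtest scaled-score metric


def composite_sd(r12, sd1=SCALED_SD, sd2=SCALED_SD):
    """SD of the sum of two subtest scores with intercorrelation r12."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 + 2 * r12 * sd1 * sd2)


def dyad_dq(ss1, ss2, r12):
    """Tellegen-Briggs deviation quotient: rescale the sum to M = 100, SD = 15."""
    composite_mean = 2 * SCALED_MEAN
    return 100 + 15 * ((ss1 + ss2) - composite_mean) / composite_sd(r12)


def mosier_reliability(r11, r22, r12, sd1=SCALED_SD, sd2=SCALED_SD):
    """Mosier (1943) reliability of an unweighted two-subtest composite."""
    error_var = (1 - r11) * sd1 ** 2 + (1 - r22) * sd2 ** 2
    return 1 - error_var / composite_sd(r12, sd1, sd2) ** 2


# Hypothetical values for illustration only.
r11, r22, r12 = 0.90, 0.80, 0.50
print(round(dyad_dq(13, 13, r12), 2))
print(round(mosier_reliability(r11, r22, r12), 3))
```

Because the composite SD grows with the subtest intercorrelation, the same sum of scaled scores maps to a less extreme DQ when the two subtests are more highly correlated.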

In addition, the magnitude of the difference between each dyad DQ and FSIQ was assessed using paired t tests. To take into account both the consistency of scores and systematic differences between dyad and full-scale scores, we also computed the intraclass correlation coefficient (ICC) using a two-way model assessing absolute agreement (ICC(A,1) in McGraw & Wong, 1996). As an additional measure of agreement, we calculated the proportion of DQs for each dyad falling within ±10 IQ points of participants' respective FSIQ scores. This level of agreement was selected as a meaningful range corresponding to that of the qualitative categories used in WIS test interpretation (e.g., 80–90 = below average, 110–120 = above average). Nonetheless, different interval widths might affect measurement sensitivity and/or conclusions. To address this concern, we ran parallel analyses using intervals of ±2 and ±5 IQ points, roughly corresponding to ±1 and ±2 SEM, respectively (the SEM is 2.16 for WAIS-IV FSIQ; Wechsler, 2008).

Last, to facilitate interpretation across the multiple psychometric indices, a composite of the above psychometric measures (Rc) was computed by adapting the approach of Cyr and Brooker (1984). More specifically, whereas Cyr and Brooker averaged the reliability and validity (Pearson's r) coefficients to provide an index of "psychometric effectiveness," we averaged across all the indices assessed here. In keeping with the original metric, an unweighted composite was selected because there are no sufficiently strong reasons to differentially weight the indices, and the parsimony of a simple average facilitates interpretation. The reliability and corrected validity coefficients derived from standardization data, the corrected validity coefficient based on our sample, the ICC, and the proportion of DQs within ±10 IQ points all fall on scales from 0 to 1 in magnitude. To integrate the difference scores with these measures, we first converted the corresponding mean differences to positive r effect sizes (incorporating the normative SD of 15 for WAIS-IV IQ scores) to represent the absolute degree of discrepancy on a 0 to 1 scale. Because high values on the former measures, but low discrepancy scores, represent good dyad performance, we then subtracted the latter r values from unity to yield a comparable metric of agreement (1 − r). Finally, we calculated the mean of these six values as an overall psychometric composite score, Rc, for each dyad. Thus, Rc is a multifaceted composite measure of psychometric performance that can be interpreted on a scale from 0 (absent) to 1 (perfect) correspondence between the short form and full scale.
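The agreement indices and the Rc composite described above can be sketched in Python. The ICC follows the McGraw and Wong (1996) ICC(A,1) formula computed from two-way ANOVA mean squares. The mean-difference conversion assumes the conventional d-to-r formula, r = d / √(d² + 4) with d = |Mdiff| / 15; that formula is our assumption rather than one stated in the text, although it reproduces Table 1's Rc values to within rounding for the rows we checked. Counting a difference of exactly 10 points as inside the ±10 band is also an assumption.

```python
import math
import statistics


def paired_t(dq, fsiq):
    """Mean difference (DQ minus FSIQ) and the paired t statistic."""
    diffs = [a - b for a, b in zip(dq, fsiq)]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / se


def icc_a1(dq, fsiq):
    """McGraw & Wong (1996) ICC(A,1): two-way model, absolute agreement, single measures."""
    n, k = len(dq), 2
    scores = list(dq) + list(fsiq)
    grand = sum(scores) / (n * k)
    row_means = [(a + b) / k for a, b in zip(dq, fsiq)]  # per-participant means
    col_means = [sum(dq) / n, sum(fsiq) / n]              # per-measure means
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def prop_within(dq, fsiq, band=10):
    """Proportion of DQs within +/- band points of FSIQ (boundary counted as inside)."""
    return sum(abs(a - b) <= band for a, b in zip(dq, fsiq)) / len(dq)


def mean_diff_to_agreement(mean_diff, sd=15.0):
    """1 - r, where r = d / sqrt(d^2 + 4) and d = |Mdiff| / SD (assumed conversion)."""
    d = abs(mean_diff) / sd
    return 1 - d / math.sqrt(d ** 2 + 4)


def rc(rxx, r_std, r_sample, p10, mean_diff, icc):
    """Unweighted mean of the six indices (Rc)."""
    parts = [rxx, r_std, r_sample, p10, mean_diff_to_agreement(mean_diff), icc]
    return sum(parts) / len(parts)


# A constant 5-point offset leaves Pearson r at 1 but lowers ICC(A,1) (hypothetical data).
print(round(icc_a1([95, 105, 115, 125], [90, 100, 110, 120]), 3))
# Table 1 row for Coding + Symbol Search.
print(round(rc(.900, .698, .774, .703, -3.112, .775), 3))
```

Averaging unweighted values mirrors the rationale given above; a weighted variant would only change the final `sum(parts) / len(parts)` line.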

Results

Reliability and validity data for the 45 dyadic short forms are provided in Table 1. The dyads are rank ordered with respect to composite Rc values, and the metrics for each reliability and validity measure are also provided for each dyad.

Table 1. Reliability and Validity of Dyadic WAIS-IV Short Forms for FSIQ.

Dyad | rxx | r′std*** | r′*** | p(±10) | Mdiff | ICC | Rc
Coding + Information | .922 | .850 | .860 | .832 | −0.494 | .871 | .887
Coding + Matrix Reasoning | .917 | .815 | .856 | .842 | −0.320 | .866 | .881
Block Design + Digit Span | .931 | .833 | .856 | .834 | 1.288*** | .859 | .878
Symbol Search + Vocabulary | .907 | .845 | .848 | .830 | −0.830* | .867 | .878
Coding + Visual Puzzles | .909 | .810 | .857 | .832 | −0.366 | .872 | .878
Arithmetic + Similarities | .919 | .857 | .859 | .838 | 2.128*** | .859 | .877
Matrix Reasoning + Vocabulary | .947 | .864 | .867 | .830 | 3.527*** | .851 | .874
Coding + Vocabulary | .929 | .851 | .851 | .807 | −1.666*** | .860 | .874
Coding + Similarities | .904 | .839 | .853 | .836 | −1.679*** | .863 | .873
Digit Span + Visual Puzzles | .936 | .830 | .858 | .811 | 1.890*** | .856 | .871
Similarities + Symbol Search | .881 | .830 | .840 | .828 | −0.798* | .862 | .869
Arithmetic + Coding | .909 | .833 | .857 | .813 | −1.915*** | .865 | .869
Arithmetic + Symbol Search | .887 | .823 | .850 | .805 | −1.043** | .867 | .866
Block Design + Vocabulary | .934 | .873 | .846 | .790 | 2.884*** | .842 | .865
Matrix Reasoning + Symbol Search | .896 | .805 | .846 | .801 | 0.581 | .855 | .864
Digit Span + Vocabulary | .957 | .852 | .815 | .768 | 0.783* | .816 | .864
Arithmetic + Vocabulary | .943 | .860 | .834 | .780 | 2.227*** | .835 | .863
Arithmetic + Block Design | .917 | .846 | .849 | .797 | 2.693*** | .851 | .862
Visual Puzzles + Vocabulary | .940 | .865 | .852 | .784 | 3.567*** | .838 | .860
Digit Span + Similarities | .932 | .846 | .816 | .772 | 0.701 | .818 | .860
Information + Symbol Search | .903 | .822 | .844 | .784 | 1.559*** | .859 | .860
Block Design + Coding | .904 | .819 | .823 | .797 | −1.018** | .839 | .858
Digit Span + Information | .951 | .851 | .842 | .757 | 2.845*** | .832 | .856
Symbol Search + Visual Puzzles | .891 | .780 | .846 | .780 | 0.823* | .863 | .855
Arithmetic + Visual Puzzles | .922 | .835 | .860 | .774 | 3.390*** | .849 | .854
Digit Span + Matrix Reasoning | .942 | .836 | .826 | .763 | 1.839*** | .820 | .854
Block Design + Symbol Search | .887 | .789 | .821 | .778 | 0.162 | .843 | .852
Similarities + Visual Puzzles | .917 | .848 | .849 | .782 | 3.607*** | .835 | .852
Arithmetic + Matrix Reasoning | .928 | .850 | .844 | .757 | 3.224*** | .833 | .851
Matrix Reasoning + Similarities | .924 | .854 | .841 | .768 | 3.515*** | .827 | .850
Block Design + Similarities | .913 | .850 | .829 | .776 | 2.983*** | .825 | .849
Arithmetic + Information | .939 | .839 | .846 | .770 | 4.434*** | .821 | .845
Digit Span + Symbol Search | .907 | .786 | .824 | .761 | −2.196*** | .825 | .838
Coding + Digit Span | .928 | .798 | .829 | .749 | −3.085*** | .817 | .836
Arithmetic + Digit Span | .887 | .821 | .788 | .739 | 0.790 | .805 | .836
Block Design + Information | .931 | .854 | .841 | .734 | 5.160*** | .807 | .833
Information + Matrix Reasoning | .943 | .849 | .853 | .691 | 5.741*** | .803 | .825
Block Design + Matrix Reasoning | .925 | .823 | .819 | .712 | 4.262*** | .797 | .823
Similarities + Vocabulary | .945 | .817 | .777 | .730 | 2.966*** | .764 | .822
Matrix Reasoning + Visual Puzzles | .931 | .809 | .843 | .703 | 4.981*** | .806 | .821
Information + Visual Puzzles | .937 | .839 | .834 | .691 | 5.906*** | .785 | .816
Information + Similarities | .939 | .822 | .799 | .720 | 4.881*** | .765 | .814
Block Design + Visual Puzzles | .927 | .773 | .817 | .697 | 4.732*** | .789 | .808
Information + Vocabulary | .962 | .810 | .763 | .658 | 5.056*** | .730 | .793
Coding + Symbol Search | .900 | .698 | .774 | .703 | −3.112*** | .775 | .791

Note. FSIQ = full-scale intelligence quotient; rxx = composite reliability coefficients of dyadic short forms (Mosier, 1943; based on WAIS-IV standardization data); r′std = Pearson product–moment correlations between dyad deviation quotients (DQs) and FSIQs, corrected for redundant error variance (Girard & Christensen, 2008); r′ = corrected validity coefficients based on neuropsychological sample data; p(±10) = proportion of the sample for whom dyad DQs fall within ±10 points of their respective FSIQs; Mdiff = mean difference scores between the dyad DQs and FSIQ scores; ICC = two-way intraclass correlation coefficient modeling absolute agreement; Rc = composite measure of psychometric performance averaged across the six preceding measures (see text for details). Although consideration of individual indices is recommended, dyads are displayed in descending order by Rc, with subtests in each pair listed alphabetically. Boldfaced values highlight the top five dyads according to each psychometric measure. *p < .05. **p < .01. ***p < .001.

In the interest of parsimony, we highlight the summary statistics for the top few dyads according to the Rc score, followed by those for reliability and each validity measure.

Psychometric composite values ranged from Rc = .79 for Coding + Symbol Search to Rc = .89 for Coding + Information (M = .85). Like this Processing Speed dyad, all eight dyads composed of subtests from a single index domain fell in the bottom quarter overall. The Working Memory dyad of Arithmetic + Digit Span ranked highest within this subset (Rc = .84). Although the Coding + Information dyad did not top the list for any individual measure, it ranked highest overall on Rc and in the top five for the sample-derived validity coefficient (r′; second), the proportion of scores falling within ±10 points of FSIQ (fifth), mean discrepancy from FSIQ (fourth; one of only seven dyads with a nonsignificant discrepancy), and ICC (second). It fared slightly less well in terms of its validity coefficient (r′std; 11th) and particularly its reliability (25th) derived from the standardization data; nonetheless, it exceeded criteria used previously for acceptable reliability (>.90) and validity (>.82; e.g., Christensen et al., 2007; Girard, Axelrod, et al., 2010).

With respect to the coefficients derived from the standardization data, three dyads exceeded a reliability of rxx = .95: Information + Vocabulary, and each of these two subtests paired with Digit Span. Only six dyads fell below rxx = .90. As noted above, although the top Rc dyad, Coding + Information, fared lower on reliability than on other measures, it still met the acceptable level (rxx = .92). Because of the lower reliability of Symbol Search, all nine dyads including this subtest fell among the 12 least reliable dyads, albeit with a minimum rxx of .88.
Although Information + Vocabulary was the most reliable combination (rxx = .96), this single-domain Verbal Comprehension dyad ranked in the bottom 10 across the five validity measures and next to last overall (Rc = .79). As in Sattler and Ryan (2009), the dyads of Vocabulary with Block Design and with Visual Puzzles yielded the highest r′std (both .87). However, these dyads ranked 14th and 19th, respectively, on our composite Rc measure. Their weakest measure was mean discrepancy: they tended to overestimate FSIQ by 2.88 and 3.57 IQ points on average, respectively.

The highest validity coefficient based on the sample data was for Matrix Reasoning + Vocabulary (r′ = .87). This dyad also ranked 7th overall (Rc = .87), reflecting its high reliability (4th; rxx = .95) and validity based on the standardization data (3rd; r′std = .86), the proportion of scores within 10 points of FSIQ (8th), and ICC (17th). However, it fell 12th from last in terms of its discrepancy score, overestimating FSIQ by 3.53 points on average.

Coding + Matrix Reasoning had the highest proportion of scores falling within 10 points of FSIQ (.84). This dyad ranked second both overall (Rc = .88) and in terms of its mean discrepancy score, underestimating FSIQ by less than one third of an IQ point. It ranked 5th for ICC (.87) and 9th for its sample validity coefficient (r′ = .86), but substantially lower on the coefficients based on standardization data (35th, r′std = .82; 28th, rxx = .92).

As expected, rates of inclusion were lower for the narrower bandwidths of ±2 (.15–.24) and ±5 (.36–.53). However, Rc scores based on all six psychometric indices (Table 1) correlated near perfectly with Rc scores based on the five indices remaining after omitting the proportion of scores within the ±10-point interval, r = .989. Moreover, the rank order of dyads by Rc changed minimally; the mean shift was −0.02 places.

In terms of magnitude, one quarter of the dyads yielded discrepancies of less than one IQ point, but a handful deviated by 5 or more IQ points. Notably, all of the top five dyads on this discrepancy measure included one of the Processing Speed tasks (Coding or Symbol Search). On average, the Block Design + Symbol Search dyad overestimated FSIQ by less than one sixth of an IQ point, ranking it best in terms of mean discrepancy. However, it ranked only 27th overall (Rc = .85), 23rd in terms of the proportion of scores within 10 points of FSIQ, and among the bottom 10 on the validity and reliability coefficients. In general, dyads tended to overestimate FSIQ (mean of mean differences = +1.70), with mean differences ranging from −3.11 (underestimation) to +5.91 (overestimation) IQ points. Even with the large sample size, seven dyads, including the Block Design + Symbol Search, Coding + Information, and Coding + Matrix Reasoning dyads discussed above, showed nonsignificant discrepancies at an alpha level of .05, four more at .01, and two more at .001. The rest were highly significant at p < .001 (ranging down to p = 5 × 10⁻⁴⁹).

Coding + Visual Puzzles yielded the highest ICC (.87).
This dyad also ranked in the top five overall (Rc = .88), third in terms of mean discrepancy, and sixth with respect to both the sample validity coefficient and proportion of scores within 10 points of FSIQ. However, it ranked in the bottom third for validity and reliability based on standardization data.
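The summary statements about mean discrepancies can be checked directly against the Mdiff column of Table 1. The short Python sketch below transcribes the 45 values in their Rc order (significance asterisks dropped) and recomputes the mean, the range, and the counts of small and large discrepancies.

```python
import statistics

# Mdiff (dyad DQ minus FSIQ) for the 45 dyads, in Table 1 order.
mdiff = [
    -0.494, -0.320, 1.288, -0.830, -0.366, 2.128, 3.527, -1.666, -1.679,
    1.890, -0.798, -1.915, -1.043, 2.884, 0.581, 0.783, 2.227, 2.693,
    3.567, 0.701, 1.559, -1.018, 2.845, 0.823, 3.390, 1.839, 0.162,
    3.607, 3.224, 3.515, 2.983, 4.434, -2.196, -3.085, 0.790, 5.160,
    5.741, 4.262, 2.966, 4.981, 5.906, 4.881, 4.732, 5.056, -3.112,
]

mean_of_means = statistics.mean(mdiff)
under_one_point = sum(abs(d) < 1 for d in mdiff)  # |Mdiff| below one IQ point
five_or_more = sum(abs(d) >= 5 for d in mdiff)    # |Mdiff| of 5 points or more

print(f"mean of mean differences = {mean_of_means:+.2f}")
print(f"range = {min(mdiff):+.2f} to {max(mdiff):+.2f}")
print(f"|Mdiff| < 1: {under_one_point} of {len(mdiff)}")
print(f"|Mdiff| >= 5: {five_or_more}")
```

The output matches the text: a mean of +1.70, a range from −3.11 to +5.91, 11 of 45 dyads (about one quarter) under one point, and four dyads at 5 or more points.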

Discussion

We report the validity and reliability coefficients of 45 dyadic short forms of the WAIS-IV based on standardization data, as well as their multifaceted validity in a large clinical sample. More specifically, we assessed correlations and magnitudes of discrepancies between dyads and FSIQ, as well as the proportion of dyad DQs falling within ±10 points of participants' respective FSIQ scores. Although assessment of individual measures is encouraged in accordance with one's goals, we further present a composite measure of psychometric performance (Rc), based on the approach of Cyr and Brooker (1984), to aid interpretation across the multiple indices.


Readers are encouraged to consider the individual measures relative to their goals for short-form use when evaluating or selecting a dyad. In this vein, we note the heterogeneity in psychometric performance across measures. No dyad consistently ranked among the top five on every measure (see boldfaced values in Table 1). One source of this variance is the discrepancy between validity coefficients in our sample and those derived from the standardization data, highlighting the importance of evaluating short-form validity across relevant samples. In addition, the variability in performance across sample validity measures highlights the value of considering multiple forms of validity when assessing short forms. Future research may yield further insight into the reasons for these discrepancies. Despite this variability across measures, the results also reveal relative consistencies that support the superior performance of some dyads. Overall, the Coding + Information dyad ranked highest on the composite Rc measure and was among the top five dyads on all four sample validity measures. Consistent with Sattler and Ryan (2009), this dyad did not rank in the top 10 according to the scores derived from the standardization data. Nonetheless, those authors did recommend it as a good dyad for rapid screening. Moreover, its standardization-based reliability and validity coefficients exceed recommended thresholds for acceptable reliability (rxx > .90) and validity (r′std > .82) in the WIS literature (e.g., Christensen et al., 2007; Girard, Axelrod, et al., 2010). Thus, the current results further support this dyad for obtaining a quick gauge of intellectual functioning. The dyads of Block Design + Vocabulary and Visual Puzzles + Vocabulary have the highest r′std (.87), consistent with their place atop Sattler and Ryan's (2009) ranking by validity.
However, these dyads ranked lower on our additional and sample-based measures, particularly because of their mean overestimation of FSIQ. Likewise, although Information + Vocabulary is the most reliable combination based on standardization data, it ranked in the bottom 10 in terms of validity and next to last overall (Rc = .79). Coding + Matrix Reasoning ranked second overall, with strong sample validity (including the highest rate of inclusion of DQs within 10 points of FSIQ) and acceptable levels on the standardization data. Of note, this dyad may be preferred for use with hearing-impaired samples (Sattler & Ryan, 2009). Other dyads recommended by Sattler and Ryan (2009) for this purpose ranked far lower overall (Arithmetic + Matrix Reasoning, 29th; Block Design + Matrix Reasoning, 38th). The Matrix Reasoning + Vocabulary dyad proved strong with respect to its reliability and validity coefficients across samples and ranked seventh overall. However, further highlighting the importance of assessing multiple indices, it tended to overestimate FSIQ in the clinical sample by more than 3.5 points, on average. Very similar results were obtained by Axelrod (2002) with this combination for estimating WAIS-III FSIQ. Use of these subtests from the two-subtest WASI yielded less discrepancy but a poorer validity coefficient (Axelrod, 2002). Similar empirical comparisons between WAIS-IV short forms and the WASI-II will be important. This dyad is also integrated, along with demographic variables, into the two-subtest Oklahoma Premorbid Intelligence Estimate–3 (OPIE-3) equation (Schoenberg, Duff, Scott, & Adams, 2002), which has also been found to overestimate FSIQ on average (Spinks et al., 2009). Notably, we found that five dyads including Processing Speed measures yielded the smallest mean discrepancies in our clinical sample. These findings should also be taken into consideration in the development of future versions of the WASI and OPIE tests.

In addition to psychometric attributes, it is important to consider clinical factors and suitable applications. Dyad short forms are most appropriate when the research or referral question is aimed at estimating global intelligence (FSIQ). That is, these short-form estimates are insufficient grounds for key decisions regarding individual clients, such as placement (Sattler & Ryan, 2009; Silverstein, 1990; Thompson et al., 1986). For instance, even the top dyads placed only 84% of short-form estimates within ±10 points of FSIQ in our clinical sample. Somewhat better, Sattler and Ryan (2009) reported 95% inclusion rates with bandwidths of 7 to 12 IQ points for their recommended dyads, based on WAIS-IV standardization data. With the narrower bandwidths of ±2 and ±5 points (roughly 1 and 2 SEM), at most about a quarter and a half of cases, respectively, fell within the interval. Overall, these rates and ranges argue against making important individual-level interpretations based on dyad FSIQ equivalents.
On the other hand, most dyads show adequate reliability, validity coefficients, and mean discrepancies to support their use for research or for clinical screening or monitoring purposes (Sattler & Ryan, 2009). Moreover, for these purposes, the drop in validity must be weighed against the substantial time savings of administering only 2 of the 10 core WAIS-IV subtests. Nonetheless, it will be useful to directly assess administration times for specific short forms in both clinical and nonclinical samples. Although such goals may be achieved with independent brief tests of intelligence, the dyads assessed here offer users more flexibility in choice and the advantage of being able to simply follow up a quick screen with full administration of the remaining WAIS-IV subtests if more comprehensive intelligence assessment is needed. Despite the variation across validity measures, most dyads fall within a reasonably narrow range of performance. For instance, the top 10 dyads on the composite measure range from Rc = .87 to .89. Thus, users might take into account clinical considerations, such as the referral question or client or sample characteristics, when selecting a short form.

For example, as noted above, Coding + Matrix Reasoning is recommended for hearing-impaired samples. When choosing between Matrix Reasoning and Information as the partner for Coding (i.e., the top two dyads in Table 1), users may also consider whether fluid or crystallized intelligence, respectively, is of greater interest. Likewise, Information and Vocabulary assess more concrete knowledge, whereas Similarities is preferable for assessing abstract verbal reasoning. Similarities may also prove more sensitive to the nuances of psychosis (Christensen et al., 2007; Crawford et al., 2008; Donders, 1997; Girard, Axelrod, et al., 2010), although this requires empirical assessment. The dyads pairing Similarities with Arithmetic and with Coding ranked in the top 10 overall (sixth and ninth; Table 1). In our sample, however, they overestimated and underestimated FSIQ, respectively, by about two points. These deviations were statistically significant, but their magnitude falls within the normative standard error of measurement for FSIQ. It is also important to consider the sample's characteristics when generalizing the current results. We report here on a large neuropsychological sample whose FSIQ scores were normally distributed across the full range. The sample comprised predominantly male participants of Caucasian and African American descent, with a mean FSIQ of 89 (low average range). Some apparent discrepancies across indices may reflect differential sensitivity of some dyads to the samples or measures used. This possibility bears on generalizability and warrants future research. For instance, discrepancies between r′std and sample r′ values likely reflect, at least in part, differences between the standardization data set and the clinical sample.
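Two quantities behind these comparisons can be illustrated from summary statistics. The first is the reliability of a dyad composite. The second is a validity coefficient corrected for the error variance a short form shares with the full scale that contains it. The sketch below assumes unit-weighted sums of subtest scaled scores (SD = 3) and mutually uncorrelated measurement errors. It applies the standard composite-reliability formula (Mosier, 1943) and the general logic of the correlated-error correction (Levy, 1967; Girard & Christensen, 2008). It is not necessarily the exact procedure used to obtain the r′ and r′std values reported in Table 1. Function names and example values are hypothetical.

```python
import math


def error_variance(reliability: float, sd: float = 3.0) -> float:
    """Error variance of a subtest: SD^2 * (1 - reliability)."""
    return sd ** 2 * (1.0 - reliability)


def dyad_reliability(r_aa: float, r_bb: float, r_ab: float, sd: float = 3.0) -> float:
    """Reliability of the unit-weighted sum of two subtests with equal SDs.

    Composite reliability = 1 - (sum of error variances) / (composite variance),
    where the composite variance is 2*SD^2 + 2*r_ab*SD^2.
    """
    composite_var = 2.0 * sd ** 2 + 2.0 * r_ab * sd ** 2
    total_error = error_variance(r_aa, sd) + error_variance(r_bb, sd)
    return 1.0 - total_error / composite_var


def corrected_validity(r_sf_fs: float, sd_sf: float, sd_fs: float,
                       shared_error_vars) -> float:
    """Short-form/full-scale correlation with shared error removed.

    Because the short-form subtests are also summed into the full scale,
    their error variances inflate the observed covariance. Subtracting them
    estimates the covariance expected if the full scale had come from a
    separate administration (independent errors). All inputs must be in
    sum-of-scaled-scores units.
    """
    observed_cov = r_sf_fs * sd_sf * sd_fs
    return (observed_cov - sum(shared_error_vars)) / (sd_sf * sd_fs)


if __name__ == "__main__":
    # Hypothetical subtest reliabilities and intercorrelation.
    r_aa, r_bb, r_ab = 0.90, 0.85, 0.55
    print(round(dyad_reliability(r_aa, r_bb, r_ab), 3))

    # Hypothetical observed dyad/full-scale correlation and SDs.
    sd_sf = 3.0 * math.sqrt(2.0 + 2.0 * r_ab)  # SD of the dyad sum
    sd_fs = 23.0                               # assumed SD of the 10-subtest sum
    shared = [error_variance(r_aa), error_variance(r_bb)]
    print(round(corrected_validity(0.88, sd_sf, sd_fs, shared), 3))
```

In this sketch, a less reliable subtest contributes more error variance. That lowers the dyad's reliability and also enlarges the downward correction applied to its validity coefficient.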
It is notable in this regard that every dyad including at least one Processing Speed task (Coding or Symbol Search; see Table 1) yielded a numerically higher validity coefficient for the clinical sample than for the standardization data. The one exception was Coding + Vocabulary, for which the two were equal. Dyads drawn from other domains performed more evenly across the two data sets. This observation may relate to the particular sensitivity of Processing Speed measures to clinical conditions (e.g., Christensen et al., 2007; Gorlyn et al., 2006; Hawkins, 1998; Taylor & Heaton, 2001). The lower reliabilities of dyads including Coding and/or Symbol Search likely stem partly from the reliance on test–retest correlations for these subtests (Wechsler, 2008). As reviewed by Sattler and Ryan (2009), these estimates were based on a smaller subsample. Moreover, for the other subtests, test–retest values are consistently lower than their internal consistency counterparts. Nonetheless, this methodological difference cannot explain the enhanced sensitivity noted above, for two reasons: less reliable measures typically yield less sensitive tests, and the same reliability values were used for the validity estimates in both samples. In the future, it will be valuable to obtain larger assessments of reliability on a common metric and to estimate reliabilities directly in clinical samples. In contrast, some dyads were closely matched. For example, Arithmetic + Similarities and Matrix Reasoning + Vocabulary ranked among the five highest validity coefficients in both data sets and differed only at the third decimal place. Nonetheless, cross-validation studies in other representative samples will be useful. It will also be important to investigate short-form performance at different ability levels, particularly the tails of the distribution (Spinks et al., 2009). Future work should also reassess the dyads' psychometric properties when they are administered independently of the full WAIS-IV. The statistical correction for redundant error variance (r′) estimates the validity coefficient "as if" the short form and the full scale had been obtained from separate administrations (Girard & Christensen, 2008). However, systematic sources of variance (e.g., the influence of other subtests) and clinical considerations (e.g., motivation/attention) may affect full-scale and short-form performance differently. For instance, short forms may yield higher subtest scores and more reliable performance (Levy, 1968; Wymer, Rayls, & Wagner, 2003). Yet their validity coefficients with FSIQ may be lower when they are administered in isolation than when they are embedded in the full battery, particularly for dyads (Thompson et al., 1986). However, Axelrod (2002) found no significant order effects when administering the two- (or four-) subtest WASI and the full WAIS-III, suggesting that at least some measures are robust to potential influences of test administration time. These issues deserve further empirical attention with the WAIS-IV.

Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

References
Axelrod, B. N. (2002). Validity of the Wechsler Abbreviated Scale of Intelligence and other very short forms of estimating intellectual functioning. Assessment, 9, 17-23.
Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1, 307-310.
Boone, D. E. (1990). Short forms of the WAIS-R with psychiatric inpatients: A comparison of techniques. Journal of Clinical Psychology, 46, 197-200.
Christensen, B. K., Girard, T. A., & Bagby, R. M. (2007). WAIS-III short form for Index and IQ scores in a psychiatric population. Psychological Assessment, 19, 236-240.
Crawford, J. R., Allum, S., & Kinion, J. E. (2008). An index-based short form of the WAIS-III with accompanying analysis of reliability and abnormality of differences. British Journal of Clinical Psychology, 47, 215-237.
Crawford, J. R., Anderson, V., Rankin, P., & MacDonald, J. (2010). An index-based short-form of the WISC-IV with accompanying analysis of the reliability and abnormality of differences. British Journal of Clinical Psychology, 49, 235-258.
Cyr, J. J., & Brooker, B. H. (1984). Use of appropriate formulas for selecting WAIS-R short forms. Journal of Consulting and Clinical Psychology, 52, 903-905.
Donders, J. (1997). A short form of the WISC-III for clinical use. Psychological Assessment, 9, 15-20.
Eisman, E. J., Dies, R. R., Finn, S. E., Eyde, L. D., Kay, G. G., Kubiszyn, T., . . . Moreland, K. L. (2000). Problems and limitations in the use of psychological assessment in the contemporary health care delivery system. Professional Psychology: Research and Practice, 31, 131-140.
Girard, T. A., Axelrod, B. N., & Wilkins, L. (2010). Comparison of WAIS-III short-forms for measuring index and full-scale scores. Assessment, 17, 400-405.
Girard, T. A., & Christensen, B. K. (2008). Clarifying problems and offering solutions for correlated error when assessing the validity of selected-subtest short forms. Psychological Assessment, 20, 76-80.
Girard, T. A., Christensen, B. K., & Rizvi, S. (2010). Visual-spatial episodic memory in schizophrenia: A multiple systems framework. Neuropsychology, 24, 368-378.
Gorlyn, M., Keilp, J. G., Oquendo, M. A., Burke, A. K., Sackeim, H. A., & Mann, J. J. (2006). The WAIS-III and major depression: Absence of VIQ/PIQ differences. Journal of Clinical and Experimental Neuropsychology, 28, 1145-1157.
Hawkins, K. A. (1998). Indicators of brain dysfunction derived from graphic representations of the WAIS-III/WMS-III technical manual clinical samples data: A preliminary approach to clinical utility. The Clinical Neuropsychologist, 12, 535-551.
Kaufman, A. S. (1990). Assessing adolescent and adult intelligence. Needham Heights, MA: Allyn & Bacon.
Levy, P. (1967). The correction for spurious correlation in the evaluation of short-form tests. Journal of Clinical Psychology, 23, 84-86.
Levy, P. (1968). Short-form tests: A methodological review. Psychological Bulletin, 69, 410-416.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30-46.
Merchan-Naranjo, J., Mayoral, M., Rapado-Castro, M., Llorente, C., Boada, L., Arango, C., & Parellada, M. (2012). Estimation of the intelligence quotient using Wechsler Intelligence Scales in children and adolescents with Asperger syndrome. Journal of Autism and Developmental Disorders, 42, 116-122.
Mosier, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8, 161-168.
Piotrowski, C. (1999). Assessment practices in the era of managed care: Current status and future directions. Journal of Clinical Psychology, 55, 787-796.
Rabin, L. A., Barr, W. B., & Burton, L. A. (2005). Assessment practices of clinical neuropsychologists in the United States and Canada: A survey of INS, NAN, and APA Division 40 members. Archives of Clinical Neuropsychology, 20, 33-65.
Sattler, J. M., & Ryan, J. J. (1998). Assessment of children: Revised and updated third edition. WAIS-III supplement. San Diego, CA: Sattler.
Sattler, J. M., & Ryan, J. J. (2009). Assessment with the WAIS-IV. La Mesa, CA: Sattler.
Schoenberg, M. R., Duff, K., Scott, J. G., & Adams, R. L. (2002). An evaluation of the clinical utility of the OPIE-3 as an estimate of premorbid WAIS-III FSIQ. The Clinical Neuropsychologist, 17, 308-321.
Silverstein, A. B. (1982). Two- and four-subtest short forms of the Wechsler Adult Intelligence Scale–Revised. Journal of Consulting and Clinical Psychology, 50, 415-418.
Silverstein, A. B. (1990). Short forms of individual intelligence tests. Psychological Assessment, 2, 3-11.
Spinks, R., McKirgan, L. W., Arndt, S., Caspers, K., Yucuis, R., & Pfalzgraf, C. J. (2009). IQ estimate smackdown: Comparing IQ proxy measures to the WAIS-III. Journal of the International Neuropsychological Society, 15, 590-596.
Taylor, M. J., & Heaton, R. K. (2001). Sensitivity and specificity of WAIS-III/WMS-III demographically corrected factor scores in neuropsychological assessment. Journal of the International Neuropsychological Society, 7, 867-874.
Tellegen, A., & Briggs, P. F. (1967). Old wine in new skins: Grouping Wechsler subtests into new scales. Journal of Consulting Psychology, 31, 499-506.
Thompson, A. P., Howard, D., & Anderson, J. (1986). Two- and four-subtest short forms of the WAIS-R: Validity in a psychiatric sample. Canadian Journal of Behavioural Science, 18, 287-293.
Thompson, A. P., LoBello, S. G., Atkinson, L., Chisholm, V., & Ryan, J. J. (2004). Brief intelligence testing in Australia, Canada, the United Kingdom, and the United States. Professional Psychology: Research and Practice, 35, 286-290.
Umfleet, L. G., Ryan, J. J., Gontkovsky, S. T., & Morris, J. (2012). Estimating WAIS-IV Indexes: Proration versus linear scaling in a clinical sample. Journal of Clinical Psychology, 68, 390-396.
Watkins, C. E., Jr., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology: Research and Practice, 26, 54-60.
Wechsler, D. (1997). WAIS-III: Wechsler Adult Intelligence Scale–Third Edition administration and scoring manual. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1999). Wechsler Abbreviated Scale of Intelligence (WASI). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2011). Wechsler Abbreviated Scale of Intelligence–Second Edition (WASI-II). San Antonio, TX: Psychological Corporation.
Wolber, G. J., Reynolds, B., Ehrmantraut, J. E., & Nelson, A. J. (1997). In search of a measure of intellectual functioning for an inpatient psychiatric population with low cognitive ability. Psychiatric Rehabilitation Journal, 21, 59-63.
Wymer, J. H., Rayls, K., & Wagner, M. T. (2003). Utility of a clinically derived abbreviated form of the WAIS-III. Archives of Clinical Neuropsychology, 18, 917-927.
