Neuropsychology Review, Vol. 2, No. 2, 1991

The Wechsler Memory Scale--Revised: Psychometric Characteristics and Clinical Application

Richard W. Elwood(1)

(1) Psychology Service (116B), Veterans Administration Medical Center, Tomah, Wisconsin 54660.

The psychometric characteristics of the Wechsler Memory Scale--Revised (WMS-R) are evaluated and related to its clinical utility. The accuracy of the scale scores is shown to be limited by their high standard errors, low reliabilities, and consequent large standard errors of measurement. Specific procedures are discussed for establishing confidence intervals and for testing the significance of differences between scores. It is concluded that the WMS-R, like the original Wechsler Memory Scale, provides only a rough estimate of overall memory functioning. The multidimensional index scores have not been shown effective in describing the nature or the pattern of memory deficits. Recommendations for the clinical use of the WMS-R are provided.

KEY WORDS: Wechsler Memory Scale--Revised; Wechsler scales; neuropsychological tests; memory.

INTRODUCTION

The Wechsler Memory Scale (WMS) has long been used in the clinical assessment of memory functions, despite its conspicuous psychometric deficiencies (Lezak, 1983; Prigatano, 1977, 1978). Wechsler (1987) obviously made a concerted effort to overcome those deficiencies in the Wechsler Memory Scale--Revised (WMS-R), and early reviews of the WMS-R considered it a vast improvement over the original WMS (Franzen, 1989; Loring, 1989; Powel, 1988). Since its generally favorable reception, enough studies have been conducted on the WMS-R to warrant its critical reappraisal.


The present review is intended more as an update than an introduction to the WMS-R and thus does not duplicate the thorough descriptions provided in previous reviews. Moreover, the review endeavors to relate the scale's psychometric characteristics directly to its clinical application. Following standard convention, references to the WMS-R will cite Wechsler (1987), recognizing that much of the scale's development was carried out after his death in 1981.

STANDARDIZATION

Test norms are only as good as the standardization sample on which they are based. Accordingly, it seems appropriate to begin the evaluation of the WMS-R norms by first considering the standardization sample and the sampling procedures used to construct it. The sample consists of 316 subjects between the ages of 16 and 74, divided into six age groups, each with equivalent quotas for gender, race, and geographic variables. At first glance, the sample appears to be a straightforward case of stratified random sampling with age as the primary classification variable. However, a closer inspection of the sampling procedures reveals that they fail to meet basic conditions required for age stratification. Stratified sampling involves assigning every element in the target population to one of the strata and then drawing an independent random sample from each stratum (Jaeger, 1984; Snedecor and Cochran, 1980). In other words, the combined strata must cover the entire target population. However, while the WMS-R target population spans ages 16-74, the standardization sample systematically excluded subjects aged 18-19, 25-34, and 45-54, substituting interpolated values in their place. As a result, the combined strata account for only about 65% of the target population age range, and the strata themselves vary considerably in width, spanning between 2 and 10 years. Wechsler (1987) further confuses the description of the sampling method by referring to the "norms stratified at nine age levels" (p. 2) and the "stratification of the sample" (p. 43), but then, quite rightly, treats the normative sample as six independent random samples. The standard deviations of the average composite index scores (p. 55) were calculated by simply pooling the standard deviations of the six age groups, a method that is appropriate for simple random sampling and that ignores the effects of stratification altogether.

The WMS-R suffers a substantial loss by forfeiting the advantages of stratified sampling. Test norms are simply estimates of population means, and the accuracy of those estimates is measured by their standard error. Stratified sampling yields a much lower standard error than simple random sampling because it is based only on the variance within each relatively homogeneous stratum rather than across the entire sample (Jaeger, 1984; Snedecor and Cochran, 1980).
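
To make this point concrete, the sketch below compares the standard error of a mean estimated by simple random sampling with that of a proportionately stratified design. The numbers are purely hypothetical; they are not the WMS-R data.

    import math

    # Hypothetical example: a population divided into three equally weighted
    # age strata whose within-stratum standard deviations (8, 9, 10) are
    # smaller than the pooled standard deviation (14) because the task is
    # strongly age-related.
    weights   = [1/3, 1/3, 1/3]   # stratum proportions of the population
    sd_within = [8.0, 9.0, 10.0]  # hypothetical within-stratum SDs
    sd_pooled = 14.0              # hypothetical SD ignoring strata
    n_total   = 300
    n_per     = [int(w * n_total) for w in weights]

    # Simple random sampling: SE of the mean uses the overall SD.
    se_srs = sd_pooled / math.sqrt(n_total)

    # Proportionate stratified sampling: SE uses only within-stratum variances.
    se_strat = math.sqrt(sum((w ** 2) * (s ** 2) / n
                             for w, s, n in zip(weights, sd_within, n_per)))

    print(f"SE (simple random): {se_srs:.2f}")   # about 0.81
    print(f"SE (stratified):    {se_strat:.2f}") # about 0.52

The gain grows as the classification variable accounts for more of the variance in the sampled scores, which is exactly the situation with age-related memory tasks.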


Stratified sampling is thus a much more efficient means of allocating sampling elements: it provides more precise estimates of population means for a given sample size and permits a smaller sample to achieve a given level of precision. Generally, the advantage in sampling efficiency gained by stratified sampling is a function of the association between the sampling variables and the classification variable (Jaeger, 1984; Snedecor and Cochran, 1980). Since the primary WMS-R classification variable (age) is highly related to the sampling variables (subtest scores), stratified sampling has an enormous advantage not only for the WMS-R but for any scale using age-related memory tasks.

Sampling efficiency directly affects the determination of required sample size. In absolute terms, a sample of only 316 seems hardly adequate for a standardization sample, given the broad age range of its target population. In relative terms, the comparison between the WMS-R sample and the 2,000 subjects used for the WAIS-R (Wechsler, 1981) is striking, particularly since the WAIS-R employed true stratified sampling over a comparable age range. The manual justifies the small sample by proposing that "fifty cases are considered sufficient to provide stable estimates of the population mean" (Wechsler, 1987, p. 43). This is a curious oversimplification, since required sample sizes are not absolute but depend on various factors, such as the allowable sampling error and the reliability of the test or scale being used. The manual seems to confuse required sample size in a generic sense with the fact that 30-40 cases are usually considered sufficient to allow the use of normal probabilities to estimate population parameters. Moreover, that rule addresses only how estimates are made from the sample, not whether the sample is an adequate basis for a clinical scale. The use of simple random sampling in standardizing the WMS-R thus results in large sampling errors. Those sampling errors, in turn, exact a severe penalty in the form of large standard errors of measurement and, ultimately, in the scale's clinical utility.
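
The dependence of required sample size on the allowable sampling error, noted above, can be made concrete with the standard formula for estimating a mean to within a margin of error E. The figures below are purely illustrative; only the standard deviation of 15, the conventional scaling of the index scores, is taken from the scale itself.

    import math

    def n_required(sd: float, margin: float, z: float = 1.96) -> int:
        """Sample size needed to estimate a mean to within +/- margin
        at the confidence level implied by z: n = (z * sd / E)^2."""
        return math.ceil((z * sd / margin) ** 2)

    # Required n depends sharply on the precision demanded of the norm:
    for margin in (5.0, 3.0, 2.0, 1.0):
        print(f"margin +/-{margin}: n = {n_required(15.0, margin)}")
    # margin +/-5.0: n = 35
    # margin +/-3.0: n = 97
    # margin +/-2.0: n = 217
    # margin +/-1.0: n = 865

"Fifty cases," in other words, is not a universal answer; it corresponds to a fairly loose tolerance on the estimated mean.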


Loring (1989) criticizes the use of interpolated norms quite apart from the limitations they impose on the sampling procedures. He argues that the interpolation rests on the unwarranted assumption of linear agewise declines in scale scores; for example, he points out that the increments in mean composite scores between adjacent age groups vary widely, whereas linear agewise declines would predict similar differences between successive age groups. Loring argues that the missing 45-54 age group is particularly important, since it is at this age range that so many patients first present with suspected memory deficits. While agewise declines in memory task performance are well known, the precise gradients of those declines have not been established. However, unpublished norms have since been presented for 25-34-year-old subjects (Mittenberg et al., 1991) that are reported to differ slightly, though significantly, from the interpolated values found in the WMS-R manual. D'Elia et al. (1989) take a more extreme view and actually discourage using the revised memory scale altogether with subjects in the excluded age groups. The authors recommend that the original scale be used instead, along with one of several updated norms, depending on the clinical population.

Wechsler (1987) administered the full WAIS-R to subjects in the 35-44 and 65-69 age groups. Full Scale IQ scores for the remaining subjects were estimated from a composite of the Vocabulary, Arithmetic, Picture Completion, and Block Design subtests. This particular combination of WAIS-R subtests has not been studied before, though it is similar to the V-A-BD-PA (Silverstein, 1982) and I-A-PC-BD (Reynolds et al., 1983) WAIS-R short-form tetrads that have been fairly well validated (Kaufman, 1990). Wechsler (1987) notes that the mean Full Scale IQ of 103.9 in the normative sample is higher than ideal but suggests the difference is not clinically significant. However, as Kaufman (1990) points out, the high average IQ in the normative sample could easily have been corrected by simple score transformations.

RELIABILITY

The reliability of psychological tests sets the upper limit on their validity (Cronbach, 1970) and is thus essential to their clinical utility. Yet most reviews of the WMS-R have mentioned reliability only briefly and without critical comment (Franzen, 1989; Loring, 1989; Powel, 1988). To date, Kaufman alone has raised serious issue with what he regards as the scale's "disappointing" reliability (Kaufman, 1990, p. 590). Few of the WMS-R subtests, or even the composite indexes, meet minimal standards for reliability. The subtest reliability coefficients, averaged across age groups, range from 0.41 to 0.88 (shown in Table I), with a mean of 0.61 (Wechsler, 1987). Various standards for reliability have been proposed, but there is general consensus that coefficients in the 0.70-0.80 range (Golden et al., 1984) or 0.80 and above (Anastasi, 1988; Cunningham, 1986) are adequate for individual subtest scores. Thus, by current standards, only the Digit Span and Visual Memory Span subtests meet both criteria, while only Logical Memory I and II fulfill the more lenient standard. None of the remaining subtests meets even the most liberal criterion for reliability. The manual acknowledges the "low" reliability of "several" subtests and advises users to "exercise appropriate caution" in their interpretation (Wechsler, 1987, p. 59). The WMS-R index score reliability coefficients (shown in Table I) range from only 0.70 to 0.90.


Table I. Mean WMS-R Reliability Coefficients (rx)

                                    Stability(a)   Internal consistency(b)
                                                   (split-half)
Subtest
  Mental Control                       0.51             --
  Figural Memory                       0.43            0.44
  Logical Memory I                     0.71            0.74
  Visual Paired Associates I           0.58             --
  Verbal Paired Associates I           0.60             --
  Visual Reproduction I                0.71            0.59
  Digit Span                           0.77            0.88
  Visual Memory Span                   0.75            0.81
  Logical Memory II                    0.75            0.75
  Visual Paired Associates II          0.58             --
  Verbal Paired Associates II          0.41             --
  Visual Reproduction II               0.69            0.46

Index
  General Memory                       0.81
  Attention/Concentration              0.90
  Verbal Memory                        0.77
  Visual Memory                        0.70
  Delayed Recall                       0.77

(a) Age groups 20-24, 55-64, and 70-74; N = 151.
(b) N = 316.

Cunningham (1986) recommends that the more rigorous reliability standard of 0.90 be applied to composite or summary scores such as these, which are interpreted on their own. By that criterion, only the Attention/Concentration index would be considered reliable. Even if judged by the more liberal standard of 0.80, only two of the five indexes, General Memory and Attention/Concentration, would be considered reliable. The low reliability of the WMS-R becomes more obvious when the scale is compared to the familiar WAIS-R. The average WMS-R subtest reliability of 0.61 falls far short of the 0.83 average found for the WAIS-R subtests (Wechsler, 1981). The General Memory Index reliability of 0.81 contrasts sharply with the 0.97 of the analogous Full Scale IQ on the WAIS-R (Wechsler, 1981). We may recall that the reliability of any scale is a function of its length; simply increasing the number of items on a scale can often compensate for otherwise inadequate reliability (Anastasi, 1988; Cronbach, 1970). The WMS-R compounds its low reliability by restricting the range of scores on so many of its subtests.


In this regard, Powel (1988) criticizes the WMS-R for not dividing the Delayed Recall index into separate verbal and visual scores. However, the poor reliability of the existing Delayed Recall index would be reduced even more by any further partitioning (Kaufman, 1990).
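
The Spearman-Brown prophecy formula, implicit in the Anastasi (1988) and Cronbach (1970) citations above, makes this length-reliability tradeoff concrete. The sketch below is a rough illustration only: it takes the Delayed Recall reliability of 0.77 from Table I as the starting point and assumes that any items added or removed are parallel to the existing ones.

    def spearman_brown(r: float, k: float) -> float:
        """Projected reliability when a test is lengthened by a factor k
        (k < 1 means the test is shortened)."""
        return (k * r) / (1 + (k - 1) * r)

    r_delayed = 0.77  # Delayed Recall index reliability (Table I)

    # Doubling the number of (parallel) items would raise reliability:
    print(round(spearman_brown(r_delayed, 2.0), 2))  # 0.87

    # Splitting the index into two half-length scores, as Powel suggests,
    # would drive each part's reliability down instead:
    print(round(spearman_brown(r_delayed, 0.5), 2))  # 0.63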

THE STANDARD ERROR OF MEASUREMENT

The standard error of measurement (SEm) of a test or scale determines its accuracy under actual clinical conditions. SEm is a joint function of sampling error and reliability; given the large sampling errors and low reliabilities of the WMS-R, its standard errors of measurement are predictably quite large. Such high measurement error poses two major clinical problems for the WMS-R: (a) determining the accuracy of obtained scores and (b) testing the significance of differences between those scores.

Confidence Intervals

There is now a wide consensus that, because of their inherent measurement error, IQ scores should be described in terms of ranges of scores rather than by their absolute values (Kaufman, 1990; Matarazzo, 1990). Such ranges, or confidence intervals, are intended to define the band of error around a subject's true score. Accordingly, various tables of IQ confidence intervals have been prepared both for the WAIS-R (Naglieri, 1982; Wechsler, 1981) and the WISC-R (Silverstein, 1989). The routine clinical use of confidence intervals is also supported by the Ethical Principles of Psychologists (American Psychological Association, 1981), which hold that "in reporting assessment results, psychologists indicate any reservations that exist regarding validity or reliability" (p. 637). Moreover, the Diagnostic and Statistical Manual (DSM-III-R; American Psychiatric Association, 1987) explicitly recognizes confidence limits in its use of IQ score error bands to define the levels of mental retardation. Most major intellectual and achievement tests now incorporate confidence intervals in their scoring procedures, among them the Kaufman Assessment Battery for Children, the Peabody Individual Achievement Test--Revised, the Vineland Adaptive Behavior Scales, and the new Kaufman Brief Intelligence Test. By contrast, the WMS-R manual briefly mentions the standard error of measurement but does not describe the calculation or reporting of confidence intervals. The conventional method of establishing confidence intervals is simply to add to and subtract from a subject's obtained score a specified multiple of the standard error of measurement (Glutting et al., 1987; Silverstein, 1989):


X ± SEm(X) × Z

where X is the obtained test score, SEm(X) is the standard error of measurement of the obtained score, and Z is the standard score that corresponds to the desired confidence level (e.g., Z = 1.00 for a 68% confidence interval, 1.96 for 95%). The standard errors of measurement for observed index scores in the WMS-R manual (Wechsler, 1987, p. 63) reflect values of SEm(X), calculated by

SEm(X) = S √(1 − rx)

where S is the standard error (i.e., the standard deviation of the standardization sample) and rx is the reliability coefficient. Reliabilities of the WMS-R Mental Control and the four Paired Associates subtests represent test-retest correlations; reliabilities of the remaining seven subtests are based on measures of internal consistency.

Applied to the WAIS-R, the average Full Scale IQ SEm(X) is 2.51 (Wechsler, 1981). Arbitrarily setting the confidence level at 95%, the error band around obtained IQ scores would equal ±(1.96 × 2.51) = 4.92, or about 5 points. Thus, if a subject obtained a Full Scale IQ score of 100, we would report that, at the 95% confidence level, the true IQ score lies within the 95-105 range. Because of its greater standard errors, the argument for using confidence intervals with the WMS-R is even more compelling than for the WAIS-R. For example, given the General Memory Index SEm(X) of 6.81 (Wechsler, 1987) and again setting the confidence level at an arbitrary 95%, the error band would equal ±14. Thus, if a subject obtained a General Memory Index score of 90, we could say that, with 95% confidence, the subject's true general memory score is between 76 and 104, or, in other words, broadly within the borderline to average ranges.

While the use of confidence intervals is certainly warranted for the WMS-R, they are often misinterpreted. Conventional confidence intervals have been mistakenly described (Franzen, 1989; Naglieri, 1982) as the range of true scores that is expected from a given obtained score. Actually, they provide just the opposite: the range of obtained scores that is expected from a given true score (Dudek, 1979; Knight, 1983; Silverstein, 1989). Of course, it is precisely the unknown true score that is of greatest interest. The expected value of the true score is not the score a subject actually obtains on a test but an estimate that is regressed toward the test mean (except at the test mean, where the two scores are equal). In order to establish a confidence interval around a subject's estimated true score, we must first estimate the true score for each obtained score. This fact alone makes tables of true score confidence intervals impractical for routine clinical use.


Fortunately, estimates of true scores can easily be calculated by (Glutting et al., 1987; Silverstein, 1989; Stanley, 1971)

T̂ = rx X + (1 − rx) M

where T̂ is the subject's estimated true score, rx is the reliability coefficient of the respective test, X is the obtained score, and M is the mean score of the test or subtest (i.e., 100 for each of the indexes). The equation shows that the difference between the true and observed scores becomes greater (a) as the observed score deviates further from the test mean and (b) as the reliability of the test decreases. Both conditions are particularly relevant to the WMS-R, since the scores of most clinical interest are precisely those furthest from the mean and because the subtest reliabilities are so poor. While various standard errors have been proposed for error bands around true score estimates (Lord and Novick, 1968; Nunnally, 1967), the standard error of measurement for estimated true scores is generally accepted as the most appropriate (Glutting et al., 1987; Silverstein, 1989; Stanley, 1971). Fortunately, this calculation too is quite straightforward:

SEm(T̂) = rx SEm(X)

where SEm(T̂) is the standard error of measurement of estimated true scores, rx is the reliability coefficient, and SEm(X), as before, is the standard error of measurement of the obtained scores. Since the standard error of measurement for estimated true scores is smaller than that for obtained scores, the resulting intervals are actually narrower than those derived from the conventional method described earlier.

Kaufman (1990) acknowledges that the conventional method of setting confidence intervals is imprecise but considers it adequate for clinical reporting of WAIS-R IQ scores, arguing that while the true score method may be more accurate, it is also more complicated and prone to clerical error. In practice, the conventional WAIS-R confidence interval is reasonably accurate, and practicing clinicians may simply refuse to use the more complex procedures. Given that choice, one could argue that any confidence interval is better than no interval at all. While the conventional method may indeed be reasonably accurate for WAIS-R IQ scores, the discrepancy between the two methods is much greater when applied to the WMS-R because of that scale's far lower reliability. For example, a 95% true score confidence interval for a WAIS-R Full Scale IQ of 85 is 85.6 ± 4.8, or 81-90, virtually equivalent to the 80-90 range found by the conventional method.


However, in the case of a WMS-R General Memory Index of 85, the true score interval of 76-100 (88 ± 12) differs markedly from the 71-99 (85 ± 14) range found by the conventional method.

In practical terms, the calculation of true score confidence intervals is not as complicated as it first appears. If we can assume that reliability does not vary systematically with age, confidence intervals can be based on the average reliabilities, without having to use separate coefficients for each age group. Further, the indexes are all standard scores with a common mean of 100. These conditions permit the use of a worksheet for calculating true score confidence intervals that is simple enough for routine clinical use (see Appendix, part I). The average reliability coefficients and standard errors of measurement from the WMS-R manual can be preprinted on the worksheet according to the respective formulas. The clinician can simply enter the subject's obtained index scores and compute the respective confidence intervals on a pocket calculator. Whether one derives confidence intervals on the WMS-R by the conventional method or by the more accurate true score method, the two procedures agree in one fundamental respect: both clearly demonstrate how broadly index scores on the revised memory scale should be interpreted.
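
The worksheet logic is easy to automate. The sketch below is a minimal illustration that follows the formulas above, using the average General Memory reliability (0.81) and the SEm(X) of 6.81 quoted in the text, and assuming, as in the Appendix worksheet, that SEm(T̂) = rx SEm(X); the function names are mine, and the manual's SEm values for the other indexes would be substituted in practice.

    def conventional_ci(x, sem_x, z=1.96):
        """Conventional interval: obtained score +/- z * SEm(X)."""
        half = z * sem_x
        return x - half, x + half

    def true_score_ci(x, r, sem_x, z=1.96, mean=100.0):
        """Interval around the estimated true score (worksheet method):
        T_hat = r*X + (1 - r)*M, with half-width z * r * SEm(X)."""
        t_hat = r * x + (1 - r) * mean
        half = z * r * sem_x
        return t_hat, (t_hat - half, t_hat + half)

    # General Memory Index: average reliability 0.81, SEm(X) = 6.81
    print(conventional_ci(90, 6.81))      # about (76.7, 103.3); the text rounds this band to +/-14
    print(true_score_ci(85, 0.81, 6.81))  # T_hat = 87.85, interval roughly 77 to 99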

Significance of Differences Between Scores

An important clinical question is whether observed differences between WMS-R scores represent differences in memory functions or mere artifacts of measurement error. Indeed, the rationale for multiple scores on the WMS-R is that differences between those scores can be used to differentiate separate memory components. Moreover, differences between a subject's successive scores on the same test are used to evaluate the recovery or deterioration of memory functions. The significance of a difference between two test scores can be established by either of two methods, corresponding to the conventional and true score methods of setting confidence intervals. The most commonly used, or "conventional," method (Silverstein, 1989; Stanley, 1971) is expressed by the equation

Zx = (X1 − X2) / √[SEm(X1)² + SEm(X2)²]

where Zx is the standard score value and SEm(Xn), as before, denotes the standard error of measurement of the obtained scores Xn, while the subscripts identify the respective scores being compared.


Wechsler (1987) provides a table based on the conventional formula for testing the significance of three common pairwise index score comparisons. The analogous standard score for the difference between true scores (Silverstein, 1989; Stanley, 1971) can be derived by

ZT̂ = (T̂1 − T̂2) / √[SEm(T̂1)² + SEm(T̂2)²]

The significance of the difference between obtained scores (Zx), or between the corresponding estimated true scores (ZT̂), can be determined by simply comparing the respective Z value to a normal distribution table. The conventional method becomes increasingly more liberal than the true score method (a) as the difference between the reliabilities of the tests being compared grows and (b) as the obtained scores deviate further from the test mean, again because of the regression of true scores toward the mean (Silverstein, 1989). Although the index score reliabilities on the revised memory scale are low, the differences between those reliabilities are relatively small, especially for the comparisons generally thought to be useful. Applied to the WMS-R, simple calculations show that the two methods are comparable when the obtained scores are near the mean. However, at a 95% confidence level, when the higher of the two index scores is 85, the conventional method would require a 22-point difference between the verbal and visual indexes, while the true score method would require a 24-point difference. Such differences are even more extreme when we consider that a statistically significant difference between two index scores does not mean the difference is clinically relevant. If pairwise differences of a certain size are common among normal subjects, then similar differences in patients cannot be considered abnormal or clinically significant. Unfortunately, the base rates of differences between WMS-R scores among normal subjects are not available. Certainly, in order to interpret any pairwise comparison of index scores, it must at least be statistically significant. Beyond that, any interpretation of such contrasts must be made cautiously and with regard for the unknown base rates of such differences among normal individuals. Formats for calculating both true score confidence intervals and significance tests can easily be combined in a common worksheet (see Appendix) to supplement the existing record form in routine clinical practice; a brief sketch of the same calculations follows.
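
The sketch below implements the two significance tests. It again assumes SEm(T̂) = rx SEm(X); the SEm(X) values for the Verbal and Visual Memory indexes are placeholders rather than figures quoted in this review, so the manual's values should be substituted before any clinical use.

    import math

    def z_conventional(x1, x2, sem_x1, sem_x2):
        """Z for the difference between two obtained scores."""
        return (x1 - x2) / math.sqrt(sem_x1 ** 2 + sem_x2 ** 2)

    def z_true(x1, r1, sem_x1, x2, r2, sem_x2, mean=100.0):
        """Z for the difference between estimated true scores,
        assuming SEm(T_hat) = r * SEm(X)."""
        t1 = r1 * x1 + (1 - r1) * mean
        t2 = r2 * x2 + (1 - r2) * mean
        return (t1 - t2) / math.sqrt((r1 * sem_x1) ** 2 + (r2 * sem_x2) ** 2)

    # Verbal (r = 0.77) vs. Visual (r = 0.70) Memory indexes; the SEm(X)
    # values are hypothetical placeholders.
    sem_verbal, sem_visual = 7.3, 8.5
    z1 = z_conventional(85, 63, sem_verbal, sem_visual)
    z2 = z_true(85, 0.77, sem_verbal, 63, 0.70, sem_visual)
    print(round(z1, 2), round(z2, 2))
    # 1.96 1.75 -- a 22-point gap just reaches the conventional criterion
    # (1.96 for p < .05) but falls short of the true score criterion.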


Atkinson (1991) has compiled a table of similar confidence intervals and tests of significant score differences for each of the age ranges in the WMS-R normative sample, though the corresponding true score estimates must be calculated separately. These or similar formats could easily be included in the WMS-R record form itself.

Establishing the significance of differences between successive scores on the same test is crucial in determining whether those differences represent actual declines in a subject's memory functions or merely reflect the instability of the test. The WMS-R manual ignores the significance of test-retest differences altogether and provides no table or formula by which to test them. Fortunately, the equations described above can be used to test differences between successive scores. At a 95% confidence level, successive General Memory index scores would have to differ by 19 points, and Visual Memory scores by over 24 points, before they could even be considered significantly different. The analogous true score method must again consider each observed score, but the two methods are similar for observed index scores near the mean. The test for the significance of true score differences (ZT̂) appears at first glance to be more complicated, since true scores must be estimated for each obtained score. Of course, the test is easy if made in conjunction with true score confidence intervals, since the estimated true scores will already have been calculated. Whether one uses the conventional or the recommended true score method to test the significance of differences between successive scores, the thresholds they set are so high that they render the WMS-R virtually insensitive to all but the most extreme test-retest discrepancies.

Moreover, statistical significance alone does not ensure that differences between successive scores are large enough to be clinically meaningful. Serial testing almost always results in an increase in scores that is attributed to practice effects. Just as the scores of clinical subjects are compared to those of normal controls, the apparent practice effect over successive administrations should also be compared to the gains made by normal controls (Kaufman, 1990). In order to make such a comparison, we must determine the base rates of test-retest differences among normal individuals. Matarazzo and Herman (1984) provide these base rates for the WAIS-R, which show that even large gains in IQ scores over successive testings may be common among normal individuals. The need to determine whether actual test-retest gains are abnormal applies equally to any test used in serial assessment. Unfortunately, Wechsler does not provide the base rates of changes in WMS-R scores over serial administrations. The lack of such base rates and the high thresholds for statistical significance render the WMS-R ineffective in evaluating the recovery or deterioration of memory functions.
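
The 19-point figure above can be checked directly from the conventional formula, assuming the same SEm(X) of 6.81 applies on both testing occasions:

    import math

    def minimal_retest_difference(sem_x: float, z: float = 1.96) -> float:
        """Smallest test-retest difference reaching significance when the
        same SEm applies to both administrations."""
        return z * math.sqrt(2 * sem_x ** 2)

    print(round(minimal_retest_difference(6.81), 1))  # 18.9 -- about 19 points for General Memory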


VALIDITY

Few would argue that the primary purpose of any clinical test of memory is to detect memory deficits (Erickson and Howieson, 1986). How effective a test is in achieving that purpose is typically reflected in how accurately it discriminates between control subjects and those with presumed memory deficits. Given the substantial measurement error of the WMS-R, the issue of its discriminative validity is especially important. Support for the validity of the WMS-R is based largely on the traditional method of comparing the mean scores of subjects in criterion clinical groups with those of presumably unimpaired control groups.

Group Discrimination Studies

Numerous studies have shown that the WMS-R can grossly discriminate various clinical samples from groups of normal individuals. Wechsler (1987) compared the mean index scores of 14 clinical subgroups with those of the standardization sample. With few exceptions, he found significant differences between the normative and clinical samples on each of the WMS-R indexes. Ryan and Lewis (1988) compared 40 recently detoxified chronic male alcoholics with an equal number of male controls from the WMS-R standardization sample who were matched on age and education. They found significant differences between the two groups on all five WMS-R indexes and on 5 of the 12 subtests. Fisher (1988) compared 45 patients with confirmed multiple sclerosis with a normal control group of 25 subjects matched on age, education, and sex, and reported significant differences between the two groups on all five WMS-R indexes and on 10 of the 12 subtests. In an unpublished study, Reid and Kelly (1991) report significant differences between normal controls and head-injured patients on all five WMS-R index scores. Butters and his colleagues (1988) compared amnesic, Alzheimer's, and Huntington's patients with both middle-aged and elderly normal controls. They found significant differences between each of the clinical groups and at least one of the control groups on all five WMS-R indexes; the only exception was that the mean Attention/Concentration index of the amnesic group did not differ significantly from that of the normal controls. Taken together, these studies show that all of the WMS-R index scores tend to be lower for virtually every clinical population that has been examined.


Bornstein et al. (1989) computed several discrepancy scores based on within-subject differences between IQ scores and memory index scores. They then compared the discrepancy scores of a mixed clinical sample with those of the subjects in the WMS-R standardization sample who had been administered the full WAIS-R. They found that neither the VIQ-Verbal Memory nor the PIQ-Visual Memory discrepancy scores differed significantly between the two groups. Bornstein and his colleagues did find significant differences in the FSIQ-Delayed Recall discrepancy scores, but even under these optimal discrimination conditions, those scores misclassified too many subjects to have any direct clinical application.

Most would agree that the secondary purpose of a memory test is to describe the pattern or the nature of memory deficits. While the group comparisons discussed earlier (Fisher, 1988; Ryan and Lewis, 1988; Wechsler, 1987) suggest that some mean WMS-R scores differ significantly between normal and clinical groups, they do not demonstrate that the WMS-R can distinguish between clinical populations. Butters et al. (1988) investigated the effectiveness of patterns of WMS-R index scores in discriminating clinical subgroups. They first calculated differences between the Attention/Concentration (AC) and General Memory (GM) indexes for each subject and reported that these AC-GM scores not only discriminated the clinical groups from the normal controls but could actually discriminate each of the clinical groups from one another. The differences in mean AC-GM scores were striking, ranging from 31 (i.e., AC > GM) for amnesic patients to -11 for controls. Butters et al. then calculated discrepancy scores based on the difference between the General Memory and Delayed Memory indexes and reported that these GM-DM scores discriminated the amnesic subjects from every other group, though no other between-group differences were significant. Finally, Butters and his group calculated savings scores for the four subtests that are used to measure both immediate and delayed recall (Logical Memory, Visual Reproduction, and both Visual and Verbal Paired Associates). The savings scores are analogous to the percent-retained scores (delayed/immediate × 100) that Russell (1975) had devised for the Logical Memory and Visual Reproduction tests on the original WMS. Butters et al. reported that (a) Logical Memory savings scores discriminated each clinical group from both control groups; (b) Visual Reproduction savings scores discriminated only the amnesic subjects from both control groups; (c) the Huntington's group could be distinguished only from the young normal control group, whereas the Alzheimer's group differed only by comparison with the older controls; and (d) the Visual and Verbal Paired Associates savings scores generally failed to discriminate the clinical groups either from normal controls or from each other. The Butters et al. (1988) study thus suggests that scores derived from several WMS-R measures may discriminate some clinical populations under certain conditions.
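
A savings, or percent-retained, score is simply the delayed score expressed as a percentage of the immediate score. The sketch below is a generic illustration with invented raw scores; it is not data from any of the studies cited.

    def savings_score(immediate: float, delayed: float) -> float:
        """Percent of material retained after the delay interval (Russell, 1975)."""
        return 100.0 * delayed / immediate

    # Hypothetical Logical Memory raw scores:
    print(savings_score(immediate=22, delayed=11))  # 50.0 -- half the material retained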


While these results are encouraging, the immediate clinical application of these scores is limited for several reasons. First, the target populations that were compared in the study (Alzheimer's, Huntington's, and amnesic) have relatively distinct clinical features, and their discrimination is not a particularly rigorous test of the derived scores. Second, many of the comparisons were significant only with respect to one of the two control groups. The paradox is obvious: one must know which group a subject belongs to before choosing the criterion against which to compare his or her scores, yet discovering which group the subject belongs to is the very reason for the comparison. Third, if the derived scores that Butters et al. propose were to be used in clinical practice, they would be subject to the same psychometric standards required of the existing WMS-R scores. Finally, the study used such small clinical samples (N = 16-24) that any attempt to extend its findings to other clinical populations is premature.

Chelune and Bornstein (1988) compared WMS-R scores of patients with well-lateralized structural lesions. They found that Verbal Memory was the only index score that differed significantly between the right- and left-hemisphere lesion groups. When they compared the individual subtests, they found significant differences only on Logical Memory I and II and on Verbal Paired Associates I and II, the scores in each case being predictably lower with left-hemisphere involvement. The authors then computed savings scores (Butters et al., 1988; Russell, 1975) from the Logical Memory and Visual Reproduction subtests and found no significant differences in either savings score between the two patient groups. However, they did find a significant difference in the within-subject relationship of those savings scores: Logical Memory savings scores were greater than Visual Reproduction savings scores among patients with right-hemisphere lesions, whereas a trend toward relatively greater Visual Reproduction savings scores among left-hemisphere patients was not significant. Loring and his colleagues (Loring et al., 1989) used the discrepancy between verbal and visual index scores to compare samples of patients with right or left temporal lobectomies. Regardless of what cutoff scores were used, verbal-visual discrepancies could not predict the correct hemisphere beyond a chance level.

Of course, significant between-group differences in mean test scores alone do not ensure that a test can classify individual subjects with the accuracy required for clinical use. Significant differences between criterion groups are a necessary, though not a sufficient, condition for the discriminative validity of a test or scale. Only two of the validity studies reported the proportion of subjects that were correctly classified, and the results in both cases were clearly disappointing. Moreover, the discriminations in most of these studies were made under contrived conditions that do not resemble those found in typical clinical settings. For example, most of the studies compared samples with high base rates for each subject group, whereas clinical tests are required to discriminate among many such groups, each with a relatively low base rate. The effect of low base rates in diminishing the accuracy of classification or prediction has long been recognized (Meehl, 1955) and has recently been rediscovered (Baldessarini et al., 1983; Root-Bernstein, 1990). The available studies have not yet demonstrated the discriminative validity of the WMS-R under conditions similar to those encountered in typical clinical situations.
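
The base-rate problem is easy to quantify with Bayes' theorem. The sensitivity, specificity, and prevalence values below are invented for illustration and are not taken from any of the studies discussed.

    def positive_predictive_value(sensitivity, specificity, prevalence):
        """Probability that a positive test result is a true positive."""
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    # The same hypothetical test (80% sensitive, 80% specific) looks very
    # different in a balanced research sample than in a low base-rate clinic:
    print(round(positive_predictive_value(0.80, 0.80, 0.50), 2))  # 0.80
    print(round(positive_predictive_value(0.80, 0.80, 0.05), 2))  # 0.17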


Factor Analytic Studies

Wechsler (1987) reported that principal component analyses of age-corrected immediate recall subtest scores in both the clinical and standardization samples yielded two factors. The first was characterized as a general memory and learning factor, the second as an attention and concentration factor. Wechsler (1987) does not describe the criteria used to determine the number of factors, nor the proportion of the common variance accounted for by each factor. The manual simply reports that the patterns of factor loadings are similar across the standardization and clinical samples. However, the two samples showed marked differences in the patterns of subtests with distinct loadings, that is, those loading on only one factor or the other. In both the clinical and standardization samples, the general memory factor includes the Paired Associates subtests and Visual Reproduction, while the attentional factor includes Mental Control and Digit Span. Figural Memory and Logical Memory loaded distinctly on the general memory factor in the clinical sample but were equivocal in the standardization sample, with moderate loadings on both factors, whereas Visual Memory Span loaded only on the attentional factor in the standardization sample but was equivocal in the clinical sample. The inconsistent loadings of Logical Memory and Visual Memory Span across samples are important because these are among the few reliable WMS-R subtests; they also raise the question of how consistent the WMS-R factors are across diverse clinical populations. Unstable factors are obviously not unique to the WMS-R: the WAIS-R Picture Arrangement and Digit Symbol subtests also load differently across populations, and their interpretation is confounded as well (Kaufman, 1990). Wechsler does not report the delayed recall loadings, and thus the stability of those loadings across the two samples remains unknown; the manual merely notes that all the delayed subtests loaded on the same general memory factor as did their immediate recall analogues.

Roid and his colleagues (Roid et al., 1988) subsequently conducted confirmatory factor analyses on both the normative and clinical samples, again using just the eight immediate recall subtests.


They found that two-factor solutions fit better than either one- or three-factor models on each of the several measures used, though the goodness-of-fit measures for all three models were only marginal in the clinical sample.

Bornstein and Chelune (1988) conducted several principal component analyses of WMS-R subtest scores drawn from a large clinical sample. In the first analysis, based on the immediate recall subtests alone, they identified the same two-factor structure that Wechsler (1987) had described earlier. Again, Visual Memory Span was equivocal, loading nominally on both factors. Bornstein and Chelune (1988) reported that when they added the delayed recall subtests to the analysis, they could distinguish material-specific verbal and visual factors. The verbal factor consisted of the Logical Memory and Verbal Paired Associates subtests, whereas the visual factor comprised Figural Memory, Visual Reproduction, and Visual Paired Associates. A third, "weak" factor, so qualified because of its low eigenvalue (0.88), was reportedly very similar to the attentional factor described in the previous analyses. The three factors were said to account for 73% of the common variance. This study was the first, and is still the only one, to report any factor analytic support for the verbal and visual index scores of the revised memory scale.

Elsewhere (Elwood, in press) I challenged the Bornstein and Chelune study on both methodological and statistical grounds. First, I contend that there is no rationale for retaining a principal component whose associated eigenvalue is less than one (Cliff, 1988) and that such a practice is not supported by any current factor retention criterion (Zwick and Velicer, 1986). Second, I conducted principal component analyses (Elwood, in press) of a mixed clinical sample (N = 168) and found that only one factor could be identified when the more accurate scree test was used to determine the number of components. The single factor emerged whether the analysis included all of the subtests or just the immediate recall scores, and whether or not the subtest scores were corrected for age. Since the scree test tends to overestimate the number of components (Zwick and Velicer, 1986), it is a conservative test of a unitary factor structure. The discrepancy between the studies that found single-factor (Elwood, in press) and multiple-factor solutions (Bornstein and Chelune, 1988; Roid et al., 1988; Wechsler, 1987) may be due to differences between their respective samples. It is pertinent to note that my study included relatively more cases of schizophrenia and alcohol abuse than did the other studies. This explanation would suggest that the two factors Wechsler (1987) first identified may not generalize to typical mixed clinical populations that include more substance abuse and serious mental disorders. On the other hand, the disparities between these studies could reflect their different statistical criteria.


For example, the eigenvalues I found (Elwood, in press) were virtually identical to those reported by Bornstein and Chelune (1988), suggesting that the discrepancies between these two studies may simply reflect the different criteria used to determine how many factors should be retained.

Interpreting the so-called general memory/learning and attentional factors is further complicated by the loadings that are found when IQ is added to the analyses. Wechsler (1987) reported that Full Scale IQ loaded on the attentional factor in both the standardization and clinical samples. Bornstein and Chelune (1988) also found that both Verbal and Performance IQ loaded on the apparent attentional factor in their clinical sample. If the first factor indeed represents learning, one would expect IQ scores to load more heavily on it. This finding recalls what Cliff (1983) termed the nominalistic fallacy: assuming that by naming a factor we know what it means.

Thus the available factor analytic studies do not achieve a clear consensus on the factor structure of the WMS-R in clinical populations. Four independent studies of overlapping clinical samples disagree over one-, two-, and three-factor WMS-R solutions. The three-factor solution, the only one to include separate verbal and visual factors, is untenable on statistical grounds. Discrepancies between the studies suggesting one- and two-factor structures may reflect actual differences in their clinical samples, but they may also result from the use of different factor retention criteria. The question is whether the WMS-R in typical clinical populations includes discrete general memory and attentional factors, or whether it should be considered a single-factor scale. In either case, the factor analytic studies to date provide no support for either visual memory or delayed recall WMS-R factors. The lack of such factors certainly undermines the interpretation of patterns of index scores on the WMS-R.

While a delayed memory factor has not been found in analyses of the standard subtest scores, Roth and his colleagues (1990) identified delayed recall factors on the WMS-R through confirmatory factor analyses that partialled out the common variance between the immediate memory and delayed recall subtests. They evaluated seven different models of between one and five factors in a sample (N = 107) of patients with histories of traumatic head injury. Three of the models met conventional goodness-of-fit criteria, and all three included one or two delayed recall factors. Of the best-fitting models, the most parsimonious comprised three factors: immediate memory, delayed recall, and attention. Roth et al. thus provide the first evidence of a delayed recall factor in WMS-R scores obtained from a clinical population. This is an important new finding, although it is somewhat tempered by the study's small sample. Further, since the sample was composed entirely of patients with head injuries, the factor structure may not extend to other clinical populations.


Moreover, the delayed recall factor was identified only by statistical manipulations of subtest scores that do not translate into clinically feasible scoring procedures. I (Elwood, 1991) replicated the age-corrected principal component analyses described earlier but substituted savings scores for the original delayed recall scores. Both the scree test and Velicer's minimum average partial test retained only a single memory factor; in other words, the proportional savings scores did not load on a separate common factor any more than did the conventional delayed recall scores. That Roth et al. (1990) found a delayed recall factor in their sample of head-injured patients suggests that such a factor structure may be confined to certain clinical populations. On the other hand, the finding may reflect the differences between component analysis and the structural model used in confirmatory methods.
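
Because much of the disagreement above turns on factor retention criteria, the following sketch shows the kind of calculation involved. The correlation matrix is synthetic, not the WMS-R intercorrelation matrix; it merely illustrates how the eigenvalue-greater-than-one rule can retain a borderline component that a scree plot or Velicer's minimum average partial test might reject.

    import numpy as np

    # Synthetic 6-variable correlation matrix: two blocks of three variables
    # (within-block r = .55, between-block r = .33).  Not WMS-R data.
    within, between = 0.55, 0.33
    R = np.full((6, 6), between)
    R[:3, :3] = within
    R[3:, 3:] = within
    np.fill_diagonal(R, 1.0)

    eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
    print(np.round(eigenvalues, 2))                  # [3.09 1.11 0.45 0.45 0.45 0.45]
    print("Kaiser rule retains:", int((eigenvalues > 1.0).sum()))  # 2

    # A scree plot of the same eigenvalues shows one dominant component and a
    # shallow tail; whether the borderline second component is worth keeping
    # is precisely the judgment on which the different retention criteria part ways.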

SUMMARY AND EVALUATION

The standardization of the WMS-R is inadequate by current standards. Its sample size is small for such a broad target population, and a third of that population is represented by interpolated scores. The use of simple random sampling compounds the effect of the small sample size and results in large standard errors for both the subtests and the composite indexes. In short, the norms "fall far short of good psychometric practice" (Kaufman, 1990, p. 590). Of the five WMS-R composite memory indexes, only General Memory and Attention/Concentration meet even the most liberal standards for reliability. Only 4 of the 12 WMS-R subtests meet any reliability standard: Digit Span, Visual Memory Span, and Logical Memory I and II. The poor reliabilities are compounded by the brevity of the scale; the WMS-R simply attempts to measure too many different memory functions with too short a scale. The large standard errors and poor reliabilities create huge standard errors of measurement. As a result, the WMS-R provides only an approximate estimate of overall memory functioning, and the scale is largely insensitive to differences between scores, whether on the same administration or on successive administrations. In clinical populations, the WMS-R appears to have a single general memory factor, though a second attention-concentration factor may be found in certain clinical populations. Even these constructs are difficult to interpret, because both WAIS-R VIQ and PIQ scores consistently load on the so-called attentional factor. The multidimensional structure of the WMS-R index scores is not supported: the verbal, visual, and delayed memory indexes have no parallel factors, nor have they been shown to meet current standards for either reliability or validity.

One useful way to evaluate the WMS-R is to follow the recommendation of Cronbach (1970) and consider how well the scale fulfills its author's own intentions.


Wechsler (1987) claims the WMS-R was intended to assist in the clinical assessment of memory for such purposes as (a) assessing the pattern and localization of brain damage, (b) diagnosing brain dysfunction, (c) measuring changes in subjects' memory after therapy, and (d) providing useful information for rehabilitation or training. Considering these objectives in turn: first, there is as yet no evidence that the WMS-R can either describe the pattern or predict the laterality of memory disorder. There is some tentative evidence that the scale can discriminate certain clinical groups from each other, if under highly contrived conditions. The poor accuracy and stability of the WMS-R make it essentially insensitive to changes in subjects' scores over successive administrations. Finally, the WMS-R does not appear to provide information that is relevant to rehabilitation planning, though that criticism is shared by most current neuropsychological instruments (Heinrichs, 1990). Clearly, the WMS-R fails to fulfill its author's own intended purposes. In absolute terms, the revised memory scale is a vast improvement over the original WMS; in relative terms, however, it hardly seems an advance when compared to the current psychometric state of the art.

Any description of the psychometric shortcomings of the WMS-R must be balanced by the recognition that those same deficiencies apply equally to many other neuropsychological instruments. Mortensen and his colleagues observe that "it is a curious fact that psychometric theory has played no major role in the development and application of many neuropsychological tests" (Mortensen et al., 1991, p. 361). However, the focus of this review on the WMS-R alone is justified because, by all accounts, the revised scale appears to have inherited the immense popularity of its predecessor and thus will likely play a major role in clinical memory assessment. Moreover, with the exception of a few established batteries, few neuropsychological instruments have been developed by commercial publishers (Spreen and Strauss, 1991). One may argue that test publishers who market psychometric materials for profit incur a responsibility beyond that of individual authors who simply provide informal tests in the public domain. To the publisher's credit, the comprehensive and explicit WMS-R manual makes the scale's deficiencies all the more apparent; informal laboratory-based tests are seldom accompanied by such extensive documentation, allowing their weaknesses to go unnoticed.

RECOMMENDATIONS

The WMS-R, like the original WMS, will probably see wide use in clinical memory assessment despite its severe psychometric deficiencies.


Those deficiencies clearly limit its clinical application and require procedures beyond those outlined in the manual. First, the substantial standard errors of measurement must be considered when interpreting both the WMS-R subtest and composite index scores. Moreover, the measurement error virtually requires that index scores be reported in terms of confidence intervals. The more accurate true score confidence intervals are recommended over those based on obtained scores, though any confidence interval is preferable to the use of absolute scores. No doubt there is a practical price to pay for this truth in clinical advertising: consider informing a referring neurologist that a patient's General Memory Index lies somewhere in the average to borderline ranges. Such apparent vacillation, however justified, is not likely to impress our colleague or to encourage further neuropsychological consultations.

Second, differences between concurrent or successive index scores should not be interpreted unless they are shown to be statistically significant and abnormal. Thus, even significant differences should be interpreted with caution unless the base rates of those differences in normal subjects can be determined. The supplemental worksheet described earlier is recommended for routine calculation of confidence intervals and for tests of the significance of score differences. Alternatively, the standard errors of measurement provided by Atkinson (1991) may be used for the same purpose.

Third, of the individual WMS-R subtests, only Digit Span, Visual Memory Span, and the two Logical Memory subtests have shown sufficient reliability to be interpreted on their own. The problem of measurement error applies just as much to individual subtest scores as it does to summary index scores; if subtest scores are considered individually, they too should be interpreted in terms of confidence intervals rather than absolute scores. These recommendations will not overcome the severe clinical limitations of the WMS-R, but they should at least minimize errors in its interpretation.

REFERENCES

American Psychiatric Association. (1987). Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.), Author, Washington, DC.
American Psychological Association. (1981). Ethical principles of psychologists. American Psychologist 36: 633-638.
Anastasi, A. (1988). Psychological Testing (6th ed.), Macmillan, New York.
Atkinson, L. (1991). Three standard errors of measurement and the Wechsler Memory Scale-Revised. Psychological Assessment: A Journal of Consulting and Clinical Psychology 3: 136-138.
Baldessarini, R. J., Finklestein, S., and Arana, G. W. (1983). The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry 40: 569-573.
Bornstein, R. A., and Chelune, G. J. (1988). Factor structure of the Wechsler Memory Scale-Revised. The Clinical Neuropsychologist 2: 107-115.
Bornstein, R. A., Chelune, G. J., and Prifitera, A. (1989). IQ-memory discrepancies in normal and clinical samples. Psychological Assessment 1: 203-206.
Butters, N., Salmon, D. P., Cullum, C. M., Cairns, P., Tröster, A. I., Jacobs, D., Moss, M., and Cermak, L. S. (1988). Differentiation of amnesic and demented patients with the Wechsler Memory Scale-Revised. The Clinical Neuropsychologist 2: 133-148.
Chelune, G. J., and Bornstein, R. A. (1988). WMS-R patterns among patients with unilateral brain lesions. The Clinical Neuropsychologist 2: 121-132.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research 56: 81-105.
Cliff, N. (1988). The eigenvalues-greater-than-one rule and the reliability of components. Psychological Bulletin 103: 276-279.
Cronbach, L. J. (1970). Essentials of Psychological Testing, Harper & Row, New York.
Cunningham, W. R. (1986). Psychometric perspectives: Validity and reliability. In Poon, L. W. (ed.), Clinical Memory Assessment in Older Adults, American Psychological Association, Washington, DC, pp. 27-31.
D'Elia, L., Satz, P., and Schretlen, D. (1989). Wechsler Memory Scale: A critical appraisal of the normative studies. Journal of Clinical and Experimental Neuropsychology 11: 551-568.
Dudek, F. J. (1979). The continuing misinterpretation of the standard error of measurement. Psychological Bulletin 86: 335-337.
Elwood, R. W. (1991). Delayed recall on the Wechsler Memory Scale-Revised (WMS-R): The factor structure revisited. Manuscript submitted for publication.
Elwood, R. W. (in press). Factor structure of the Wechsler Memory Scale-Revised (WMS-R) in a clinical sample: A methodological reappraisal. The Clinical Neuropsychologist.
Erickson, R. C., and Howieson, D. (1986). The clinician's perspective: Measuring change and treatment effectiveness. In Poon, L. W. (ed.), Clinical Memory Assessment in Older Adults, American Psychological Association, Washington, DC, pp. 69-80.
Fisher, J. S. (1988). Using the Wechsler Memory Scale-Revised to detect and characterize memory deficits in multiple sclerosis. The Clinical Neuropsychologist 2: 149-172.
Franzen, M. D. (1989). Reliability and Validity in Neuropsychological Assessment, Plenum Press, New York.
Glutting, J. J., McDermott, P. A., and Stanley, J. C. (1987). Resolving differences among methods of establishing confidence limits for test scores. Educational and Psychological Measurement 47: 607-614.
Golden, C. J., Sawicki, R. F., and Franzen, M. D. (1984). Test construction. In Goldstein, G., and Hersen, M. (eds.), Handbook of Psychological Assessment, Pergamon Press, New York, pp. 19-37.
Gorsuch, R. L. (1983). Factor Analysis (2nd ed.), Erlbaum, Hillsdale, NJ.
Heinrichs, R. W. (1990). Current and emergent applications of neuropsychological assessment: Problems of validity and utility. Professional Psychology: Research and Practice 3: 171-176.
Jaeger, R. M. (1984). Sampling in Education and the Social Sciences, Longman, New York.
Kaufman, A. S. (1990). Assessing Adolescent and Adult Intelligence, Allyn and Bacon, New York.
Knight, R. R. (1983). On interpreting the several standard errors of the WAIS-R: Some further tables. Journal of Consulting and Clinical Psychology 51: 671-673.
Lezak, M. (1983). Neuropsychological Assessment (2nd ed.), Oxford University Press, New York.
Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores, Addison-Wesley, Reading, MA.
Loring, D. W. (1989). The Wechsler Memory Scale-Revised, or the Wechsler Memory Scale-Revisited? The Clinical Neuropsychologist 3: 59-69.
Loring, D. W., Lee, G. P., Martin, R. C., and Meador, K. J. (1989). Verbal and visual memory index discrepancies from the Wechsler Memory Scale-Revised: Cautions in interpretation. Psychological Assessment 1: 198-202.
Matarazzo, J. D. (1990). Psychological assessment versus psychological testing. American Psychologist 45: 999-1017.
Matarazzo, J. D., and Herman, D. O. (1984). Base rate data for the WAIS-R: Test-retest stability and VIQ-PIQ differences. Journal of Clinical Neuropsychology 6: 351-366.
Meehl, P. E. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin 52: 194-216.
Mittenberg, W., Burton, D. B., Thompson, G. B., and Darrow, E. (1991, February). Normative data for the WMS-R: 25- to 34-year-olds. Paper presented at the annual meeting of the International Neuropsychological Society, San Antonio, TX.
Mortensen, R. L., Gade, A., and Reinisch, J. M. (1991). A critical note on Lezak's 'best performance method' in clinical neuropsychology. Journal of Clinical and Experimental Neuropsychology 13: 361-371.
Naglieri, J. A. (1982). Two types of tables for use with the WAIS-R. Journal of Consulting and Clinical Psychology 50: 319-321.
Nunnally, J. C. (1967). Psychometric Theory, McGraw-Hill, New York.
Powel, J. (1988). Review of the Wechsler Memory Scale-Revised. Archives of Clinical Neuropsychology 3: 397-403.
Prigatano, G. P. (1977). The Wechsler Memory Scale is a poor screening test for brain dysfunction. Journal of Clinical Psychology 33: 772-777.
Prigatano, G. P. (1978). Wechsler Memory Scale: A selective review of the literature. Journal of Clinical Psychology 34: 816-832.
Reid, D. B., and Kelly, M. P. (1991). A study of the Wechsler Memory Scale-Revised in closed-head injury. Paper presented at the annual meeting of the International Neuropsychological Society, San Antonio, TX.
Reynolds, C. R., Wilson, V. L., and Clark, P. L. (1983). A four-test short form of the WAIS-R for clinical screening. Journal of Clinical Neuropsychology 5: 111-116.
Roid, G. H., Prifitera, A., and Ledbetter, M. (1988). Confirmatory analysis of the factor structure of the Wechsler Memory Scale-Revised. The Clinical Neuropsychologist 2: 116-120.
Root-Bernstein, R. S. (1990, March/April). Misleading reliability. The Sciences, pp. 6-8.
Roth, D. L., Conboy, T. J., Reeder, K. P., and Boll, T. J. (1990). Confirmatory factor analysis of the Wechsler Memory Scale-Revised in a sample of head-injured patients. Journal of Clinical and Experimental Neuropsychology 12: 834-842.
Russell, E. W. (1975). A multiple scoring method for the assessment of complex memory functions. Journal of Consulting and Clinical Psychology 43: 800-809.
Ryan, J. J., and Lewis, C. V. (1988). Comparison of normal controls and recently detoxified alcoholics on the Wechsler Memory Scale-Revised. The Clinical Neuropsychologist 2: 173-180.
Silverstein, A. B. (1982). Two- and four-subtest short forms of the Wechsler Adult Intelligence Scale-Revised. Journal of Consulting and Clinical Psychology 50: 415-418.
Silverstein, A. B. (1989). Confidence intervals for test scores and significance tests for test score differences: A comparison of methods. Journal of Clinical Psychology 45: 828-832.
Snedecor, G. W., and Cochran, W. G. (1980). Statistical Methods (7th ed.), Iowa State University Press, Ames, IA.
Spreen, O., and Strauss, E. (1991). A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary, Oxford University Press, New York.
Stanley, J. C. (1971). Reliability. In Thorndike, R. L. (ed.), Educational Measurement (2nd ed.), American Council on Education, Washington, DC, pp. 356-442.
Wechsler, D. (1981). Manual for the Wechsler Adult Intelligence Scale-Revised, The Psychological Corporation, New York.
Wechsler, D. (1987). Manual for the Wechsler Memory Scale-Revised, The Psychological Corporation, San Antonio, TX.
Zwick, R., and Velicer, W. F. (1986). Comparison of five rules for determining the number of factors. Psychological Bulletin 99: 432-442.


Appendix. WMS-R Supplemental Worksheet

Name ____________________    ID ____________    Date ____________

I. WMS-R confidence intervals

                              Obtained score (X)          Estimated true score (T̂)    Margin of error         Confidence interval (____%)
  Verbal Memory               ____  × 0.77  + 23  =       ____                         ± (5.61 × z) = ____     ____ to ____
  Visual Memory               ____  × 0.70  + 30  =       ____                         ± (5.93 × z) = ____     ____ to ____
  General Memory              ____  × 0.81  + 19  =       ____                         ± (5.52 × z) = ____     ____ to ____
  Attention/Concentration     ____  × 0.90  + 10  =       ____                         ± (4.37 × z) = ____     ____ to ____
  Delayed Memory              ____  × 0.77  + 23  =       ____                         ± (5.74 × z) = ____     ____ to ____

  Confidence level:   68%  z = 1.00     85%  z = 1.44     90%  z = 1.64     95%  z = 1.96     99%  z = 2.58

II. Significance of pairwise index score comparisons

                                    True scores                 Score difference            Z score     Significance level
  Index comparison                  T̂1        T̂2
  General/Attention-Concentration   ____      ____              ( ____ )  ÷ 7.05  =        ____        ____
  Verbal/Visual                     ____      ____              ( ____ )  ÷ 8.15  =        ____        ____
  General/Delayed                   ____      ____              ( ____ )  ÷ 7.95  =        ____        ____
