Neurol Sci DOI 10.1007/s10072-013-1573-8

ORIGINAL ARTICLE

Normative data for measuring performance change on parallel forms of a 15-word list recall test

Giovanni A. Carlesimo • Marco De Risi • Marco Monaco • Alberto Costa • Lucia Fadda • Angelo Picardi • Giancarlo Di Gennaro • Carlo Caltagirone • Liliana Grammaldo



Received: 25 July 2013 / Accepted: 30 October 2013
© Springer-Verlag Italia 2013

Abstract Declarative memory evaluation is an essential step in the clinical and neuropsychological assessment of a variety of neurological disorders. It typically addresses the issue of normality/abnormality of an individual’s performance. Another clinical application of the neuropsychological assessment of declarative memory is the longitudinal evaluation of an individual’s performance change. In fact, in a variety of neurological conditions repeated assessments are needed to evaluate the modifications of a memory disorder as a function of time or in response to a pharmacological or rehabilitation treatment. This study aimed to collect data for measuring and interpreting performance change on a memory test for verbal material. For this purpose, we administered to 100 healthy subjects (age range 20–80 years; years of formal education range 8–17 years) three parallel forms of a test requiring the immediate and delayed recall of a 15-word list. The subjects performed the recall test three times (each time with a different list) at least 1 week apart. The order of the lists was randomized across subjects. Results revealed that performance on the three lists was highly correlated and did not vary as a function of the order of presentation. However, accuracy of recall was slightly better on one list than on the other two. Based on a method devised by Payne and Jones (J Clin Psychol 13:115–121, 1957), we provide normative data for establishing whether a discrepancy in recall accuracy on two versions of the test exceeds the discrepancy expected based on the performance of normal controls.

Electronic supplementary material The online version of this article (doi:10.1007/s10072-013-1573-8) contains supplementary material, which is available to authorized users.

Keywords Memory assessment • Performance change • Word list recall • Parallel forms

G. A. Carlesimo • L. Fadda • C. Caltagirone
Neurology Clinic, Tor Vergata University, Rome, Italy

G. A. Carlesimo (corresponding author) • M. Monaco • A. Costa • L. Fadda • C. Caltagirone
IRCCS Santa Lucia Foundation, Via Ardeatina 306, 00179 Rome, Italy
e-mail: [email protected]

M. De Risi • G. Di Gennaro
Epilepsy Surgery Unit, Department of Neurological Sciences, IRCCS “NEUROMED”, Pozzilli, IS, Italy

A. Picardi • L. Grammaldo
Mental Health Unit, Centre of Epidemiology, Surveillance and Health Promotion, Italian National Institute of Health, Rome, Italy

Introduction

Declarative memory assessment is an essential step in the clinical and neuropsychological evaluation of a variety of neurological conditions, including dementia syndromes [8], epileptic syndromes involving the mesio-temporal structures [18], and vascular, tumoral, infective and anoxic conditions that directly or indirectly affect some components of the so-called extended hippocampal system [15]. The neuropsychological evaluation of memory typically addresses the issue of normality/abnormality of an individual’s performance. For this purpose, a neuropsychological instrument must include normative data from a sample of healthy individuals that permit adjusting raw performance scores according to the principal sources of variance (i.e., age, education and gender). Moreover, conventionally established limits of tolerance in the score distribution should be furnished. This makes it possible to assign the individual’s adjusted score to the abnormal or normal range and, in the latter case, to some level of the normal distribution. Norms for a variety of declarative memory tests are available in the Italian language. They include word list learning tests [9, 13, 16], recall tests of short passages of prose [1, 7, 10, 12, 16, 21], tests of immediate and delayed reproduction of non-verbalizable figures [3, 6, 10] and tests involving the recognition of words and unknown faces [20].

Another clinical application of the neuropsychological assessment of memory is the longitudinal evaluation of an individual’s performance change. In fact, in a variety of neurological conditions repeated assessments are needed to evaluate the progression of a declarative memory disorder or the effectiveness of a treatment intended to ameliorate memory decline or at least prevent its worsening. Typical examples include monitoring memory deficits in patients affected by presymptomatic (i.e., prodromal) or symptomatic mild Alzheimer’s disease and related disorders, laboratory assessment of memory performance in individuals with closed-head injuries who have undergone a cognitive rehabilitative intervention, or the immediate or delayed post-surgical assessment of hippocampal epileptic patients. In this case, a neuropsychological instrument must meet two requirements for the normality/abnormality of a performance change to be interpreted correctly. First, to avoid test–retest learning effects, parallel forms of the same test are needed in which the administration procedures and scoring are kept constant and memoranda characteristics are controlled for equivalence. Second, the actual equivalence of the parallel forms must be demonstrated by administering them to a group of healthy subjects and measuring possible performance changes related to a particular version of the test or the serial order of administration.
In the literature, word list recall is considered the most suitable memory test for obtaining multiple versions with very similar material. Parallel forms have been reported for Rey’s Auditory Verbal Learning task [19], the California Verbal Learning test [11] and the Hopkins Verbal Learning test [5]. These studies report data on relatively large groups of healthy controls. Results typically confirm comparable recall performance of healthy individuals on the parallel forms [5, 11, 19]. Generally, however, these papers lack normative data on the tolerance intervals of the performance score change observed in the normal population passing from one form of the test to another. At the individual level, some differences should be expected when the same person is tested with parallel forms of the same test after a period of time. From a clinical perspective, it is important that norms be available which permit establishing whether an individual’s performance change (i.e., improvement or worsening) exceeds that expected in normal controls.

The aim of this study was to provide normative data on performance score changes relative to three parallel forms of a 15-word list recall test. For this purpose, we administered the three versions of the test to a sample of 100 healthy individuals who varied in age, years of formal education and gender. Normative data relative to performance scores on the immediate and delayed recall of the three lists and correlational data among lists are presented to estimate the abnormality of a test score difference according to the method devised by Payne and Jones [17].

Materials and methods

Subjects

A total of 100 healthy subjects (mean age 50.85 ± 18.19 years, range 20–80; mean education 12.35 ± 3.84 years, range 8–17), including 47 males (mean age 53.60 ± 18.56 years; mean education 12.30 ± 3.64 years) and 53 females (mean age 48.42 ± 17.67 years; mean education 12.40 ± 4.04 years), were enrolled in this study on a voluntary basis (Table 1). Participants were selected according to age (between 20 and 80 years), education and gender. For each of the six age decades shown in Table 1, we enrolled a roughly comparable number of participants for each of three education levels, namely, middle school degree (8 years), high school degree (13 years) and university degree (17 or 18 years). Finally, we recruited a comparable number of males and females in each decade. Exclusion criteria included: (1) suspected cognitive impairment or dementia based on an MMSE score ≤23.8 [14] (in fact, no participant in the recruited sample obtained a corrected MMSE score less than 26), (2) subjective complaint of memory difficulties or any other cognitive deficits, whether or not they interfered with daily life activities, (3) reported psychiatric or neurological disorders and (4) known or suspected history of alcoholism or drug dependence or abuse during the lifetime.

Table 1 Experimental sample as a function of age and gender

Decade                 20–29  30–39  40–49  50–59  60–69  70–80  Total
Number                 17     14     15     18     18     18     100
M/F                    7/10   6/8    7/8    8/10   10/8   9/9    47/53
8 years of schooling   5      5      5      7      7      7      36
13 years of schooling  7      5      6      5      4      4      31
17 years of schooling  5      4      4      6      7      7      33

Materials

Experimental materials consisted of three lists of words, each consisting of 15 semantically unrelated items belonging to the category of concrete objects (see supplementary Appendix A). The words in the three lists were comparable for length (4–10 letters) and average frequency of occurrence in the Italian written language, according to Bortolini et al. [4] (list A 111.1 ± 189.9, list B 110.7 ± 171.9, list C 98.1 ± 140.1; F = 0.20, p = ns with 2, 42 df), and in the Italian spoken language, according to Bertinetto et al. [2] (list A 419.1 ± 743.7, list B 473.0 ± 836.7, list C 539.1 ± 935.2; F = 0.07, p = ns with 2, 42 df).

Procedure

The examiner informs the subject that he will be presented with a list of words that he has to recall as accurately as possible, regardless of the presentation order, in a subsequent memory test. When the subject demonstrates that he has understood the instructions, the examiner reads aloud the 15 words at a rate of about 2 s per word. Immediately afterwards, the subject is invited to start recalling the words. The examiner records the correctly recalled words, in order of recall, on a sheet of paper (see supplementary Appendix B). There are five consecutive trials, each consisting of list presentation followed by immediate recall. Fifteen minutes after the fifth trial has ended (an interval during which the subject is engaged in other nonverbal cognitive tasks), he has to recall the word list again, this time without presentation. The number of words correctly recalled in the five immediate trials is the total immediate recall score (range 0–75) and the number of words correctly recalled in the 15-min delayed trial is the delayed recall score (range 0–15). The subjects performed the recall test three times (each time with a different word list) at least 1 week apart. The order of the lists was randomized across subjects.

For the immediate and delayed recall of list A, normative data for score adjustment according to subjects’ age and years of formal education as well as tolerance intervals for normality/abnormality of adjusted scores were previously published [9].
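The scoring rule described in the Procedure can be sketched in a few lines of Python. This is a minimal illustration only: the function name `score_session` and the example counts are hypothetical and do not come from the study materials.

```python
from typing import Sequence, Tuple

LIST_LENGTH = 15          # words per list, as described in the Procedure
N_IMMEDIATE_TRIALS = 5    # consecutive presentation/recall trials


def score_session(immediate_trials: Sequence[int], delayed_trial: int) -> Tuple[int, int]:
    """Return (total immediate recall score, delayed recall score).

    immediate_trials: words correctly recalled on each of the five trials.
    delayed_trial: words correctly recalled after the 15-min delay.
    """
    if len(immediate_trials) != N_IMMEDIATE_TRIALS:
        raise ValueError("expected exactly five immediate recall trials")
    for count in (*immediate_trials, delayed_trial):
        if not 0 <= count <= LIST_LENGTH:
            raise ValueError("each trial score must lie between 0 and 15")
    # Immediate score is the sum over the five trials (range 0-75);
    # delayed score is the single delayed-trial count (range 0-15).
    return sum(immediate_trials), delayed_trial


# Hypothetical session, for illustration only.
immediate, delayed = score_session([5, 8, 10, 11, 12], 9)
print(immediate, delayed)
```

The range checks simply mirror the score ranges stated in the text (0–75 and 0–15).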

Statistical analyses

We first examined the normality of the test score distribution by means of the Kolmogorov–Smirnov test. To determine whether the three lists were equivalent, we first performed correlational analyses (by means of Pearson’s r) among performance scores obtained for immediate and delayed recall on the three lists. Then, we compared accuracy scores on the three lists using one-way ANOVAs. For post hoc analyses, we used Tukey’s HSD test. To evaluate whether performance varied according to the order of the testing session, we submitted performance on the first, second and third session to one-way ANOVAs, regardless of the list administered. To investigate the relationship between age and education and performance scores on immediate and delayed recall, we calculated the correlation coefficients by means of Pearson’s r. The effect of gender was investigated with one-way ANOVAs.

The main aim of the present study was to calculate parameters for establishing whether a discrepancy in recall accuracy on two versions of the word list recall test exceeds the discrepancy expected based on the performance of normal controls. For this purpose, we suggest using a variant of the method devised by Payne and Jones [17] for estimating the abnormality of score differences on two tests. In brief, using this method an individual’s raw scores on two tests are first converted to Z scores (ZX and ZY) based on the mean and standard deviation of performance in a normative sample. The difference between the two Z scores is in turn expressed as a Z score (ZD) according to the following formula:

$$Z_D = \frac{Z_X - Z_Y}{\sqrt{2 - 2r_{xy}}}$$

where rxy represents the correlation between the two tests and all other terms are as defined above. We applied the formula for calculating the ZD to all combinations and administration orders of the word lists for both immediate and delayed recall.
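The Payne and Jones computation described above can be sketched as follows. The function names `z_score` and `z_difference` are hypothetical, and the numbers in the usage lines are arbitrary values chosen so the arithmetic is easy to follow; they are not normative data.

```python
import math


def z_score(raw: float, mean: float, sd: float) -> float:
    """Convert a raw score to a Z score using normative mean and SD."""
    return (raw - mean) / sd


def z_difference(z_x: float, z_y: float, r_xy: float) -> float:
    """ZD = (ZX - ZY) / sqrt(2 - 2 * rxy), following Payne and Jones [17]."""
    if not -1.0 <= r_xy < 1.0:
        # r_xy = 1 would make the denominator zero.
        raise ValueError("r_xy must lie in [-1, 1)")
    return (z_x - z_y) / math.sqrt(2.0 - 2.0 * r_xy)


# Arbitrary illustration: with r_xy = 0.5 the denominator is sqrt(1) = 1,
# so ZD equals the plain difference between the two Z scores.
z_x = z_score(50.0, 40.0, 10.0)   # 1.0
z_y = z_score(40.0, 40.0, 10.0)   # 0.0
print(z_difference(z_x, z_y, 0.5))
```

The denominator is the standard deviation of the difference between two standardized scores that correlate rxy, so the higher the correlation between two lists, the smaller the discrepancy needed to reach a given ZD.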
To estimate the probability of exceeding the discrepancy observed in the normative sample, the ZD value can be referred to a table of the area under the normal curve. We considered as significantly exceeding the tolerance intervals those ZD values expected to occur in no more than 5 % of the normative population. To control for the influence of personal variables on ZD values, we computed Pearson’s correlation coefficients between ZD values and age and years of formal education of participants, and compared the ZD values of males and females by means of one-way ANOVAs. All statistical analyses were performed with the SPSS statistical package.
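Instead of a printed table, the area under the normal curve can be obtained from Python's standard library. The sketch below assumes a one-tailed reading, i.e. the area beyond |ZD| in a single direction, which is the reading under which the ±1.65 limits reported in the Results leave about 5 % in each tail; the function name `tail_probability` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def tail_probability(z_d: float) -> float:
    """Area under the standard normal curve beyond |z_d| in one tail.

    Assumption: a one-tailed probability is wanted (improvement and
    worsening are evaluated separately).
    """
    return 1.0 - STANDARD_NORMAL.cdf(abs(z_d))


for z in (0.5, 1.65, 2.5):
    print(z, round(tail_probability(z), 4))
```

A two-tailed probability, if preferred, is simply twice the value returned here.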


Results


Figures 1 and 2 report the mean scores for immediate and delayed recall of each list according to age decades and years of education. Performance scores on the three lists for both immediate and delayed recall were normally distributed (d ranging between 0.07 and 0.13; p consistently >0.05). Performance on the three lists was highly correlated for both immediate recall (list A vs. list B r = 0.80, p < 0.001; list A vs. list C r = 0.82, p < 0.001; list B vs. list C r = 0.76, p < 0.001) and delayed recall (list A vs. list B r = 0.78, p < 0.001; list A vs. list C r = 0.78, p < 0.001; list B vs. list C r = 0.77, p < 0.001). One-way ANOVAs documented a significant difference in performance accuracy on the three lists for both immediate (F = 15.11; p < 0.001 with 2, 198 df) and delayed (F = 8.44; p < 0.001 with 2, 198 df) recall. Post hoc tests conducted to qualify the significant effect on immediate recall showed that individuals in the normative sample obtained higher recall scores on list A (46.49 ± 8.54) than on lists B (43.97 ± 8.87; p < 0.001) and C (43.59 ± 8.28; p < 0.001), which, in turn, did not differ from each other (p > 0.80). The same was true for delayed recall (list A 10.09 ± 2.31; list B 9.71 ± 2.59; list C 9.70 ± 2.59; A vs. B p < 0.03; A vs. C p < 0.001; B vs. C p < 0.07).

List A was administered first to 31 subjects, second to 32 subjects and third to 37 subjects. List B was administered first to 32 subjects, second to 38 subjects and third to 30 subjects. List C was administered first to 37 subjects, second to 30 subjects and third to 33 subjects. Accuracy of immediate and delayed recall did not vary as a function of the order of presentation. Indeed, as for immediate recall, average accuracy on the lists that were administered first (44.49 ± 8.89), second (43.85 ± 8.87) and third (44.97 ± 8.73) did not differ significantly (F = 1.77; p = ns). The same was true for delayed recall (lists administered first 9.91 ± 2.51; second 9.53 ± 2.59; and third 9.56 ± 2.52; F = 2.85; p = ns).

Immediate recall performance on the three lists correlated negatively with age (list A r = -0.58, p < 0.001; list B r = -0.49, p < 0.001; list C r = -0.55, p < 0.001) and positively with years of education (list A r = 0.29, p < 0.001; list B r = 0.36, p < 0.001; list C r = 0.35, p < 0.001). A similar relationship pattern was also observed for delayed recall (age: list A r = -0.46, p < 0.001; list B r = -0.48, p < 0.001; list C r = -0.46, p < 0.001; education: list A r = 0.24, p < 0.02; list B r = 0.36, p < 0.001; list C r = 0.34, p < 0.001). Females did not differ from males on either immediate (45.86 ± 8.88 vs. 43.35 ± 8.74; F = 2.54; p = ns with 1, 98 df) or delayed recall (10.08 ± 2.68 vs. 9.34 ± 2.35; F = 2.62; p = ns with 1, 98 df).

Table 2 reports the parameters needed to calculate the ZD for each list and each combination of lists. The limits of the tolerance intervals for the ZD distribution (including at least 95 % of the normal population) were ±1.65. The proportion of subjects in the normative sample whose ZD exceeded the tolerance intervals for any combination of the lists ranged between 4 and 5 % in immediate and delayed recall. In the normative sample, none of the ZD values correlated significantly with subjects’ age (r ranging between -0.15 and +0.09) or years of formal education (r ranging between -0.19 and +0.03). Males and females did not differ for average ZD values (F ranging between 0.01 and 0.42). An Excel file to calculate an individual’s ZD starting from raw recall scores is available upon request.

Fig. 1 Average performance of the normative samples in the immediate recall of the three lists as a function of age (a) and years of education (b) [panels plot the average accuracy score for lists A, B and C against age decades and years of education]

Fig. 2 Average performance of the normative samples on the delayed recall of the three lists as a function of age (a) and years of education (b) [panels plot the average accuracy score for lists A, B and C against age decades and years of education]

Table 2 Performance parameters of the normative sample needed to calculate ZD for each combination of lists

                  Mean (standard deviation) of the normative sample   Correlation coefficient (Pearson’s r)
                  List A         List B         List C                A-B    A-C    B-C
Immediate recall  46.49 (8.53)   43.97 (8.87)   43.59 (8.28)          0.80   0.82   0.76
Delayed recall    10.09 (2.31)   9.71 (2.59)    9.70 (2.59)           0.78   0.78   0.77
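As an illustration of how the Table 2 parameters and the ±1.65 limits might be applied to an individual, here is a minimal Python sketch. It is not the authors' Excel file. The function names, the sign convention (first administration minus second, so a positive ZD means lower relative performance on the second list) and the example raw scores are all assumptions made for this example.

```python
import math

# Normative parameters copied from Table 2: (mean, SD) per list and
# Pearson's r per pair of lists.
NORMS = {
    "immediate": {
        "mean_sd": {"A": (46.49, 8.53), "B": (43.97, 8.87), "C": (43.59, 8.28)},
        "r": {"AB": 0.80, "AC": 0.82, "BC": 0.76},
    },
    "delayed": {
        "mean_sd": {"A": (10.09, 2.31), "B": (9.71, 2.59), "C": (9.70, 2.59)},
        "r": {"AB": 0.78, "AC": 0.78, "BC": 0.77},
    },
}
TOLERANCE_LIMIT = 1.65  # limits of the tolerance interval reported in the text


def z_d(measure, first_list, first_raw, second_list, second_raw):
    """ZD for raw scores obtained on two different lists (first minus second)."""
    norms = NORMS[measure]
    mean_x, sd_x = norms["mean_sd"][first_list]
    mean_y, sd_y = norms["mean_sd"][second_list]
    # Pair keys are stored in alphabetical order ("AB", "AC", "BC").
    r_xy = norms["r"]["".join(sorted(first_list + second_list))]
    z_x = (first_raw - mean_x) / sd_x
    z_y = (second_raw - mean_y) / sd_y
    return (z_x - z_y) / math.sqrt(2.0 - 2.0 * r_xy)


def exceeds_tolerance(zd):
    """True when |ZD| falls outside the +/-1.65 tolerance limits."""
    return abs(zd) > TOLERANCE_LIMIT


# Hypothetical person: 45 words on list A at baseline, 30 on list B at follow-up.
zd = z_d("immediate", "A", 45, "B", 30)
print(round(zd, 2), exceeds_tolerance(zd))
```

For this hypothetical case ZD is about 2.21, which lies outside the tolerance limits, so the drop would be larger than expected from the normative sample. Because each raw score is standardized against its own list's mean and standard deviation, the slightly higher scores on list A are taken into account automatically.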

Discussion

The main goal of this study was to provide normative data for interpreting performance changes across the administration of three parallel forms of a 15-word list recall test. A performance change that does not exceed that observed in the normative sample should be interpreted as due to chance. Instead, a performance change that exceeds that of the normative sample should be interpreted as due to the variable of interest (e.g., disease progression, pharmacological or rehabilitative treatment, etc.).

Performance accuracy on the three versions of the test was highly correlated (Pearson’s r ranged between 0.73 and 0.84 in immediate and delayed recall tests). At variance with expectations, however, the three forms of the test were not completely equivalent. Indeed, performance accuracy on list A was slightly, but statistically significantly, better than performance on lists B and C. We do not have a straightforward explanation of this finding. Although we ensured that the three lists were matched for frequency of occurrence of words, we did not control for other sources of variability that might interfere with the memory of words. In our opinion, however, discrepant accuracy in word list recall should not affect the reliability of the test for measuring performance changes. Indeed, to the extent that the difference in recall accuracy between the lists is systematic (and statistically measurable), the interpretation of an individual’s performance change should not be different from that which could be made if the lists were actually equivalent.

The order of presentation did not influence recall accuracy. This finding was not a foregone conclusion, because previous normative studies on free recall of parallel word lists reported a significant increase in accuracy across successive testing sessions [19]. This was interpreted as due to an improvement of encoding or retrieval strategies produced by familiarization with the test. The fact that we did not find such an effect renders interpretation of performance discrepancy between the lists more straightforward, because it is not contaminated by the order of presentation.

Unlike previous studies, which only reported normative data relative to the equivalence of parallel forms of word lists in recall tests, we also suggest a practical method for statistically interpreting an individual’s performance discrepancy between lists. This consists of applying a formula devised by Payne and Jones [17] to estimate the abnormality of a test score difference. The critical value extracted by applying this formula is the difference between the Z-transformed scores on two lists, weighted for correlation of performance on the two lists. This ZD is then referred to a table of the area under the normal curve to determine whether it falls inside or outside the tolerance interval of the normative population. The lack of any effect of personal variables on the ZD values demonstrates that the performance discrepancy between lists did not vary as a function of age, education or gender. This further simplifies interpretation of this performance discrepancy parameter.

Sample recruitment for the present study had some limitations. First, the lack of neuropsychological instruments (other than the MMSE) to exclude dementia and of questionnaires investigating subjective memory and/or cognitive complaints and functional efficiency could have led to the inclusion of some subjects with mild cognitive impairment or very mild dementia. Second, recruited subjects were not administered any mood assessment scale. This could have resulted in the inclusion of some participants with minor depression. Finally, we did not include octogenarians in the normative sample, thus precluding the possibility of using this test for investigating memory changes in an age group showing the peak of incidence of Alzheimer’s disease.

In conclusion, here we provide normative data for interpreting modifications in the recall performance on parallel forms of a word list. The fact that our normative sample included healthy subjects representing a wide age range and the finding that the discrepancy parameter (ZD) did not vary as a function of age make the word list test applicable to various clinical populations, including elderly individuals with initial cognitive decline and young individuals with cognitive sequelae of closed-head injury or who have undergone the surgical removal of structures in the mesial temporal lobes for the relief of pharmacologically intractable epileptic seizures.

Acknowledgments This study was supported by the ‘Neurone’ Foundation for research in neuropsychobiology and clinical neurosciences, Rome, Italy.

References

1. Barigazzi R, Della Sala S, Laiacona M, Spinnler H, Valenti V (1987) Esplorazione testistica della memoria di prosa. Ricerche di psicologia 1:50–80
2. Bertinetto PM, Burani C, Laudanna A, Marconi L, Ratti D, Rolando C, Thornton AM (2005) Corpus e Lessico di Frequenza dell’Italiano Scritto (CoLFIS). http://linguistica.sns.it/CoLFIS/Home.htm
3. Bertolani L, De Renzi E, Faglioni P (1993) Normative data on non-verbal memory test of clinical interest. Archivio di Psicologia Neurologia e Psichiatria 54(4):477–486
4. Bortolini U, Tagliavini C, Zampolli A (1971) Lessico di frequenza della lingua italiana contemporanea. Garzanti, Milano
5. Brandt J (1991) The Hopkins verbal learning test: development of a new memory test with six equivalent forms. Clin Neuropsychol 5:125–142
6. Caffarra P, Vezzadini G, Dieci F, Zonato F, Venneri A (2002) Rey–Osterrieth complex figure: normative values in an Italian population sample. Neurol Sci 22:443–447
7. Capitani E, Della Sala S, Laiacona M, Marchetti C, Spinnler H (1994) Standardizzazione ed uso clinico di un test di memoria di prosa. Bollettino di Psicologia Applicata 209:47–63
8. Carlesimo GA, Oscar-Berman M (1992) Memory deficits in Alzheimer’s patients: a comprehensive review. Neuropsychol Rev 3:119–169
9. Carlesimo GA, Caltagirone C, Gainotti G, The group for the standardization of the Mental Deterioration Battery (1996) The mental deterioration battery: normative data, diagnostic reliability and qualitative analyses of cognitive impairment. Eur Neurol 36:378–384
10. Carlesimo GA, Buccione I, Fadda L, Graceffa A, Mauri M, Lorusso S, Bevilacqua G, Caltagirone C (2002) Standardizzazione di due test di memoria per uso clinico. Breve racconto e figura di Rey. Nuova Rivista di Neurologia 12(1):1–13
11. Delis DC, McKee R, Massman P, Kramer JH, Kaplan E, Gettman D (1991) Alternate forms of the California verbal learning test: development and reliability. Clin Neuropsychol 5:154–162
12. De Renzi E, Faglioni P, Ruggerini C (1977) Prove di memoria verbale di impiego clinico per la diagnosi di amnesia. Archivio di Psicologia, Neurologia e Psichiatria 38:303–318
13. Mauri M, Carlesimo GA, Graceffa AMS, Loasses A, Lorusso S, Sinforiani E, Bono G, Caltagirone C (1997) Standardizzazione di due nuovi test di memoria: apprendimento di liste di parole correlate e non correlate semanticamente. Archivio di Psicologia Neurologia e Psichiatria 58:621–645
14. Measso G, Cavarzeran F, Zappalà G, Lebowitz BD, Crook TH, Pirozzolo FJ, Amaducci LA, Massari D, Grigoletto F (1993) The mini-mental state examination: normative study of an Italian random sample. Dev Neuropsychol 9(2):77–95
15. Milner B (2005) The medial temporal-lobe amnesic syndrome. Psychiatr Clin N Am 28:599–611
16. Novelli G, Papagno C, Capitani E, Laiacona M, Cappa SF, Vallar G (1986) Tre test clinici di memoria verbale a lungo termine. Archivio di Psicologia Neurologia Psichiatria 47:278–296
17. Payne RW, Jones HG (1957) Statistics for the investigation of individual cases. J Clin Psychol 13:115–121
18. Saling MM (2009) Verbal memory in mesial temporal lobe epilepsy: beyond material specificity. Brain 132:570–582
19. Shapiro DM, Harrison DW (1990) Alternate forms of the AVLT: a procedure and test of form equivalency. Arch Clin Neuropsychol 5:405–410
20. Smirni D, Turriziani P, Oliveti M, Smirni P, Cipolotti L (2010) Standardizzazione di tre nuovi test di memoria di riconoscimento verbale e non verbale: uno studio preliminare. Giornale Italiano di Psicologia 1:227–248
21. Spinnler H, Tognoni G (1987) Standardizzazione e Taratura Italiana di Test Neuropsicologici. Italian J Neurol Sci 6(8):44–46
