JSLHR

Research Article

Reliability and Validity of the Computerized Revised Token Test: Comparison of Reading and Listening Versions in Persons With and Without Aphasia

Malcolm R. McNeil,a,b Sheila R. Pratt,a,b Neil Szuminsky,a,b Jee Eun Sung,a,b Tepanta R. D. Fossett,a,b Wiltrud Fassbinder,a,b and Kyoung Yuel Lima,b

Purpose: This study assessed the reliability and validity of intermodality associations and differences in persons with aphasia (PWA) and healthy controls (HC) on a computerized listening version and 3 reading versions of the Revised Token Test (RTT; McNeil & Prescott, 1978). Method: Thirty PWA and 30 HC completed the test versions, including a complete replication. Reading versions varied according to stimulus presentation method: (a) full-sentence presentation, (b) self-paced word-by-word full-sentence construction, and (c) self-paced word-by-word presentation with each word removed with the onset of the next word. Participants also received tests of aphasia and reading severity.

Results: The listening version produced higher overall mean scores than each of the reading versions. Differences were small and within 1 standard error of measurement of each version. Overall score test–retest reliability among versions for PWA ranged from r = .89 to r = .97. Correlations between the listening and reading versions ranged from r = .79 to r = .85. All versions correlated highly with aphasia and reading severity. Correlations were generally low for the HC due to restricted variability. Factor analysis yielded a 2-factor solution for PWA and a single-factor solution for HC. Conclusions: Intermodality differences were small, and all 4 versions were reliable, concurrently valid, and sensitive to similar linguistic processing difficulties in PWA.

Comparisons between listening and reading performance in persons with aphasia (PWA) can be important for theoretical reasons and differential diagnosis, as well as for uncovering the underlying nature of the impairment and directing intervention. There is evidence supporting both significant and nonsignificant differences in performance between listening and reading comprehension. Nonsignificant differences may be expected because reading and listening comprehension share many cognitive processes and task demands. These demands include perceptual analysis and interpretation; semantic, lexical, syntactic, and phonologic activation and integration across linguistic domains; as well as response selection, planning, and output processing. Furthermore, most of the psycholinguistic variables that affect listening comprehension—such as stimulus length, word frequency, semantic and syntactic complexity, short-term memory limitations, and attentional or resource demands—also affect reading comprehension (Daneman & Carpenter, 1980; Sachs, 1974). Additionally, supraordinate impairments of linguistic or cognitive functions might be expected to create equivalent impairments across modalities, as hypothesized by some aphasiologists (Darley, 1982; Marie, 1906; Schuell, Jenkins, & Jimenez-Pabon, 1964). Nonetheless, well-established differences between linguistic information processed via hearing as opposed to vision suggest that clinically relevant modality differences also might be expected (Sandhu & Dyson, 2012). Although average speed of transmission from sensory receptor to primary visual (V1) cortex or primary auditory (A1; Heschl's gyrus) cortex is roughly equivalent at about 45–50 ms, there are differences between modalities depending on the specific stimulus characteristics (Howard et al., 2000; Van Rullen & Thorpe, 2001). These include fundamental differences in

aGeriatric Research Education and Clinical Center, VA Pittsburgh Healthcare System, PA
bUniversity of Pittsburgh, PA

Correspondence to Malcolm R. McNeil: [email protected]

Jee Eun Sung is now at Ewha Womans University, Seoul, South Korea.
Tepanta R. D. Fossett is now at Indiana University, Bloomington.
Kyoung Yuel Lim is now at University of Texas–El Paso.

Editor: Rhea Paul
Associate Editor: Jessica Richardson

Received January 30, 2013
Revision received November 12, 2013
Accepted October 30, 2014
DOI: 10.1044/2015_JSLHR-L-13-0030

Disclosure: The authors have declared that no competing interests existed at the time of publication.

Journal of Speech, Language, and Hearing Research • Vol. 58 • 311–324 • April 2015 • Copyright © 2015 American Speech-Language-Hearing Association


the anatomy as well as modality-processing dominance, with (a) greater visual than auditory processing speed, even when equated for task difficulty (Ben-Artzi & Marks, 1995; Patching & Quinlan, 2002); (b) greater temporal acuity for audition and superior spatial acuity for vision (Bermant & Welch, 1976; Bertelson & Aschersleben, 2003; Welch & Warren, 1980); (c) greater stimulus duration or persistence for vision than audition, with auditory language stimuli typically being fleeting and visual being more permanent and available for reanalysis (Catts & Kamhi, 2005); (d) a weaker attentional alerting system (i.e., requiring more attentional resources) for the visual than the auditory modality (Posner, Nissen, & Klein, 1976; Sinnett, Spence, & Soto-Faraco, 2007); and (e) superiority of the visual modality under high attentional demand conditions (Posner et al., 1976). Additionally, known contextual effects, such as those found with positive and negative priming paradigms, can alter modality dominance in favor of either modality depending on the context in which the stimulus is delivered (Mulligan, 2011).

Likewise, there are fundamental differences in the psycholinguistic architecture of the perceptual content processed while listening to speech versus reading print, even with the same lexical and semantic content. For example, reading may require grapheme-to-phoneme mapping, a process not required by the competent language listener. Furthermore, individual experience, native expertise, or subtle impairments could provide an advantage for one modality of linguistic processing over the other.

The experimental evidence supporting or negating modality differences in PWA is mixed and sparse. Studies examining performance across listening and reading comprehension in PWA have identified impaired performance across both modalities (DeDe, 2012; Duffy & Ulrich, 1976; Gallaher & Canter, 1982).
These studies also suggested that different mechanisms may be responsible for impaired performance and severity differences between listening and reading comprehension. Duffy and Ulrich (1976) found no difference between listening and reading comprehension in PWA with moderate to severe impairment. Gallaher and Canter (1982) found impaired performance in both modalities, with poorer reading than listening comprehension in persons classified as having Broca’s aphasia. Using the same tasks as Gallaher and Canter, Peach, Canter, and Gallaher (1988) found no significant differences across the two modalities or between “fluent” (anomic and conduction) subgroups. Odell (1983) found no significant differences between “fluent” and “non-fluent” aphasic subtypes in listening and reading but did find significantly better performance in listening than reading comprehension for both healthy controls (HC) and PWA using the Revised Token Test (RTT; McNeil & Prescott, 1978). Examining the effects of word frequency and modality on sentence comprehension in HC and PWA, DeDe (2012) reported significant modality differences relative to word frequency in the PWA but not in the HC group. No statistical differences were reported among the represented aphasic subtypes. Poeck and Hartje (1978) found no difference in the number of errors generated by modality of presentation on the German version of the Token Test.


Additionally, no differences were found by aphasia syndrome (e.g., amnestic, Broca, Wernicke, global).

Current standardized tests for aphasia make the comparison between listening and reading quite opaque because they use different stimuli and task administration procedures, response modes, and scoring procedures. However, it is important to develop measures that control all of these relevant test factors to determine modality differences within and across individuals. Although the Discourse Comprehension Test (Brookshire & Nicholas, 1997) uses the same linguistic stimuli and scoring approach (true–false questions with the same response set) to assess both listening and reading comprehension, psychometric properties of the reading version (including appropriate reference data for pathological populations, concurrent or criterion validity, and test–retest reliability) have not been established. Additionally, its direct comparability within the same pathological or normal sample has not been assessed. Other published aphasia tests assess performance in one modality (e.g., auditory) or communication function (e.g., listening comprehension) with stimuli, tasks, and/or scoring procedures differing from those used to elicit or evaluate behavior in another modality (e.g., vision) or communication function (e.g., reading). Even the Porch Index of Communicative Ability (PICA; Porch, 1981), known for its systematic utilization of the same stimuli across all 18 subtests, requires the comprehension of prepositional sentences for reading but simple imperative sentences for listening. The commands also vary in the nature of the verb and the sentence length. These, and other differences within and across aphasia tests, make modality and communication function comparisons difficult or impossible, despite the importance of these comparisons in defining and classifying aphasia and in making treatment decisions.
One test that holds promise for making more direct comparisons across modalities is a computerized version of the RTT. Odell, Miller, and Lee (1996) first developed a computerized version of the RTT in order to increase control over the intensity, rate, and prosodic properties of the acoustic stimuli, as well as the placement and distribution of the visual stimuli used in the presentation and response. Some online scoring also was incorporated. In this pilot work, Odell et al. compared the performance of seven PWA on their auditory computerized RTT to that of the clinician-administered and clinician-scored RTT. They found no significant differences in the overall and subtest scores between the two test administrations.

McNeil et al. (2008) subsequently developed a computerized listening version (Computerized Revised Token Test–Listening [CRTT-L]) of the RTT. This version uses an updated computer language and provides the ability to manipulate stimulus timing, precisely measure participant response times, measure more accurately the multidimensionally scored responses, measure the response kinematics, and record and replay the entire exam (including responses) to allow for offline analysis. Three reading versions of the CRTT also were developed to explore hypothesized differential task demands between listening and reading. These differential demands include


such cognitive variables as working memory and online integration of information under varying amounts of available information (Caplan, Waters, DeDe, Michaud, & Reddy, 2007; DeDe, 2013; Kennedy & Murray, 1984; Sung et al., 2011). The goals of this development were to explore the conceptual and clinical differences and similarities in listening and reading with tasks that share considerable conceptual and methodological overlap and to eventually develop an assessment tool that allows for valid comparison between listening and reading comprehension performance. The current study evaluated the classic psychometric properties of the CRTT-L and the three experimental reading versions (Computerized Revised Token Test–Reading [CRTT-R]), including the test–retest reliability and concurrent and construct validity.1

Experimental Questions Assessed

Using the methods and procedures described below, the following experimental questions were addressed. Questions 1 and 2 focused on construct validity: (1) Are there significant (p ≤ .05) differences between groups and among tests on the overall scores for the CRTT-L and the three CRTT-R versions, as measured by a two-way repeated measures analysis of variance (ANOVA)? We predicted that the Computerized Revised Token Test–Reading–Full Sentence (CRTT-R-FS) would not differ significantly from the CRTT-L and that the other CRTT-R versions (Computerized Revised Token Test–Reading–Word Constant [CRTT-R-WC]; Computerized Revised Token Test–Reading–Word Fade [CRTT-R-WF], described below in Materials and Experimental Procedures) would differ significantly. (2) Which version of the CRTT-R is most highly associated with the CRTT-L in HC and PWA, as evidenced by the strength of the correlation coefficients? We predicted that the CRTT-R-FS task would correlate most highly with the CRTT-L. Question 3 addressed concurrent validity: (3) Which CRTT test version correlates significantly (p ≤ .05) and most highly (r ≥ .70)2 with the PICA and the Reading Comprehension Battery for Aphasia–2 (RCBA-2; LaPointe & Horner, 1998)? We predicted that the CRTT-L would correlate most highly with the PICA and that the CRTT-R-FS would correlate most highly with the RCBA-2. Questions 4 and 5 addressed construct validity: (4) Do the CRTT-L and the three versions of the CRTT-R measure a common construct for both HC and PWA, as evidenced from factor loadings determined from exploratory factor analysis? We predicted that there would be no difference in the derived factor structure between the HC and PWA groups. (5) Are there individuals with aphasia who exhibit significant differences between the listening and reading presentations of the CRTT, as determined by the Revised Standardized Difference Test (RSDT; Crawford & Garthwaite, 2005)? We predicted that a large number of PWA would show significant idiosyncratic differences among the four versions of the CRTT. Question 6 addressed reliability and validity: (6) Are each of these results replicated on full retesting of the same participants, using the same experimental procedures? We predicted that all CRTT versions would demonstrate high correlations (r > .70) and nonsignificant differences between test and retest. All research questions used the overall and subtest mean scores as the primary unit of analysis.

1A detailed description of the listening and reading components of the CRTT computer program, including the timing and scoring parameters used for the following experiments, can be found in the online supplemental materials. The CRTT maintained the 15.00-point multidimensional scoring system of the RTT, in which scores are generated based on the accuracy, completeness, promptness, and efficiency of the response. No measures of timing were used as dependent variables in this study.

2The rationale for the correlation coefficient of ≥ .70 was derived from the decision that at least half of the variance should be shared in order to demonstrate a meaningful relationship between the variables for the purposes of this study.
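The RSDT named in Question 5 compares an individual's standardized discrepancy between two tests against a control sample. The full RSDT (Crawford & Garthwaite, 2005) adds a correction for estimating the control means, SDs, and correlation from a sample; the simpler, unrevised standardized-difference statistic sketched below conveys the core logic only. The inputs are illustrative: a hypothetical PWA scoring 13.0 on the CRTT-L and 11.0 on the CRTT-R-FS, evaluated against control values of the kind reported for the HC group in this study.

```python
import math

def standardized_difference(x1, x2, m1, s1, m2, s2, r, n):
    """Unrevised standardized-difference statistic (a precursor of the RSDT).

    x1, x2: the individual's scores on the two tests
    m1, s1, m2, s2: control-sample means and SDs for the two tests
    r: control-sample correlation between the two tests
    n: control-sample size
    Returns a t-like statistic evaluated on n - 1 degrees of freedom.
    """
    z1 = (x1 - m1) / s1
    z2 = (x2 - m2) / s2
    return (z1 - z2) / math.sqrt((2 - 2 * r) * ((n + 1) / n))

# Hypothetical case: CRTT-L = 13.0, CRTT-R-FS = 11.0, against control
# values M = 14.59/14.01, SD = 0.34/0.40, r = .24, n = 30.
t_stat = standardized_difference(13.0, 11.0, 14.59, 0.34, 14.01, 0.40, 0.24, 30)
```

A t_stat near 2.27 here would indicate a reading deficit disproportionate to the listening deficit relative to controls; the published RSDT evaluates the same discrepancy with a revised denominator.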

Method and Procedures

This study received approval from the local institutional review board and the research integrity committees before the study was initiated. All participants provided oral and written informed consent prior to the administration of any study procedure.

Participants

Sixty individuals (30 HC and 30 PWA) participated in the study. All participants met the following selection criteria: (a) they were native English speakers, (b) they had at least 8 years of formal education, (c) they received scores of 50% or better on the Picture Identification Task (Wilson & Antablin, 1980), (d) they had visual acuity of 20/40 or better (with correction if necessary) when screened with the reduced Snellen chart, and (e) their performance on the immediate/delayed story retelling subtest of the Assessment Battery of Communication in Dementia (Bayles & Tomoeda, 1993) yielded a ratio of delayed recall to immediate recall greater than .70. In addition, all participants completed the CRTT screening pretest, which was designed to help ensure knowledge of test vocabulary, visual and auditory sensitivity and acuity, and motor abilities sufficient for completing the experimental tasks.

The participants completed the PICA to document the presence and severity of aphasia and the RCBA-2 to measure reading status. Both tests also were used to establish concurrent (criterion) validity. The PWA (17 men and 13 women) had left-hemisphere cerebrovascular accidents and demonstrated language performance consistent with McNeil and Pratt's (2001) definition of aphasia, as determined by the selection criteria and their performance on the PICA. No attempt was made to classify the PWA, because no questions were motivated by their subdivision into any prescribed groups and the majority of the previous studies comparing Token Test–like commands


have found no difference within or between groups in either modality. The HC reported no history of communication, neurological, or psychiatric disorders, and their performance on the PICA was consistent with that of typically functioning adults.

The PWA had a mean age of 63 years (range = 38–90 years, SD = 13), and none were acute, with an average of 124 months post onset (range = 6–564 months, SD = 155). The mean PICA overall percentile score was 66.93 (range = 26–89, SD = 16.80), and the mean of the RCBA-2 overall scores was 169.63 out of 190 (range = 86–190, SD = 24.63). The HC (18 men and 12 women) had a mean age of 65 years (range = 38–83 years, SD = 12). The mean PICA overall percentile score based on the reference data for HC (Duffy & Keith, 1980) was 20.30 (range = 2–95, SD = 20.42), and the mean of the RCBA-2 overall scores was 186.83 out of 190 (range = 172–190, SD = 3.90). There was no significant difference in age, t(df = 58) = –.281, p = .78, or education, t(df = 58) = .238, p = .81, between the groups. As expected, the PWA performed significantly lower on the RCBA-2, t(df = 59) = –3.72, p < .001. For descriptive comparisons, PICA scores are expressed in percentiles derived from two separate populations. These scores did not overlap between the groups, and differences were not tested statistically.

In the online supplemental materials, Supplemental Table 7 summarizes each participant's overall PICA score, age, education, months post onset, and gender for the PWA. For the HC group, these same data are summarized in Supplemental Table 8, minus the months post onset. For descriptive purposes, individual PICA subtest scores for the PWA are presented in Supplemental Table 9.

Materials and Experimental Procedures

The CRTT, like the original noncomputerized version of the RTT, was designed to minimize top-down language processing and to require full semantic understanding of each individual lexical item in the commands, with the linguistic contextual cues available only from word order. A full description of the CRTT tasks and scoring conventions is presented in the online supplemental material. The CRTT allows acoustic or visual stimuli to be presented via a computer, and the participants respond to each command by using either a touchscreen or a mouse with a computer monitor. For the current study, touchscreen access was used. Each lexical item was scored online with the 15.00-point multidimensional scale prescribed in the original noncomputerized version of the RTT (McNeil, Dionigi, Langlois, & Prescott, 1989; McNeil & Prescott, 1978).

In the current study, all participants were administered the pretests of the CRTT-L and CRTT-R prior to the presentation of the experimental stimuli to ensure the participants' knowledge of each critical lexical item; audibility of stimuli for the listening version and readability for the reading version; perceptual differentiation of the two shapes, five colors, and two sizes of the objects displayed on the computer monitor; as well as ability to make the required touch or touch-and-drag movements to accomplish the responses. Pretest results were not incorporated in test scores. The reading version of the pretest was lexically equivalent to that for the CRTT-L, which was modeled directly from the RTT pretest. The exception to the equivalence was a change in some words in the subtest instructions in the reading versions, such as "listen to" to "read" and "objects" to "objects on the screen." The actual test stimuli were, however, identical among all versions. The presentation order of the CRTT-L and the three reading versions was randomized across participants.

For the CRTT-L, the stimuli were prerecorded using natural voice commands. The stimuli were presented at an average rate of 3.0–3.5 syllables per s, without exaggerated emphasis (rate, pitch, or intensity) on any element. Earlier research (Pratt et al., 2006) verified that a presentation level of 75 dB SPL produces maximum audibility of the signal for these stimuli regardless of hearing status. Therefore, all acoustic stimuli were presented at 75 dB SPL via computer speakers. Signal intensity was pre-established using a calibration noise matched to the average root-mean-square of the stimuli and measured with a sound-level meter at the pinna.

The three reading versions were constructed to challenge components of the processing system that could be differentially impaired in PWA. These versions included:

1. a full-sentence (CRTT-R-FS) presentation, whereby the entire sentence is displayed and remains on the screen until a response is produced. This version is most frequently encountered in nonexperimental reading situations.

2. self-paced, word-by-word sentence assembly, with each successive word remaining on the screen (word constant; CRTT-R-WC) until a response is given. This version allows the measurement of word-by-word reading and requires a motor response for the display of each word but allows rereading of previously displayed words, as in Version 1, though without taxing short-term memory.

3. self-paced, word-by-word sentence assembly, with each previous word disappearing at the onset of the next word (word fade; CRTT-R-WF). This version requires the same cognitive-motor demands and allows the same measurement opportunities as Version 2 but without the ability to reread previously presented stimuli, thus taxing short-term memory. This version is more akin to the listening version.


The stimuli in each of the three reading versions were presented in Arial 36-point font within a textbox near the bottom of the 15-in. touchscreen. In the two self-paced versions (CRTT-R-WC and CRTT-R-WF), each word appeared successively in a left-to-right orientation with a single touch to the screen in the designated area in which the words were displayed. To avoid manual access response bias between groups, all participants responded with their nondominant (left) hand because a portion of the PWA would be expected to have right-arm paralysis or paresis. The first word appeared automatically following a 2 s delay after the response to the previous command. Each word,


sentence, subtest, and overall test was automatically scored according to the 15.00-point multidimensional scoring scale. All participants returned for a second session, scheduled according to their availability, to complete the CRTT-L and the three CRTT-R versions to assess test–retest reliability. The experimental procedures were the same for the second administration as for the first, and the order of presenting all test versions again was randomized across the participants. The average duration between the first test and the retest for the four test versions was 10 days (range = 1–56 days) for the HC group and 17 days (range = 0–112 days) for the PWA.
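The visibility rule that distinguishes the two self-paced reading presentations (word constant vs. word fade) can be sketched in a few lines. This is an illustrative reconstruction, not the CRTT software itself; the function name and the example command are hypothetical.

```python
def visible_words(sentence: str, taps: int, mode: str) -> str:
    """Return the words visible after `taps` screen touches.

    'constant' (CRTT-R-WC): every word revealed so far stays on screen.
    'fade' (CRTT-R-WF): each word disappears when the next one appears,
    so only the most recently revealed word remains visible.
    """
    revealed = sentence.split()[:taps]
    if mode == "constant":
        return " ".join(revealed)
    if mode == "fade":
        return revealed[-1] if revealed else ""
    raise ValueError("mode must be 'constant' or 'fade'")

# After three touches on a five-word command:
print(visible_words("touch the small red circle", 3, "constant"))  # touch the small
print(visible_words("touch the small red circle", 3, "fade"))      # small
```

The contrast makes concrete why the word-fade version taxes short-term memory: at any moment the reader holds all but one word of the command in memory rather than on the screen.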

Results

All statistical analyses used to examine the construct and concurrent validities of the four versions of the tasks were performed first using the initial test data and a second time using the retest data. This replication provided the necessary metrics of instrument reliability. A mixed-model two-way repeated-measures ANOVA, with groups as the between-subject variable and tests as the within-subject variable, was calculated using the overall scores for the CRTT-L and CRTT-R versions. All analyses were conducted at an alpha level of p ≤ .05. Relationships among the four conditions were examined by computing Pearson correlation coefficients and with an exploratory factor analysis to investigate the factor loadings for the four different test versions. Finally, individual differences in performance on the CRTT-L and CRTT-R conditions were examined in PWA using the RSDT. Concurrent validity was examined through the correlations and distributional differences among the four CRTT versions, the overall aphasia severity measure derived from the PICA, and reading ability as measured from the overall score of the RCBA-2. In order to maintain coherence among the findings, the results are reported by statistical procedure with reference to the relevant experimental questions and contrasts.
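To illustrate the factor-analytic step, the sketch below simulates four overall test scores driven by a single latent severity factor (all values hypothetical) and checks that the first eigenvalue of their correlation matrix dominates, the pattern expected when the versions measure one shared construct. A full exploratory factor analysis adds communality estimation and rotation, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # sample size, as in the study
severity = rng.normal(0.0, 1.0, n)       # single latent factor (hypothetical)

# Four simulated overall scores, each loading on the same latent factor.
scores = np.column_stack([
    13.1 + 1.2 * severity + rng.normal(0, 0.4, n),   # "CRTT-L"
    12.7 + 1.1 * severity + rng.normal(0, 0.4, n),   # "CRTT-R-FS"
    12.7 + 1.1 * severity + rng.normal(0, 0.4, n),   # "CRTT-R-WC"
    12.8 + 1.2 * severity + rng.normal(0, 0.4, n),   # "CRTT-R-WF"
])

corr = np.corrcoef(scores, rowvar=False)         # 4 x 4 correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]         # eigenvalues, descending
explained = eigvals / eigvals.sum()
# A dominant first component is consistent with a single shared construct.
```

With these loadings the first component accounts for the large majority of the common variance; a two-factor structure, as found for the PWA, would instead show a second eigenvalue well above the remainder.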

ANOVA Results

The first question addressed significant differences in mean performance between the groups and among the four CRTT conditions. To determine whether these results would be fully replicated (Question 6), these analyses were repeated for the retest data. Means, standard deviations, and standard error of measurement data for both groups and each test condition, along with difference scores computed between the initial test and the retest scores, are detailed in Table 1. There was a significant main effect for both the initial test: group, F(1, 227) = 1.65, p = .00009, and condition, F(3, 227) = 4.96, p = .002; and for the retest: group, F(1, 227) = 95.90, p = .00009, and condition, F(3, 227) = 6.55, p = .00009. The interaction between group and condition was not significant for either the test or the retest comparisons. Examination of the mean scores for the groups across the conditions revealed that the HC scored significantly higher on all conditions than the PWA at both test and retest.

No HC (or PWA) participant performed at ceiling level for any overall test or subtest. For the initial test, pairwise post hoc comparisons using ANOVAs with Bonferroni corrections for multiple comparisons, tested at p = .008, across the conditions yielded a significantly higher score on the CRTT-L than on all reading versions (ps < .001 for CRTT-R-FS, CRTT-R-WC, and CRTT-R-WF). However, there were no significant differences among the reading tests. For the retest, pairwise comparisons again yielded a significantly higher score (p < .001) on the CRTT-L than on the CRTT-R-FS and CRTT-R-WC, but not the CRTT-R-WF (p > .008). Additionally, the CRTT-R-WF condition produced significantly higher scores than the CRTT-R-FS (p = .004) and the CRTT-R-WC (p = .0009).
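The p = .008 criterion follows directly from dividing the alpha level of .05 by the six pairwise contrasts possible among the four test versions:

```python
from itertools import combinations

versions = ["CRTT-L", "CRTT-R-FS", "CRTT-R-WC", "CRTT-R-WF"]
pairs = list(combinations(versions, 2))     # every pairwise contrast
alpha = 0.05
bonferroni_threshold = alpha / len(pairs)   # corrected per-test criterion

print(len(pairs))                       # 6
print(round(bonferroni_threshold, 3))   # 0.008
```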

Correlations

Overall Score Test–Retest

Pearson correlations between test and retest performance for each CRTT condition were high for the PWA (CRTT-L, r = .96; CRTT-R-FS, r = .89; CRTT-R-WC, r = .94; CRTT-R-WF, r = .97). In contrast, the correlations were more modest for the HC, especially for the listening condition (CRTT-L, r = .43; CRTT-R-FS, r = .78; CRTT-R-WC, r = .74; CRTT-R-WF, r = .77). The distribution of scores for both administrations of the test is illustrated for both groups in scatterplots in Supplemental Figure 2 and Supplemental Figure 3. It should be noted that the range of scores for the HC was extremely restricted, whereas the distribution was considerably larger for the PWA. This restricted range of scores for the HC likely accounted for their lower coefficients.

Question 2 addressed which version of the CRTT-R was most highly associated with the CRTT-L in HC and PWA, as evidenced by the strength of the correlation coefficients. Pearson correlation coefficients were computed between the overall scores for the CRTT-L and those from the three reading versions. For the PWA, the correlations between the CRTT-L and all three reading versions were significant (p < .01) and relatively high for both the initial test and the retest, ranging between .79 and .85 for the initial test and between .79 and .84 for the retest. The correlation coefficients were all high among the reading versions for both the test and retest. The overall average test scores for the HC participant group produced uniformly low and nonsignificant correlations, ranging from –.22 to .38, save for significant (p < .01) but modest correlations between the CRTT-R-WC and the CRTT-R-WF for both the test (r = .52) and the retest (r = .55). Limited within-group variability likely accounted for these nonsignificant correlations. The complete correlation matrices for both tests and groups using the overall average test scores are summarized in Table 2.
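The attenuating effect of a restricted range on test–retest correlations can be demonstrated with simulated data (hypothetical values, not the study's data): the same measurement process yields a much lower correlation once the sample is truncated to a narrow, ceiling-like band of scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
ability = rng.normal(0.0, 1.0, n)            # stable underlying trait
test = ability + rng.normal(0.0, 0.3, n)     # first administration
retest = ability + rng.normal(0.0, 0.3, n)   # second administration

r_full = np.corrcoef(test, retest)[0, 1]     # close to .92 in this setup

# Truncate to high scorers only, mimicking a near-ceiling control group.
high = test > 1.0
r_restricted = np.corrcoef(test[high], retest[high])[0, 1]
# r_restricted is markedly lower than r_full, despite identical measurement.
```

The same mechanism plausibly explains why the HC coefficients here are modest even though the instrument is reliable in the more variable PWA sample.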
Subtest Score Test–Retest

To further address whether results would be replicated on the same participants using the same experimental procedures (Question 6), test–retest reliability also was assessed


Table 1. Means, standard deviations, and standard error of measurement (SEM) for the initial test and the retest for each Computerized Revised Token Test (CRTT) version for both participant groups, as well as the mean change score (MΔ).

                      HC                            PWA
Session          M      SD     SEM    MΔ        M      SD     SEM    MΔ
Initial test
  CRTT-L       14.59   0.34   0.21            13.06   1.49   0.30
  CRTT-R-FS    14.01   0.40   0.21            12.68   1.08   0.35a
  CRTT-R-WC    13.80   0.56   0.26            12.65   1.20   0.28
  CRTT-R-WF    14.14   0.45   0.21            12.76   1.39   0.24
Retest
  CRTT-L       14.74   0.17   0.21   +0.15    13.25   1.39   0.30   +0.19
  CRTT-R-FS    14.01   0.50   0.21    0.00    12.88   1.04   0.35   +0.20
  CRTT-R-WC    13.97   0.44   0.26   +0.17    12.83   1.07   0.28   +0.19
  CRTT-R-WF    14.33   0.40   0.21   +0.19    13.07   1.38   0.24   +0.31

Note. HC = healthy controls; PWA = persons with aphasia; CRTT-L = Computerized Revised Token Test–Listening; CRTT-R-FS = Computerized Revised Token Test–Reading–Full Sentence; CRTT-R-WC = Computerized Revised Token Test–Reading–Word Constant; CRTT-R-WF = Computerized Revised Token Test–Reading–Word Fade.
aValue exceeds the 95% confidence interval around the mean.
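The SEM values in Table 1 are consistent with the classical formula SEM = SD × √(1 − r), where r is the test–retest reliability. For example, the PWA CRTT-L entry (SD = 1.49, r = .96) reproduces the tabled SEM of 0.30:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the score SD scaled by sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# PWA CRTT-L: SD = 1.49 (Table 1), test-retest r = .96
sem_pwa_crtt_l = standard_error_of_measurement(1.49, 0.96)
print(round(sem_pwa_crtt_l, 2))   # 0.3
```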

at the subtest level for each test version for both groups. The resulting correlation coefficients were generally high across versions for most subtests for the PWA, ranging from .59 to .92, with average correlations ranging from .75 for Subtest I to .88 for Subtest III. Considerably more modest correlations were realized for the HC group, ranging from .00 to .77, with averaged correlations ranging from .33 for Subtest IX to .63 for Subtest VIII. The correlations for the CRTT-L were notably lower than those for the reading versions for the HC group. These data are summarized in Table 3. The standard error of measurement was similar across test versions, and the overall standard error of measurement was slightly higher for PWA than HC, ranging from .24 to .35 for PWA and .21 to .26 for HC (see Table 4).

Concurrent (Criterion) Validity

Correlation coefficients were computed for both groups and all CRTT test conditions with the measures of overall severity of aphasia (PICA) and reading impairment (RCBA-2). These correlations were computed in order to assess the concurrent validity of the CRTT-L and CRTT-R test versions as measures of listening and reading comprehension in PWA and are detailed in Table 5. For the HC, only the CRTT-L was significantly correlated (r = .39) with the PICA, and only for the first test administration; again, this is attributable to the restricted range of the scores for these participants. In contrast, for the PWA the correlations of all conditions with the PICA were uniformly moderately high and significant (p < .01), ranging from a high of .82 for the CRTT-L retest to a low of .72 for both the CRTT-R-FS test and retest. The correlations between the RCBA-2 and the four CRTT versions were each significant (p < .01) and higher than those for the PICA, ranging from .79 for the CRTT-R-WF to .89 for the CRTT-L.

To ensure that the relationships among the four test versions measure language-specific processing, correlations were computed among the CRTT measures and the various PICA "modalities" or tasks grouped by cognitive or

Table 2. HC and PWA group test and retest correlation coefficients among the four CRTT versions. CRTT version CRTT-L Session Initial test CRTT-L CRTT-R-FS CRTT-R-WC CRTT-R-WF Retest CRTT-L CRTT-R-FS CRTT-R-WC CRTT-R-WF

CRTT-R-FS

CRTT-R-WC

HC

PWA

HC

PWA

HC

PWA

HC

PWA





.24 —

.85** —

.27 –.06 —

.84** .84** —

.31 –.11 .52** —

.79** .88** .88** —





.38 —

.84** —

–.18 –.22 —

.79** .77** —

.15 .05 .55** —

.79** .84** .91** —

**p ≤ .01.

316

CRTT-R-WF

Journal of Speech, Language, and Hearing Research • Vol. 58 • 311–324 • April 2015

Table 3. Test–retest reliability by subtest for each CRTT version for each participant group. Subtest Group

CRTT Version

X

M (r)

Measure

I

II

III

IV

V

VI

VII

VIII

IX

M (r)

.58** .32 .55** .50* .49

.23 .23 .46** .47** .35

.25 .63** .62** .77** .57

.13 .62** .60** .56** .48

.31 .54** .73** .43* .50

.29 .36 .63** .54** .46

.06 .61** .73** .66** .52

.65** .66** .66** .54** .63

.31 .61** .24 .14 .33

.30 .58** .63** .00 .38

.31 .52 .59 .46 .47

M (r)

.90** .82** .67** .59** .75

.88** .77** .72** .82** .80

.88** .91** .82** .92** .88

.79** .89** .79** .85** .83

.84** .65** .88** .87** .81

.89** .82** .83** .89** .86

.82** .73** .87** .92** .84

.82** .80** .86** .92** .85

.85** .71** .87** .77** .80

.91** .65** .82** .88** .82

.86 .78 .81 .85 .83

HC CRTT-L CRTT-R-FS CRTT-R-WC CRTT-R-WF PWA CRTT-L CRTT-R-FS CRTT-R-WC CRTT-R-WF *p ≤ .05; **p ≤ .01.

communicative function, as is the standard practice of the PICA. We predicted that those specific tasks that require language processing, such as listening, reading, talking, pantomiming, and writing, would correlate significantly and highly with the CRTT versions and low and nonsignificantly with nonlinguistic tasks, such as picture and object visual matching and word copying. If confirmed, these predictions would serve as evidence that performance on all four versions of the CRTT is related to language-demanding tasks that are impaired in PWA and not a more general consequence of brain damage or unknown artifacts. As predicted and summarized in Table 6, the correlations for the language tasks were all significant (p < .05) and ranged from .45 to .85. Also as predicted, the correlations were all nonsignificant for the nonlanguage tasks and ranged from .00 to .10 for the visual matching tasks and from .21 to .31 for the copying tasks. The mean scores were either stable (CRTT-R-FS condition for the HC) or increased for the retest relative to the test across all conditions for both groups (see Table 1). Although these increased scores were statistically significant,

they were relatively small for both groups and did not exceed the standard error of measurement for any condition for either group.
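The reliability indices used throughout this section, test–retest Pearson correlations and the standard error of measurement (SEM = SD × sqrt(1 − r)), can be sketched as follows. The scores below are hypothetical and are used for illustration only, not the study's data:

```python
import math
import statistics

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def sem(scores: list[float], reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - r_xx),
    where r_xx is the test-retest reliability coefficient."""
    return statistics.stdev(scores) * math.sqrt(1.0 - reliability)

# Hypothetical overall CRTT scores for the same individuals at test and retest
test_scores = [11.2, 12.5, 13.1, 13.8, 14.0, 14.3]
retest_scores = [11.5, 12.4, 13.4, 13.9, 14.2, 14.6]
r_tt = pearson_r(test_scores, retest_scores)  # test-retest reliability
print(round(r_tt, 2), round(sem(test_scores, r_tt), 2))
```

Because the SEM scales the score spread by the unreliable portion of the variance, a highly reliable test (r close to 1) yields a small SEM even when individual scores vary widely.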

Factor Analyses

Question 4 sought to determine whether the CRTT-L and the three reading versions measure a common construct for both HC and PWA. Using SPSS Version 18.0, exploratory factor analyses were conducted on overall performance from each CRTT version for each group and for the test and retest administrations. A principal component extraction procedure was applied and then rotated to a final solution with a Varimax rotation, which constrains the factors to remain orthogonal (uncorrelated). Kaiser's criterion was applied to retain factors with an eigenvalue greater than one. Conventionally, a factor loading of 0.3 (with less than 9% of an item's variance explained) has been used as the cutoff for omitting items (or tests) from consideration (Bryman & Cramer, 2005). However, considering the limited sample size of the current study, a more stringent criterion was required.
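A minimal sketch of the extraction and rotation steps just described: eigenvalues of the correlation matrix are screened with Kaiser's greater-than-one criterion, and a loading matrix is rotated with the standard SVD-based Varimax algorithm. The data and loadings below are hypothetical, and this is the textbook algorithm rather than SPSS's implementation:

```python
import numpy as np

def kaiser_retained(data: np.ndarray) -> np.ndarray:
    """Eigenvalues of the correlation matrix that exceed 1 (Kaiser's criterion).
    `data` is an observations-by-variables matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # descending order
    return eigvals[eigvals > 1.0]

def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal Varimax rotation of a factor-loading matrix (SVD-based)."""
    n, k = loadings.shape
    rotation = np.eye(k)
    var_sum = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion, then its polar decomposition
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n)
        )
        rotation = u @ vt
        if s.sum() < var_sum * (1.0 + tol):  # converged when criterion stops growing
            break
        var_sum = s.sum()
    return loadings @ rotation
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of loading across factors is simplified.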

Table 4. Standard error of measurement for each subtest and overall score for the CRTT-L and each CRTT-R version for the PWA and HC groups.

           CRTT-L        CRTT-R-FS     CRTT-R-WC     CRTT-R-WF
Subtest    HC    PWA     HC    PWA     HC    PWA     HC    PWA
I          0.19  0.42    0.37  0.37    0.40  0.28    0.40  0.49
II         0.22  0.53    0.33  0.48    0.46  0.40    0.45  0.39
III        0.35  0.50    0.33  0.43    0.41  0.49    0.34  0.48
IV         0.58  0.68    0.50  0.50    0.53  0.64    0.42  0.73
V          0.41  0.65    0.54  0.64    0.36  0.42    0.58  0.57
VI         0.55  0.60    0.62  0.46    0.46  0.52    0.42  0.50
VII        0.56  0.76    0.47  0.63    0.34  0.48    0.48  0.48
VIII       0.36  0.77    0.42  0.56    0.49  0.52    0.46  0.50
IX         0.26  0.55    0.50  0.72    0.64  0.50    0.60  0.69
X          0.33  0.49    0.51  0.74    0.44  0.59    0.67  0.54
Overall    0.21  0.30    0.21  0.35    0.26  0.28    0.21  0.24

McNeil et al.: Computerized Revised Token Test


Table 5. Correlation coefficients between CRTT versions and the overall scores (OA) from the Porch Index of Communicative Ability (PICA) and Reading Comprehension Battery for Aphasia–2 (RCBA-2) for the HC and PWA groups for both test and retest administrations.

                 CRTT-L          CRTT-R-FS       CRTT-R-WC       CRTT-R-WF
Group  Measure   Test    Retest  Test    Retest  Test    Retest  Test    Retest
HC     PICA      .39*    .23     .17     .12     –.29    –.14    –.06    –.06
       RCBA-2    .21     –.12    .18     .27     .33     .20     –.03    .11
PWA    PICA      .81**   .82**   .73**   .72**   .77**   .75**   .72**   .78**
       RCBA-2    .89**   .83**   .83**   .81**   .84**   .84**   .79**   .82**

Note. OA = overall score.
*p ≤ .05; **p ≤ .01.

A criterion of 0.7 (with more than 49% of the variance explained) was therefore used to select a shared factor. Initially, the overall mean scores for each group from each test version were independently entered into a principal component analysis (PCA) without Varimax rotation. The greatest percentage of the variance for the HC group was derived from the CRTT-L (44%), with successively less variance explained by the full-sentence (CRTT-R-FS, 31%), word-constant (CRTT-R-WC, 14%), and word-fade (CRTT-R-WF, 11%) reading conditions. With the first two conditions included, three fourths of the variance was explained; adding the third condition (CRTT-R-WC) raised this to about 90%, with the CRTT-R-WF condition contributing only the remaining 11%. The eigenvalues were very similar between the initial test and the retest for all conditions. In contrast, the PCA results for the PWA revealed that 88% of the variance was accounted for by the CRTT-L, with little additional variance accounted for by the three reading versions of the test (12% in total). Test and retest results were nearly identical for each test version. The eigenvalues and loadings (percent variance accounted for) for both groups and each condition for the test and retest are summarized in Supplemental Table 10.

When the Varimax rotation was employed, two factors emerged for the HC group (see Table 7). For the initial test, the word-constant (CRTT-R-WC) and word-fade (CRTT-R-WF) reading conditions loaded on the first factor (.84 and .86, respectively), and the listening (CRTT-L) and full-sentence (CRTT-R-FS) reading conditions loaded on the second factor (.74 and .82, respectively). Results for the retest were consistent in producing two factors, with the same variables loading on each factor and with similar weightings. The analysis for the PWA, on the other hand, yielded a single factor, with near-identical weightings across test versions (range = .92–.95) and near-identical weightings on the replication. The one-factor solution for the PWA accounted for 94% and 95% of the total variance for the test and retest, respectively, indicating that all four test versions loaded on the same factor. The PCA with Varimax rotation and the Kaiser normalization procedure (see Table 8) for the two factors derived for the HC for both test administrations yielded a solution whereby Factor 1 (the CRTT-R-WC and CRTT-R-WF) accounted for the majority of the loadings and did so for both the test (.93) and the retest (.84).

Table 6. Correlations between CRTT versions and PICA "modalities" for PWA.

                 CRTT version
PICA modality    CRTT-L   CRTT-R-FS   CRTT-R-WC   CRTT-R-WF
Verbal           .62**    .52**       .46**       .45**
Auditory         .79**    .67**       .57**       .64**
Reading          .85**    .78**       .79**       .76**
Visual           .03      .00         .03         .10
Writing          .60**    .58**       .75**       .73**
Copying          .30      .23         .31         .21
Pantomime        .49*     .56**       .62**       .63**

Note. PICA modalities = averaged subtest scores from the PICA.
*p ≤ .05; **p ≤ .01.

Table 7. Principal component analysis with Varimax-rotated components for each CRTT version for each participant group and both test administrations.

                 HC                               PWA
                 Factor 1         Factor 2        Factor 1
CRTT version     Test    Retest   Test    Retest  Test    Retest
CRTT-L           .44     .05      .74     .83     .92     .92
CRTT-R-FS        –.21    –.09     .82     .81     .95     .92
CRTT-R-WC        .84     .86      .03     –.28    .95     .93
CRTT-R-WF        .86     .89      .00     .21     .94     .95

Note. The primary factor loadings are those for Factor 1 (CRTT-R-WC and CRTT-R-WF) and Factor 2 (CRTT-L and CRTT-R-FS).

Table 8. Principal component analysis with Varimax rotation and Kaiser normalization for CRTT versions for the HC group and both test administrations.

             Factor 1         Factor 2
Component    Test    Retest   Test    Retest
Factor 1     .93     .84      .37     –.55
Factor 2     –.37    –.55     .93     .84

Note. The primary factor loadings are those for Factor 1 (CRTT-R-WC and CRTT-R-WF) and Factor 2 (CRTT-L and CRTT-R-FS).

Comparisons of Individual Performance

Question 5 sought to determine whether there were individual PWA who demonstrated significant differences in performance between the listening and reading comprehension versions of the test. Descriptive and nonparametric statistical measures were employed to answer this question. First, the score differences between the CRTT-L and the three CRTT-R conditions were calculated for each individual. Next, 2 SD of the differences was established as the criterion for identifying PWA who performed either above or below that value on the CRTT-L relative to each of the CRTT-R conditions. As summarized in Table 9, a total of four PWA showed greater differences between the modalities than those of the group on the basis of the 2 SD criterion on the first test for at least one of the 90 possible comparisons. Among those four individuals, two performed better on the CRTT-L than on the CRTT-R-FS (Participant 221, D = 1.80) and the CRTT-R-WC (Participant 204, D = 1.65) on the initial test. On the retest, two participants (Participants 210 and 211) performed better on the CRTT-L than on the CRTT-R-FS, and Participant 210 also performed better on the CRTT-L than on the CRTT-R-WC.

The RSDT was used to further test the significance of the individual differences identified with the 2 SD criterion. The RSDT compares the difference between an individual's performance on two tasks, X and Y, against the difference between tasks X and Y in the group. This test was developed specifically to detect neuropsychological dissociations, especially for tasks on which scores and standard deviations differ. Furthermore, the test is robust to departures from normality and tests statistical differences against the t distribution rather than the normal distribution, treating the relatively small control group as providing sample statistics rather than population parameters (Crawford & Garthwaite, 2005; Crawford & Howell, 1998). According to the RSDT results (summarized in Table 9) for the initial test, one participant performed significantly (p < .05) better on the CRTT-R-FS than on the CRTT-L (Participant 219, D = –2.14), and two participants performed significantly better on the CRTT-R-WF than on the CRTT-L (Participant 219, D = –2.05; Participant 207, D = –2.04). For the retest, one PWA (Participant 207, D = –1.85) performed significantly better on the CRTT-R-WC than on the CRTT-L, and this same participant performed significantly better on the CRTT-R-WF than on the CRTT-L (D = –1.83).

Discussion

Listening and Reading Comparisons

The first and second questions addressed the differences between the CRTT-L and each of its three reading versions and the similarities among them. As expected, the HC scored significantly higher on all CRTT versions than did the PWA. The listening version of the test (CRTT-L) produced significantly higher scores than each of the reading versions for both participant groups and on both test administrations.3 However, despite these significant between-test differences, in most instances they were very close to 1 SEM for each comparison, for both participant groups, and for both the test and the retest. Considering the relatively small sample size of this study, it therefore seems most parsimonious to conclude that these observed effects represent clinically nonmeaningful differences between the listening and reading versions of the CRTT. This conclusion is supported by the correlation analysis, wherein each CRTT-R test correlated reliably and highly with the CRTT-L for the PWA. Additionally, the single-factor solution from the PCA, yielding extremely high and similar loadings on all four test versions for the PWA for both test administrations, is consistent with the listening and reading versions depending on a single underlying impaired factor. The low and nonsignificant correlation coefficients generated for the HC, across test versions and administrations, do not support a unified interpretation of the listening and reading versions for users without language impairment. Although no participant scored at ceiling, the fact that the HC group scored highly and within an extremely narrow range on all CRTT versions makes their correlations difficult to interpret.
These findings, along with the low and nonsignificant correlations with the RCBA-2 and PICA criterion measures, suggest that none of the four CRTT test versions provides a suitable measure of language processing through listening and reading in older healthy individuals. This conclusion does not mean that their performance cannot serve as a valid comparison against which to judge impairment. It does, however, suggest that none of these CRTT versions provides an adequate measure of performance for typically functioning individuals for most of the purposes to which the versions might be put. The fact that the PCA derived a two-factor solution for this group suggests that the HC listeners/readers applied a different strategy or set of cognitive mechanisms to the tasks than did the PWA.

3 Additional clarification of differences among test versions is provided in Supplemental Table 11, where means, standard errors of the mean, and lower and upper boundaries of the 95% confidence intervals for each CRTT version, for both participant groups, and for test and retest are presented.

For the HC, the two reading conditions



Table 9. Summary of the individual participants with substantive differences between the CRTT-L and the CRTT-R versions based on the 2 SD criterion and the Revised Standardized Difference Test (RSDT) for PWA.

Criterion   Comparison            Test                                 Retest
2 SD        CRTT-L > CRTT-R-FS    221 (∆ = 1.80)                       210 (∆ = 1.55), 211 (∆ = 1.88)
            CRTT-L > CRTT-R-WC    204 (∆ = 1.65)                       210 (∆ = 1.91)
            CRTT-L > CRTT-R-WF
RSDT        CRTT-L < CRTT-R-FS    219 (∆ = –2.14*)
            CRTT-L < CRTT-R-WC                                         207 (∆ = –1.85*)
            CRTT-L < CRTT-R-WF    219 (∆ = –2.05*), 207 (∆ = –2.04*)   207 (∆ = –1.83*)

Note. Data are listed by participant number, and all were PWA. The ∆ indicates the difference between the CRTT-L and the CRTT-R condition (i.e., ∆ = CRTT-L – CRTT-R).
*p ≤ .05 based on the RSDT.

in which the sentences are built one word at a time loaded on the first factor. Considering that the linguistic stimuli and the response requirements were the same across all four CRTT versions, we speculated that the act of building the sentences by clicking on the screen, one word at a time, and reading the sentence in a fragmented word-by-word manner created a task demand that was not present when the sentence was presented acoustically or in the visual full-sentence display. Indeed, the acoustic presentation (CRTT-L) and the full-sentence (CRTT-R-FS) display are both well practiced and are the forms in which language is most naturally or automatically processed; the motor requirements for the sequential display of each individual word in the other two reading tests made those unpracticed and "unnatural" word-by-word formats a distinct task for the HC readers. The fact that the listening and the full-sentence reading versions loaded on the same factor seems to argue against a short-term or working memory demand difference between the two test formats; that is, there are more storage and rehearsal demands on the listening version of the test than on the full-sentence reading version because the stimuli are fleeting in the CRTT-L but remain on the screen in the CRTT-R-FS until the participant supplies a response. Yet these two versions shared variance with each other, whereas the two self-paced, word-by-word reading tests shared more of their variance with each other. This interpretation for the HC group raises the question as to why a single-factor solution was found for the PWA. Task or biographical differences that are present in normal language comprehension, such as age, gender, handedness, and the like, are not always evident in PWA (Brennan, Georgeadis, Baron, & Barker, 2004; Jones, Pierce, Mahoney, & Smeach, 2007).
The findings may be attributed to changes in processing strategies, system adaptations, or fundamental changes in the manner in which information is retrieved and assembled for representation and integration. For example, we have found recently (unpublished data) that the self-paced reading times for typical language users were additive


across the two clauses of the compound sentences in the CRTT-R-WF condition; that is, the reading time for the second noun, in sentence-final position, was significantly longer than for the first noun in the HC. However, the reading times for the two nouns were not significantly different for the PWA, suggesting that the two clauses are processed independently, as separate sentences, by these impaired individuals. Caplan and colleagues (2007) reported a similar end-of-sentence finding. Their study used both auditory self-paced listening and auditory computer-paced whole-sentence presentation. In the self-paced condition, PWA showed longer listening times than in the whole-sentence condition, whereas these times did not differ for HC. Conversely, PWA were faster in the time it took to read the last word and respond in the self-paced condition than in the whole-sentence condition, whereas, again, these times did not differ for HC. Taken together, these results suggest that, at least under self-paced conditions, PWA process the relevant information while reading the sentence, whereas HC accomplish some of the processing at the end of the sentence. Whether this effect represents a strategy or an effect of processing limitations awaits further experimentation.

The second question asked which reading version functioned most like the listening version of the test. This question was evaluated first with the correlation coefficients between the CRTT-L and each of the CRTT-R versions. Although the full-sentence (CRTT-R-FS) reading version correlated more highly with the listening version than did the other two reading versions, the correlations were all moderately high and the differences between them were nonsubstantive.
Further, the fact that all CRTT-R versions correlated significantly and moderately highly with each other lends support to the conclusion that any one of the three reading tests can serve as a reading measure that yields results quite comparable to the listening version for PWA. The low and nonsignificant correlations among these measures for the HC leave their comparability more ambiguous.


Concurrent (Criterion) Validity

As summarized in Table 5, only one of the correlations met either the significance (p ≤ .05) or magnitude (r = .70 or higher) criterion for the HC. Conversely, all correlations for all four CRTT versions, for both the test and the retest, met the criteria for the overall severity measure of aphasia (PICA) and for the reading measure (RCBA-2) for the PWA. As expected, the reading versions correlated marginally higher with the RCBA-2 than with the PICA, which measures language and communicative functions across all modalities, including pantomime, writing, and talking. However, as discussed above, differences among the four CRTT versions were marginal and are interpreted to be inconsequential. Therefore, no particular CRTT version correlated substantively more highly with the criterion measures, and all four versions demonstrated an adequate degree of concurrent validity for the PWA.

Construct Validity

The construct validity of the RTT has been presented previously (McNeil & Prescott, 1978). Because the CRTT incorporated all aspects of the RTT content and administration procedures, and most of the scoring conventions and rules, into its computerized versions, the theoretical constructs and justification remain essentially unchanged from the original. One additional way to assess the construct validity of the test versions is to determine whether the listening version and each of the reading versions load on a single factor or whether they assess fundamentally different constructs, as inferred from two or more factor loadings. The results of the PCA for the PWA unambiguously loaded all four variables on a single factor, and did so with near perfect fidelity on the replication. These findings suggest a common factor underlying PWA listening and reading comprehension performance on the particular tasks, stimuli, and response requirements used in this study. These results are consistent with a unidimensional construct for aphasia that dates back to Marie (1906), was experimentally verified by the factor analytic study of Schuell et al. (1964), was advocated by Darley (1982) and McNeil (1982), and was more recently supported by the findings of Caplan and colleagues (2007). The results of the current study, as well as others (Duffy & Ulrich, 1976; Peach et al., 1988; but see DeDe, 2012, for contrasting results), call for caution by clinicians and researchers who interpret performance differences across auditory and visual modalities on disparate tasks or on standardized aphasia batteries as evidence that one modality is "stronger," "less impaired," or "more intact" than another within an individual.
Without assessing these language functions with the same stimuli and administration procedures, the same scoring procedures, and the same overall cognitive load and task demands, it is difficult or impossible to ascribe differences either to modality effects or to impairments in the psycholinguistic operations that uniquely subtend these linguistic processing operations. Although reading is a later-learned, perhaps less automatic language skill that involves cognitive structures and operations not used in listening, and is generally believed to be more difficult than listening for most persons (including PWA), this belief has yet to be demonstrated convincingly. It might be that the presence of aphasia, because of a core, supramodal impairment shared by both listening and reading operations, generally equates impairments across modalities and spares none. Whether these functions might be differentially strong or available in HC is not answered by the results of this study.

Test–Retest Reliability

Overall Test Scores

The time between test and retest was determined by participant availability. Despite the within-group variability in the time required to complete the protocol and in the interval before its second administration, there was very high reliability for the groups on many CRTT-L and CRTT-R measures at the level of the overall test scores. It is evident from the overall scores for each test version that there was very little difference in mean scores between the test and the retest. The standard error of measurement for the PWA was relatively small for all four test versions. Additionally, the standard error of measurement for each version for both test administrations was within the 95% confidence interval around the mean, save for one contrast, suggesting that the differences that exist do not reach the .05 level of confidence. It is noteworthy that in all cases where the overall scores changed, performance was higher on the second administration. Although it is not likely that this improvement is attributable to a change in language processing per se, it is likely that a portion of it is attributable to an increase in overall familiarity or task practice. Factors that might account for this effect include familiarity with the computer display, the response selection procedures (use of the touchscreen), or general test environment factors. Perhaps greater practice with the various versions would have reduced these score improvements. Nonetheless, the levels of measurement error were relatively small and indicated a stable measurement tool for both groups. The significant and nonsignificant overall mean differences and interactions found between groups and among conditions for the initial test also were found on the retest. The HC scored significantly higher than the PWA on both test administrations for all conditions.
The CRTT-L yielded significantly higher scores than the three reading versions for both groups on both test administrations, and the CRTT-R-WF yielded significantly higher scores than the CRTT-R-WC on both the test and the retest. The single significant change from test to retest occurred for the CRTT-R-FS, which was significantly lower than the CRTT-R-WF on the retest. The correlations among the four test versions for the PWA were similar for the test and the retest. The correlations among the four test versions for the HC were low and not significantly different from 0 for the test and the retest, save for the correlation between the CRTT-R-WF



and the CRTT-R-WC on the initial test. This finding is undoubtedly due to the lack of a distribution of scores for the HC group. The test–retest consistency of the correlations between the CRTT-L and the three CRTT-R versions and the two aphasia tests against which they were compared (in order to establish concurrent or criterion-related validity) was remarkably similar for the PWA. However, as with the other measures for the HC, the correlation coefficients were low and nonsignificant among the CRTT versions and the PICA and RCBA-2 for both test and retest. The replication of the PCA revealed the same hierarchy of factor loadings for all four test versions, with nearly identical variance accounted for between the test and retest for both participant groups on each measure. With Varimax-rotated components, the same two-factor solution emerged for the HC with similar loadings for each of the two factors for both test administrations. Likewise, the single-factor solution for the PWA produced nearly identical values across all test versions for the test and retest. When these Varimax-rotated components were normalized for the HC, the first factor was reduced modestly on the retest and the second factor was increased modestly. Neither of these changes was judged to be substantive.

Subtest-Level Scores

At the level of the subtest scores, the reliability mirrored that of the overall scores. The nonsignificant differences between the test and retest for each of the subtests for each of the four test conditions confirmed that the magnitude of the differences was small, which adds to the overall conclusion that the CRTT-L and each of the CRTT-R tests are reliable. The standard error of measurement for each subtest within each CRTT version was relatively small and consistent with those of the RTT (Park, McNeil, & Tompkins, 2000).
Individual Participants

Although six PWA demonstrated a difference between the CRTT-L and one or more of the CRTT-R versions that exceeded 2 SD from the group on one of the two test administrations, the direction of this effect was not stable on the retest. From these analyses, it is apparent that individual patient differences observed between administrations of the CRTT-L and any of the reading versions are not likely to be replicated; in other words, modality differences are ephemeral, at least on these test versions. Although this finding argues for the comparability of listening and reading performance on these tasks, it also raises the question as to the source of the performance differences when they are observed. Using the RSDT as the criterion, the one participant who demonstrated the same pattern of performance on both the test and the retest performed significantly more poorly on the CRTT-L than on the CRTT-R-WF. Because both of these versions present test stimuli that are only temporarily available for processing, it is unlikely that this short-term memory component of processing was the underlying factor that caused the difference. The fact that this individual's


performance was not different between the CRTT-L and the CRTT-R-FS conditions leads to the hypothesis that the difference was not due to a fundamental auditory processing impairment. Likewise, the fact that the computer-access demands (eliciting each word on the screen, one at a time) are greater for the reading versions than for the CRTT-L is inconsistent with an explanation based on response planning competition or impairment.

Conclusions

The results of this study suggest that the CRTT-L and the three versions of the CRTT-R are highly related and likely reflect similar linguistic processing difficulties in PWA. The high correlations among the CRTT-L and CRTT-R versions and the PICA and RCBA-2 for the PWA also indicate that these conditions produce similar levels of language-processing difficulty regardless of modality and language function. Furthermore, the validity of the four conditions was supported for PWA but largely unsupported for the HC. The study failed to find relationships among the four conditions for the HC because their data were limited by a narrow distribution of scores across participants. From all of these comparisons, it is apparent that for the HC group, the CRTT-L and the CRTT-R versions of the test have failed to establish a minimal level of support for their concurrent validity or their reliability (as indexed by correlations). The CRTT-L and CRTT-R versions therefore will have limited use as dependent measures for this population beyond their role in establishing cutoff scores against which pathological performance can be determined. Nonetheless, the CRTT-L and CRTT-R versions did distinguish between the groups, and they have demonstrated adequate concurrent validity and test–retest reliability to support their use with PWA. The findings of this study are most consistent with a supraordinate view of aphasic listening and reading impairment as measured by these tasks. It is important to note that no attempt was made to select pathological participants based on differential impairment profiles because no hypotheses or contrasts were motivated by those hypothesized groupings. The paucity of unique individual participant profiles suggests that such profiles did not occur in this sample regardless of any classification to which the participants might have belonged.
Additionally, all versions of the CRTT have a limited diversity of vocabulary and assess a narrow range of syntactic structures, sentence lengths, and other cognitive demands. Although this is an important strength of the test for managing task demands and parametrically manipulating this narrow range of psycholinguistic variables, it does not provide an adequate test of the aphasia unidimensionality hypothesis.

Acknowledgments

The previously published test on which this research is based is derived from McNeil and Prescott (1978), and those data are based on work supported in part by the VA Hospital, Denver, CO. Additional support for the work reported here is based in part on

Journal of Speech, Language, and Hearing Research • Vol. 58 • 311–324 • April 2015

funding supplied by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development, Rehabilitation Research and Development Service (award #C47074X to Malcolm R. McNeil and award #C3118R to Patrick J. Doyle and Sheila R. Pratt), and resources and facilities provided by the Geriatric Research Education and Clinical Center in the Veterans Affairs Pittsburgh Healthcare System, PA. The contents of this article do not represent the views of the Department of Veterans Affairs or the U.S. Government.

References

Bayles, K., & Tomoeda, C. (1993). Arizona Battery for Communication Disorders of Dementia. Tucson, AZ: Canyonlands.

Ben-Artzi, E., & Marks, L. E. (1995). Visual–auditory interaction in speeded classification: Role of stimulus difference. Perception & Psychophysics, 57(8), 1151–1162.

Bermant, R. I., & Welch, R. B. (1976). The effect of degree of visual–auditory stimulus separation and eye position upon the spatial interaction of vision and audition. Perceptual and Motor Skills, 43, 487–493.

Bertelson, P., & Aschersleben, G. (2003). Temporal ventriloquism: Cross-modal interaction on the time dimension: 1. Evidence from auditory–visual temporal order judgment. International Journal of Psychophysiology, 50(1–2), 147–155.

Brennan, D. M., Georgeadis, A. C., Baron, C. R., & Barker, L. M. (2004). The effect of videoconference-based telerehabilitation on story retelling performance by brain-injured subjects and its implication for remote speech-language therapy. Telemedicine Journal and e-Health, 10(2), 147–154.

Brookshire, R. H., & Nicholas, L. E. (1997). Discourse Comprehension Test. Minneapolis, MN: BRK Publishers.

Bryman, A., & Cramer, D. (2005). Quantitative data analysis with SPSS 12 and 13: A guide for social scientists. East Sussex, United Kingdom: Routledge.

Caplan, D., Waters, G., DeDe, G., Michaud, J., & Reddy, A. (2007). A study of syntactic processing in aphasia I: Behavioral (psycholinguistic) aspects. Brain and Language, 101(2), 103–150.

Catts, H. W., & Kamhi, A. G. (2005). Language and reading disabilities (2nd ed.). Boston, MA: Pearson.

Crawford, J. R., & Garthwaite, P. H. (2005). Testing for suspected impairments and dissociations in single-case studies in neuropsychology: Evaluation of alternatives using Monte Carlo simulations and revised tests for dissociations. Neuropsychology, 19(3), 318–331.

Crawford, J. R., & Howell, D. C. (1998). Comparing an individual's test score against norms derived from small samples. The Clinical Neuropsychologist, 12, 482–486.

Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.

Darley, F. L. (1982). Aphasia. Philadelphia, PA: W. B. Saunders.

DeDe, G. (2012). Effects of word frequency and modality on sentence comprehension impairments in people with aphasia. American Journal of Speech-Language Pathology, 21, S103–S114.

DeDe, G. (2013). Effects of verb bias and syntactic ambiguity on reading in people with aphasia. Aphasiology, 27(12), 1408–1425.

Duffy, J. R., & Keith, R. C. (1980). Contemporary studies: Performance of non-brain injured adults on the PICA: Descriptive data and a comparison to patients with aphasia. Aphasia-Apraxia-Agnosia, 2, 1–30.

Duffy, R. J., & Ulrich, S. R. (1976). A comparison of impairments in verbal comprehension, speech, reading, and writing in adult aphasics. Journal of Speech and Hearing Disorders, 41, 110–119.

Gallaher, A. J., & Canter, G. J. (1982). Reading and listening comprehension in Broca's aphasia: Lexical versus syntactical errors. Brain and Language, 17, 183–192.

Howard, M. A., Volkov, I. O., Mirsky, R., Garell, P. C., Noh, M. D., Granner, M., . . . Brugge, J. F. (2000). Auditory cortex on the human posterior superior temporal gyrus. The Journal of Comparative Neurology, 416, 79–92.

Jones, D. K., Pierce, R. S., Mahoney, M., & Smeach, K. (2007). Effect of familiar content on paragraph comprehension in aphasia. Aphasiology, 21(12), 1218–1229.

Kennedy, A., & Murray, W. S. (1984). Inspection times for words in syntactically ambiguous sentences under three presentation conditions. Journal of Experimental Psychology: Human Perception and Performance, 10(6), 833–849.

LaPointe, L. L., & Horner, J. (1998). Reading Comprehension Battery for Aphasia–Second Edition. Austin, TX: PRO-ED.

Marie, P. (1906). Révision de la question de l'aphasie: La troisième circonvolution frontale gauche ne joue aucun rôle spécial dans la fonction du langage [Revision of the question of aphasia: The third left frontal convolution doesn't play any special role in the function of language]. Semaine Médicale, 26, 241–247.

McNeil, M. R. (1982). The nature of aphasia in adults. In N. J. Lass, L. McReynolds, F. Northern, & D. Yoder (Eds.), Speech, language and hearing (Vol. II, pp. 692–740). Philadelphia, PA: W. B. Saunders.

McNeil, M. R., Dionigi, C. M., Langlois, A., & Prescott, T. E. (1989). A measure of Revised Token Test ordinality and intervality. Aphasiology, 3, 31–40.

McNeil, M. R., & Pratt, S. R. (2001). Defining aphasia: Some theoretical and clinical implications of operating from a formal definition. Aphasiology, 15, 901–911.

McNeil, M. R., & Prescott, T. E. (1978). The Revised Token Test. Austin, TX: PRO-ED.

McNeil, M. R., Sung, J. E., Pratt, S. R., Szuminsky, N., Kim, A., Ventura, M., . . . Musson, N. (2008, May). Concurrent validation of the Computerized Revised Token Test (CRTT) and three experimental reading CRTT-R versions in normal elderly individuals and persons with aphasia. Paper presented at the Clinical Aphasiology Conference, Teton Village, WY.

Mulligan, N. W. (2011). The effect of generation on long-term repetition priming in auditory and visual perceptual identification. Acta Psychologica, 137, 18–23.

Odell, K. H. (1983). Comparisons between auditory and reading comprehension in aphasic adults (Unpublished master's thesis). University of Wisconsin–Madison, Madison.

Odell, K. H., Miller, S. B., & Lee, C. (1996). A comparison of aphasic performance on the standard and experimental computerized version of the Revised Token Test. Clinical Aphasiology, 24, 145–157.

Park, G. H., McNeil, M. R., & Tompkins, C. A. (2000). Reliability of the Five-Item Revised Token Test for individuals with aphasia. Aphasiology, 14, 527–535.

Patching, G. R., & Quinlan, P. T. (2002). Garner and congruency effects in the speeded classification of bimodal signals. Journal of Experimental Psychology: Human Perception and Performance, 28, 755–775.

Peach, R. K., Canter, G. J., & Gallaher, A. J. (1988). Comprehension of sentence structure in anomic and conduction aphasia. Brain and Language, 35, 119–137.

Poeck, K., & Hartje, W. (1978). Performance of aphasic patients in visual versus auditory presentation of the Token Test: Demonstration of a supramodal deficit. In F. Boller & M. Dennis (Eds.), Auditory comprehension: Clinical and experimental studies with the Token Test (pp. 107–113). New York, NY: Academic Press.

Porch, B. E. (1981). The Porch Index of Communicative Ability. Palo Alto, CA: Consulting Psychologists Press.

Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157–171.

Pratt, S. R., Eberwein, C., McNeil, M. R., Ortmann, A., Roxberg, J., Fossett, T. R. D., & Doyle, P. J. (2006, June). The Computerized Revised Token Test: Assessing the impact of age and sound intensity. Paper presented at the International Aphasia Rehabilitation Conference, Sheffield, United Kingdom.

Sachs, J. S. (1974). Memory in reading and listening to discourse. Memory & Cognition, 2(1A), 95–100.

Sandhu, R., & Dyson, J. B. (2012). Re-evaluating visual and auditory dominance through modality switching costs and congruency analyses. Acta Psychologica, 140, 111–118.

Schuell, H., Jenkins, J. J., & Jimenez-Pabon, E. (1964). Aphasia in adults: Diagnosis, prognosis, and treatment. New York, NY: Hoeber Medical Division, Harper & Row.

Sinnett, S., Spence, C., & Soto-Faraco, S. (2007). Visual dominance and attention: The Colavita effect revisited. Perception & Psychophysics, 69(5), 673–686.

Sung, J. E., McNeil, M. R., Pratt, S. R., Dickey, M. W., Fassbinder, W., Szuminsky, N. J., & Doyle, P. J. (2011). Real-time processing in reading sentence comprehension for normal adult individuals and persons with aphasia. Aphasiology, 25, 57–70.

Van Rullen, R., & Thorpe, S. J. (2001). Rate coding versus temporal order coding: What the retinal ganglion cells tell the visual cortex. Neural Computation, 13, 1255–1283.

Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.

Wilson, R. H., & Antablin, J. K. (1980). A picture identification task as an estimate of word recognition performance in nonverbal adults. Journal of Speech and Hearing Disorders, 45, 223–238.

Journal of Speech, Language, and Hearing Research • Vol. 58 • 311–324 • April 2015

