This article was downloaded by: [Dicle University] On: 06 November 2014, At: 01:49 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

The Clinical Neuropsychologist Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ntcn20

Finger Tapping Test Performance as a Measure of Performance Validity

Bradley N. Axelrod (a), John E. Meyers (b) & Jeremy J. Davis (c)

(a) Psychology Section, John D. Dingell DVAMC, Detroit, MI, USA
(b) Meyers Neuropsychological Services, Mililani, HI, USA
(c) University of Utah School of Medicine, Salt Lake City, UT, USA

Published online: 16 Apr 2014.

To cite this article: Bradley N. Axelrod, John E. Meyers & Jeremy J. Davis (2014) Finger Tapping Test Performance as a Measure of Performance Validity, The Clinical Neuropsychologist, 28:5, 876-888, DOI: 10.1080/13854046.2014.907583

To link to this article: http://dx.doi.org/10.1080/13854046.2014.907583


The Clinical Neuropsychologist, 2014 Vol. 28, No. 5, 876–888, http://dx.doi.org/10.1080/13854046.2014.907583

Finger Tapping Test Performance as a Measure of Performance Validity

Bradley N. Axelrod (1), John E. Meyers (2), and Jeremy J. Davis (3)

(1) Psychology Section, John D. Dingell DVAMC, Detroit, MI, USA
(2) Meyers Neuropsychological Services, Mililani, HI, USA
(3) University of Utah School of Medicine, Salt Lake City, UT, USA

The Finger Tapping Test (FTT) has been presented as an embedded measure of performance validity in most standard neuropsychological evaluations. The present study evaluated the utility of three different scoring systems intended to detect invalid performance based on FTT. The scoring systems were evaluated in neuropsychology cases from clinical and independent practices, in which credible performance was determined based on passing all performance validity measures or failing two or more validity indices. Each FTT scoring method presented with specificity rates at approximately 90% and sensitivity of slightly more than 40%. When suboptimal performance was based on the failure of any of the three scoring methods, specificity was unchanged and sensitivity improved to 50%. The results are discussed in terms of the utility of combining multiple scoring measures for the same test as well as benefits of embedded measures administered over the duration of the evaluation.

Keywords: Performance validity; Motor testing; Effort testing.

INTRODUCTION

The standard of practice within clinical neuropsychology has recently come to include regular measurement of effort and motivation (Bush et al., 2005; Heilbronner et al., 2009). Tasks that evaluate the validity of performance on measures of neuropsychological abilities are referred to as performance validity measures (PVMs; Larrabee, 2012). PVMs differ from the scales used to establish the validity of self-reported symptoms on measures such as the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) or the Postconcussive Symptom Questionnaire (PCSQ; Lees-Haley, 1992; Van Dyke, Millis, Axelrod, & Hanks, 2013), which are referred to as Symptom Validity Tests (SVTs; Larrabee, 2012). Using factor analysis, Van Dyke and colleagues (2013) found that PVMs loaded on a separate factor from SVTs and from actual measures of neuropsychological ability. As for the importance of assessing performance validity, Meyers, Volbrecht, Axelrod, and Reinsch-Boothby (2011) found that half of the variance in cognitive performance in patient samples was attributable to failure on PVMs. Davis, McHugh, Axelrod, and Hanks (2012) likewise found that those who failed more PVMs presented as deficient on more neuropsychological tests.

Address correspondence to: Bradley N. Axelrod, Psychology Section, John D. Dingell DVAMC, 4646 John R, Detroit, Michigan 48201-1916, USA. E-mail: [email protected] (Received 18 December 2013; accepted 18 March 2014)

© 2014 Taylor & Francis



Clinicians evaluate a number of features of examinee behavior to assess performance validity (cf. Boone, 2007; Larrabee, 2007), including qualitative responses rarely seen in clinical practice and inconsistencies observed across measures within and between evaluations. Beyond such observational, qualitative approaches, most clinical research has focused on quantitative performance. Freestanding PVMs offer clear criteria for invalid performance based on how clinical patients perform in comparison to those asked to fabricate brain injury or individuals otherwise found to demonstrate compromised effort. Common freestanding measures include a variety of forced-choice recognition memory tasks involving drawings, words, photos, and numerals (e.g., Test of Memory Malingering, Word Memory Test, Warrington Recognition Memory Test, Victoria Symptom Validity Test). These tasks are often presented as analogs of memory measures, capitalizing on the simplicity of a task that can appear difficult.

Aside from freestanding performance validity measures, neuropsychologists have demonstrated the utility of scores embedded within traditional neuropsychological tests (refer to Schutte & Axelrod, 2013, for a detailed discussion). For some tests, additional components, or patterns of scores, embedded within the scoring system of a single test have been found to differentiate individuals with invalid performance from those with true pathology: for example, Forced Choice from the California Verbal Learning Test-II (Delis, Kramer, Kaplan, & Ober, 2000), performance patterns from the CVLT-II (Root, Robbins, Chang, & Van Gorp, 2006), the Rey Complex Figure Test (Lu, Boone, Cozolino, & Mitchell, 2003; Reedy et al., 2013), and the Rey Auditory Verbal Learning Test (Boone, Lu, & Wen, 2005; Davis, Millis, & Axelrod, 2012).
Even combinations of scores across tests have been introduced as additional PVMs (e.g., Davis, McHugh, et al., 2012; Schutte, Millis, Axelrod, & Van Dyke, 2011; Sherman, Boone, Lu, & Razani, 2002), as has the total number of impaired scores relative to normative data (Davis, Axelrod, McHugh, Hanks, & Millis, 2013). In addition, by examining scores of standard neuropsychological measures from individuals with documented brain injury, an estimate can be made of the worst acceptable performance even for an individual with documented pathology (Backhaus, Fichtenberg, & Hanks, 2004). From those scores, comparisons can determine whether an examinee is performing significantly worse than those with true injury. Another method used to evaluate embedded measures of performance validity is to compare examinees demonstrated to have valid performance with those otherwise shown to have deficient performance validity. When such methods are used with neuropsychological tests, even cognitively impaired individuals will obtain acceptable scores from a validity standpoint.

Whereas most freestanding measures of performance validity appear as memory tasks, the use of derived cutoff scores opens up the opportunity to detect invalid performance with a variety of measures, such as Digit Span (Axelrod, Fichtenberg, Millis, & Wertheimer, 2006; Meyers & Volbrecht, 2003), Picture Completion (Davis, McHugh, Bagley, Axelrod, & Hanks, 2011; Solomon et al., 2010), the Speech Sounds Perception Test (Ross, Putnam, Millis, Adams, & Krukowski, 2006), Trail Making Test parts A and B (Iverson, Lange, Green, & Franzen, 2002), Judgment of Line Orientation (Whiteside, Wald, & Busse, 2011), the Booklet Category Test (Greve, Bianchini, & Roberson, 2007), the Continuous Performance Test II (Ord, Boettcher, Greve, & Bianchini, 2010), and the Comalli Stroop Test (Arentsen et al., 2013).

The assessment of fine motor speed is included in most standard neuropsychological evaluations.
In particular, the Finger Tapping Test (FTT) was incorporated into the
Halstead-Reitan Neuropsychological Battery (Reitan & Wolfson, 1985, 1993) and remains a mainstay of evaluations, even for clinicians who do not administer the full battery (Camara, Nathan, & Puente, 2000; Rabin, Barr, & Burton, 2005). The obtained score is the average number of taps over multiple 10-second trials. Normative data exist based on sex, age, and years of education (Heaton, Miller, Taylor, & Grant, 2004). Some studies demonstrated worse performance among simulators (Mittenberg, Rotholc, Russell, & Heilbronner, 1996; Rapport, Farchione, Coleman, & Axelrod, 1998) relative to patients with true injuries.

Arnold et al. (2005) noted gender differences on motor measures and explicitly examined FTT performance as an embedded PVM in a sample grouped by sex. In comparing individuals categorized by separate PVMs, the use of dominant hand cutoff scores was most effective. Specifically, dominant hand cutoffs of fewer than 36 mean taps per trial for men and fewer than 29 taps per trial for women were derived. These cutoffs demonstrated 90% specificity for both sexes, with sensitivity of 50% for men and 61% for women. It should be noted that Arnold and colleagues generated FTT scores based on the average of three trials, a method different from the generally used procedure in which the average of five trials within five taps is taken (Reitan & Wolfson, 1985, 1993).

Larrabee (2003) examined FTT performance in groups classified as meeting criteria for definite malingered neuropsychological dysfunction and compared them with individuals with a history of moderate to severe brain injury. Totaling mean performance across both hands, he incorporated a cutoff score of 62 taps or fewer into an algorithm that also included additional embedded measures.
Although this cutoff score was never independently cross-validated in isolation following the initial validation sample, similar scores were obtained in other studies evaluating FTT (e.g., Heaton, Smith, Lehman, & Vogt, 1978; Mittenberg et al., 1996).

Meyers and Volbrecht (2003) introduced a unique method of embedding a PVM within evaluations that include FTT. Specifically, they created and tested a method of predicting finger tapping performance from the Rey Complex Figure Test copy raw score, Digit Symbol Coding scaled score, and Block Design scaled score. The formula, developed through linear regression, was: (RCFT copy raw score × .185) + (Digit Symbol Coding scaled score × .491) + (Block Design scaled score × .361) + 31.34. The difference between estimated and actual FTT performance was examined, and a difference score cutoff (–10) was identified: actual FTT was at most 10 taps worse than estimated FTT for those with acceptable validity. As with Larrabee (2003), this embedded method of predicting FTT was not cross-validated beyond the initial validation sample.

The primary goal of this study was to cross-validate and compare the three embedded FTT PVMs presented above. Specifically, group differentiation of these FTT measures was examined in a clinical sample clearly dichotomized by performance validity.
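The Meyers and Volbrecht (2003) estimation formula lends itself to direct computation. The sketch below applies the regression weights quoted above and flags a difference score below the –10 cutoff; the function and parameter names are illustrative, not the authors' own code.

```python
# Sketch of the Meyers & Volbrecht (2003) estimated finger tapping formula.
# The regression weights and the -10 difference cutoff come from the text
# above; function and parameter names are illustrative.

def estimated_ftt(rcft_copy_raw: float, coding_ss: float, block_design_ss: float) -> float:
    """Predicted FTT score from RCFT copy, Digit Symbol Coding, and Block Design."""
    return (rcft_copy_raw * 0.185) + (coding_ss * 0.491) + (block_design_ss * 0.361) + 31.34

def ftt_dif_fails(actual_ftt: float, estimated: float, cutoff: float = -10.0) -> bool:
    """True when actual tapping falls more than 10 taps below the estimate."""
    return (actual_ftt - estimated) < cutoff
```

For instance, a hypothetical examinee with an RCFT copy score of 32 and scaled scores of 10 on both Coding and Block Design would have an estimated FTT of about 45.8 taps per trial; an actual score of 34 (a difference of roughly –11.8) would be flagged.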

METHOD

Participants

Two samples were used to assess FTT performance, with the first being a Midwestern Department of Veterans Affairs Medical Center (VA) (permissions were
obtained to use the VA data) and the second an independent neuropsychology practice (IME). All data were de-identified prior to use.

Participants were included from the VA dataset if complete data were available on FTT as well as the additional measures necessary for calculation of the estimated tapping score. Of the 279 cases, 15 were excluded due to clinical evidence of possible dementia (i.e., Mini Mental State Examination score < 25). The sample averaged 45 (SD = 14; range = 21–65) years of age and 13 (SD = 2) years of education, and was 12% female and 88% right-handed. Regarding ethnicity, the sample was 71% Caucasian, 25% African American, 2% Hispanic/Latino, and 2% other ethnicities.

The sample from the neuropsychology practice comprised 80 individuals referred for independent neuropsychological evaluations as part of civil litigation or disability claims. Referrals were made by case managers, worker's compensation insurers, automobile claims agents, and attorneys. As with the VA cases, participants were included in the civil forensic sample if complete data were available on FTT as well as the additional measures necessary for calculation of the estimated tapping score. The average age of the 80 IME participants was 42 (SD = 11; range = 21–63), and they averaged 13 (SD = 2) years of education. The sample was 48% female and 88% right-handed. Regarding ethnicity, the sample was 79% Caucasian and 21% African American. The presenting concern was traumatic brain injury in all cases, with injury severity classified as mild in 81% of cases. Cause of injury was reported as motor vehicle accident (76%), fall (18%), or other (6%; e.g., assault).

Measures

The primary measure of interest in this study was FTT (Reitan & Wolfson, 1985), in addition to the performance validity measures listed in Table 1.
FTT was administered according to standardized instructions, in which the raw score is the average of five consistent trials (i.e., within five taps of one another), up to a maximum of 10 trials per hand, with 10-second rest breaks between trials and a 30-second rest break after every third trial. (Readers should be aware that Arnold et al., 2005, computed FTT based on the average of three trials, not five; the potential implications of this different technique were not evaluated as part of the current study.) Three FTT variables with reported utility as performance validity measures were examined: the raw score for the dominant hand (FTT-D; Arnold et al., 2005); the raw scores combined for both hands (FTT-C; Larrabee, 2003); and the estimated finger tapping difference score (FTT-Dif; Meyers & Volbrecht, 2003).

Procedure

Participants completed neuropsychological measures as part of a comprehensive outpatient evaluation and were grouped using freestanding and embedded performance validity indicators. While the test batteries differed between the samples, similar performance validity measures were examined to the extent possible. For example, the forensic sample completed the Auditory Verbal Learning Test (AVLT; Lezak, 1983) and the VA sample completed the California Verbal Learning Test-Second Edition (CVLT-2; Delis et al., 2000). Embedded performance validity measures based on
the recognition trials of these tests were used. The cut scores used and failure rates are shown in Table 1.

Following the approach used in previous research (Larrabee, 2003; Wolfe et al., 2010), participants who failed two or more performance validity measures were designated the Fail group in the VA (n = 25) and IME (n = 20) datasets. Participants who passed all performance validity measures were designated the Pass group in the VA (n = 179) and IME (n = 33) datasets. Participants who failed only one performance validity measure were excluded from further analyses (VA: n = 60; IME: n = 27) so that the three FTT indices could be compared in groups whose performance validity was clearly defined.
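The grouping rule just described is simple to state programmatically. A minimal sketch follows; the function name is illustrative.

```python
# Sketch of the validity grouping rule used in both samples: Pass = no PVM
# failures, Fail = two or more failures, exactly one failure = excluded.
from typing import Optional

def validity_group(pvm_failures: int) -> Optional[str]:
    """Return 'Pass', 'Fail', or None (excluded) for a count of failed PVMs."""
    if pvm_failures == 0:
        return "Pass"
    if pvm_failures >= 2:
        return "Fail"
    return None  # exactly one failure: excluded from further analyses
```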

RESULTS

Group differences

Pass and Fail groups in both samples did not differ in gender, age, education, or handedness (Table 2). In the VA sample, Pass and Fail groups did not differ in ethnic composition. In the forensic sample, the Pass group was 88% Caucasian and 12% African American, whereas the Fail group was 60% Caucasian and 40% African American, a significant difference (Table 2). Additional analyses were conducted for the forensic sample, and no FTT differences were observed across ethnic composition: Caucasian and African American participants did not differ significantly on FTT-D, t(78) = 1.5, p = .15, FTT-C, t(78) = 1.2, p = .26, or FTT-Dif, t(78) = 0.8, p = .44. Furthermore, nonparametric (Spearman) correlations were small and non-significant for race and FTT-D (.09, p = .40), FTT-C (.08, p = .48), and FTT-Dif (.02, p = .85). Therefore no further comparisons based on ethnic composition were made.

Pass and Fail groups performed significantly differently on all FTT variables in both samples (Table 3). Specifically, both Fail groups had fewer taps than Pass groups for the dominant hand raw score and for the total of both hands in the Combined score. The Fail groups also had significantly fewer taps per trial than estimated (FTT-Dif) when compared with the Pass groups.

Classification using unitary FTT methods

Turning to classification accuracy, published cutoffs for each of the FTT variables were examined in both samples. In the VA sample, FTT-D (< 36 taps per trial for men and < 29 taps per trial for women) correctly identified 36% of the Fail group and 94% of the Pass group. FTT-C (< 63 taps per trial) identified 32% of the Fail group and 94% of the Pass group. FTT-Dif (FTT minus FTT-estimate < –11 taps per trial) identified 24% of the Fail group and 94% of the Pass group. In the forensic sample, FTT-D correctly identified 50% of the Fail group and 88% of the Pass group. FTT-C identified 55% of the Fail group and 94% of the Pass group.
FTT-Dif identified 55% of the Fail group and 88% of the Pass group.

At the request of a reviewer of this manuscript, FTT performance was compared to two embedded measures (TMT-A and RDS) in an attempt to better understand whether FTT offers unique versus redundant information. For all three FTT variables, the VA sample offered similar results. Specifically, detection of intact versus invalid performance for both TMT-A and RDS was consistent with FTT results for approximately 85% of the protocols. Similarly, the IME sample generated comparable findings between FTT and the two other embedded measures in 76% of the cases. So, although significant overlap exists, the relationship is not unitary, and FTT does indeed seem to tap invalidity differently from TMT-A and RDS.

Table 1. Performance validity indicator cut scores and percentage failure rates by sample

                                                      VA              IME
Measure and cutoff                                  %    n adm      %    n adm
AVLT-R < 10 (Meyers & Volbrecht, 2003)              -      -       30      79
CVLT-2-FC < 15 (Delis et al., 2000)                 8    259        -       -
MSVT (per manual; Green, 2004)                     17    215        -       -
RDS < 7 (Babikian, Boone, Lu, & Arnold, 2006)       7    255       16      76
RMT-Faces < 26 (Millis, 2002)                       -      -        8      80
RMT-Words < 38 (Iverson & Franzen, 1998)            -      -       34      79
TMT-A raw score > 62 (Iverson et al., 2002)         9    263       16      79
TOMM (per manual; Tombaugh, 1996)                   6    202        -       -
WCST-FMS > 3 (Larrabee, 2003)                       5    198        5      75

VA = VA data (N = 264); IME = independent examination data (N = 80); n adm = number of participants administered the measure; AVLT-R = Auditory Verbal Learning Test Recognition raw score; CVLT-2-FC = California Verbal Learning Test-2 Forced Choice Recognition raw score; MSVT = Medical Symptom Validity Test; RDS = Reliable Digit Span; RMT = Recognition Memory Test; TMT = Trail Making Test; TOMM = Test of Memory Malingering; WCST-FMS = Wisconsin Card Sorting Test Failure to Maintain Set.

Classification using FTT methods together

The data above demonstrate that, using the traditional cutoff scores provided by existing research, Fail cases were accurately detected (hits) on average 41% of the time (averaging across all methods and both samples), and Pass cases were correctly identified as providing valid performance 93% of the time (true negatives). We next investigated the possibility of improving detection of deficient validity on FTT. Rather than examining each method separately, we evaluated each case based on failure of any of the three FTT methods. Take, for example, a case presenting with FTT-D of 30, FTT-C of 66, and FTT-Dif of –6. FTT-D is invalid, while FTT-C and FTT-Dif are both acceptable. If one were to evaluate only FTT-C or FTT-Dif, the case would be deemed a pass; but because we were determining failure on any of the three methods, this case would be deemed a failure for having generated an invalid FTT-D score.

We used this approach to evaluate failure on any of the three FTT measures as an indication of FTT failure. For the VA data, 91% of those who passed were correctly identified as passing all FTT scores, while 40% of the Fail group were correctly identified in that they failed one or more FTT scores. The IME results similarly found accurate detection of 85% of the Pass cases and 60% of the Fail cases.
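The any-failure decision rule can be sketched directly from the published cutoffs applied in this study (FTT-D below 36 taps per trial for men or 29 for women, FTT-C below 63, FTT-Dif below –11). This is a sketch with illustrative names, not the authors' code.

```python
# Sketch of the "failure on any of the three FTT methods" rule, using the
# published cutoffs applied in this study. Function names are illustrative.

def fails_ftt_d(dominant_mean: float, male: bool) -> bool:
    # Arnold et al. (2005): < 36 taps/trial for men, < 29 for women
    return dominant_mean < (36 if male else 29)

def fails_ftt_c(combined_mean: float) -> bool:
    # Larrabee (2003): both hands combined < 63 taps/trial
    return combined_mean < 63

def fails_ftt_dif(difference: float) -> bool:
    # Meyers & Volbrecht (2003): actual minus estimated FTT < -11 taps/trial
    return difference < -11

def fails_any_ftt(dominant_mean: float, combined_mean: float,
                  difference: float, male: bool) -> bool:
    """Flag the protocol if any single FTT validity method is failed."""
    return (fails_ftt_d(dominant_mean, male)
            or fails_ftt_c(combined_mean)
            or fails_ftt_dif(difference))
```

The worked example above (FTT-D = 30, FTT-C = 66, FTT-Dif = –6, male examinee) fails only the dominant hand cutoff, so the combined rule flags the protocol.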

Table 2. Demographic characteristics

                      VA                                     IME
                      Pass (n = 179)  Fail (n = 25)    p    Pass (n = 33)  Fail (n = 20)    p
Age, M (SD)           44.3 (14.11)    48.1 (15.0)     .21   40.1 (11.3)    46.2 (11.6)     .06
Education, M (SD)     13.1 (1.9)      13.2 (2.0)      .80   13.3 (2.6)     12.4 (2.4)      .20
% Male                87              96              .20   61             45              .27
% Caucasian           73              60              .19   88             60              .02
% Right handed        89              84              .48   85             95              .26

VA = Veterans Affairs sample (N = 264); IME = civil forensic sample (N = 80); Pass = passed all performance validity measures; Fail = failed two or more performance validity measures.

For the entire dataset, with both samples merged, overall specificity (accurate detection of passing) is 90%, while sensitivity in detecting suboptimal performance in the Fail group is 49%. Overall classification across all 257 cases is 83%. Positive predictive power and negative predictive power were computed for base rates of invalid performance in theoretical samples of 10%, 30%, and 50%, using failure on any of the three FTT methods as the cutoff. Positive predictive power was .35, .68, and .83, respectively; negative predictive power was .94, .80, and .64, respectively.

Sample-specific cutoff scores

Establishing new cutoff scores in samples different from the initial validation samples is important for determining whether such scores generalize to cases from different sites. In the current study, we performed receiver operating characteristic (ROC) analyses to find overall classification rates as well as the cutoff scores that best distinguished Pass from Fail on FTT. In the VA sample, FTT-D demonstrated an area under the curve (AUC) of .67 (95% confidence interval: .54–.80) and FTT-C demonstrated an AUC of .66 (.54–.79), no

Table 3. Finger Tapping Test performance

            VA                                             IME
            Pass (n = 179)   Fail (n = 25)     t      d    Pass (n = 33)   Fail (n = 20)     t      d
D raw       48.6 (8.8)       40.1 (13.9)    4.2***  0.73   44.8 (8.2)      30.9 (10.3)    5.4***  1.49
ND raw      43.8 (8.7)       39.3 (10.2)    2.4*    0.47   41.3 (8.3)      31.1 (11.6)    3.7***  1.01
Combined    92.4 (16.6)      79.3 (22.3)    3.5***  0.67   86.2 (15.3)     62.0 (20.6)    4.9***  1.33
Estimated   44.6 (2.4)       41.6 (3.9)     5.2***  0.93   44.9 (2.1)      40.9 (3.0)     5.8***  1.54
Difference  4.0 (8.1)        –1.6 (15.4)    2.8**   0.46   0.0 (7.9)       –10.0 (10.0)   4.0***  1.11

VA = Veterans Affairs sample (N = 264); IME = civil forensic sample (N = 80); D = dominant hand; ND = nondominant hand; Combined = both hands combined; Estimated = estimated finger tapping (Meyers & Volbrecht, 2003); Difference = difference between estimated and actual dominant hand raw scores (Meyers & Volbrecht, 2003).
*p < .05; **p < .01; ***p < .001.


Table 4. Finger Tapping Test classification accuracy by sample

Cutoff
D raw: < 61, < 50, < 40, < 39, < 38, < 37, < 36, < 35, < 34, < 33, < 32, < 31, < 30, < 20, < 17
Combined: < 120, < 100, < 80, < 70, < 69, < 68, < 65, < 63, < 61, < 60, < 55, < 35
Difference: < 28, < 10
