JSLHR

Research Article

Aging Affects Identification of Vocal Emotions in Semantically Neutral Sentences

Kate Dupuis (a,b,c) and M. Kathleen Pichora-Fuller (a,c,d)

Purpose: The authors determined the accuracy of younger and older adults in identifying vocal emotions using the Toronto Emotional Speech Set (TESS; Dupuis & Pichora-Fuller, 2010a) and investigated the possible contributions of auditory acuity and suprathreshold processing to emotion identification accuracy.

Method: In 2 experiments, younger and older adults with normal hearing listened to and identified vocal emotions in the TESS stimuli. The TESS consists of phrases with controlled syntactic, lexical, and phonological properties spoken by an older female talker and a younger female talker to convey 7 emotion conditions (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise). Participants in both experiments completed audiometric testing; participants in Experiment 2 also completed 3 tests of suprathreshold auditory processing.

Results: Identification by both age groups was above chance for all emotions. Accuracy was lower for older adults in both experiments. The pattern of results was similar across age groups and experiments. Auditory acuity did not predict identification accuracy for either age group in either experiment, nor did performance on tests of auditory processing in Experiment 2.

Conclusions: These results replicate and extend previous findings concerning age-related differences in ability to identify vocal emotions and suggest that older adults' auditory abilities do not explain their difficulties in identifying vocal emotions.

The expression and understanding of emotion are integral to successful communication and social interaction (Banse & Scherer, 1996; Pittam & Scherer, 1993; Scherer, 2003). Emotion can be expressed in many ways. It can be expressed nonvocally in facial and/or body gestures, linguistically in the semantic content of speech (e.g., "The bad news makes me sad"), in the acoustic patterns of the voice that provide affective or emotional prosodic cues in speech (e.g., speaking "Yes, I can do it" in a voice expressing happiness), and/or in nonverbal vocal productions such as laughing or crying that can convey information about emotion (e.g., Sauter, Eisner, Calder, & Scott, 2010; Scott, Sauter, & McGettigan, 2010). The present study focuses on identification of emotional prosody in speech with neutral semantic content.

The production of emotional prosody relies on variations in specific acoustical cues, such as speech rate, loudness, and pitch. For example, speech that is slow, quiet, and low pitched can convey sadness, whereas fast, loud, and high-pitched speech can convey anger (e.g., Scherer, 2003; Sobin & Alpert, 1999). Listeners can accurately identify vocal emotions at rates four or five times greater than chance (Pittam & Scherer, 1993). However, mounting evidence suggests that older adults are less accurate than younger adults in identifying vocal emotions expressed in speech (e.g., Dupuis & Pichora-Fuller, 2010b; Mitchell, Kingston, & Barbosa Bouças, 2011; Orbelo, Grim, Talbott, & Ross, 2005; Orbelo, Testa, & Ross, 2003; M. Ryan, Murray, & Ruffman, 2010), with age-related differences beginning as early as the fourth decade (Mill, Allik, Realo, & Valk, 2009; Paulmann, Pell, & Kotz, 2008). The identification of emotion in speech involves sensory, cognitive, and emotional processing (Schirmer & Kotz, 2006), but there is no current consensus on the mechanisms responsible for age-related differences in the ability to identify vocal emotions.

a University of Toronto, Ontario, Canada
b Baycrest Health Sciences, Toronto, Ontario, Canada
c Toronto Rehabilitation Institute, Ontario, Canada
d Rotman Research Institute, Toronto, Ontario, Canada

Correspondence to Kate Dupuis: [email protected]

Editor: Nancy Tye-Murray
Associate Editor: Karen Kirk
Received September 11, 2014
Revision received January 30, 2015
Accepted March 10, 2015
DOI: 10.1044/2015_JSLHR-H-14-0256

Disclosure: The authors have declared that no competing interests existed at the time of publication.

Auditory Contributions to Age-Related Differences in Vocal Emotion Identification

Recent investigations have attempted to elucidate the contributions of hearing abilities to age-related differences in the identification of affective prosody. It seems that audiometric pure-tone thresholds are not related to the difficulties


in identifying vocal emotions found for older listeners, at least for those with no more than mild to moderate hearing loss (Lambrecht, Kreifelts, & Wildgruber, 2012; Mitchell, 2007; Orbelo et al., 2005). Nevertheless, for listeners who have greater degrees of hearing loss, the acoustical properties of speech typically used to identify some emotions may become inaudible, in particular when emotion identification involves lowering vocal intensity (e.g., sadness is produced at lower-than-average sound intensity levels). Moreover, even older adults with normal or near-normal pure-tone thresholds can have reduced suprathreshold auditory temporal processing abilities that are unrelated to audiometric pure-tone thresholds (for a review, see Fitzgibbons & Gordon-Salant, 2010). These reduced abilities could compromise the identification of vocal emotions in speech even when speech remains easily audible. In particular, reduced ability to discriminate differences in vocal intensity, voice fundamental frequency (F0), and/or signal duration cues could compromise the identification of vocal emotions in speech. Older adults have larger intensity difference limens (DLs) than younger adults (Fitzgibbons & Gordon-Salant, 2010). Intensity DLs for nonspeech tonal signals have been shown to increase with age (e.g., He, Dubno, & Mills, 1998; MacDonald, Pichora-Fuller, & Schneider, 2007), although, to our knowledge, age-related differences in intensity DLs have not been studied using speech stimuli. Nevertheless, it is possible that difficulties discriminating intensity differences may bias older adults toward interpreting emotions as sad or neutral (emotions produced with less variation in intensity). Frequency DLs for nonspeech signals increase with age, even in listeners who have audiometric pure-tone thresholds that are considered to be clinically normal (e.g., Abel, Krever, & Alberti, 1990). Using speech stimuli, voice F0 DLs have been shown to be approximately three times larger for older adults than for younger adults (Vongpaisal & Pichora-Fuller, 2007). Furthermore, the ability of older listeners to detect differences in F0 has been related to age-related reductions in melodic pitch perception (Russo, Ives, Goy, Pichora-Fuller, & Patterson, 2012). Thus, it is also possible that a decline in the periodicity coding of temporal fine structure cues may make it more difficult for older adults to perceive pitch differences that could serve to distinguish emotions (e.g., a neutral voice produced with low pitch compared with an angry voice produced with high pitch), even if the audibility of speech is maintained. Older adults often have difficulty discriminating the duration of nonspeech (Fitzgibbons & Gordon-Salant, 1994) and speech signals (Gordon-Salant, Yeni-Komshian, Fitzgibbons, & Barrett, 2006), and they often have reduced abilities to detect silent gaps in nonspeech (Schneider, Pichora-Fuller, Kowalchuk, & Lamb, 1994) and speech signals (Pichora-Fuller, Schneider, Benson, Hamstra, & Storzer, 2006). These reduced abilities in auditory temporal processing can affect specific aspects of phoneme identification (Gordon-Salant et al., 2006; Pichora-Fuller et al., 2006), and they may affect certain aspects of the identification of vocal emotions (e.g., happy would be produced

at a faster rate with shorter segment and silent gap durations, whereas sad would be produced at a slower rate with longer segment and silent gap durations). Thus, age-related declines in suprathreshold auditory processing could alter the ability of older adults to use multiple acoustical properties that are important to the identification of vocal emotions, namely intensity (loudness), frequency (pitch), and duration (rate). The contributions of age-related suprathreshold auditory processing to vocal emotion identification have been examined in two studies. One study (Globerson, Amir, Golan, Kishon-Rabin, & Lavidor, 2013) tested the ability of younger adults to identify five vocal emotion conditions (anger, fear, sadness, happiness, and neutral) in mono- and polysyllabic nonsense utterances, Hebrew words, and Hebrew sentences. These authors showed that 31% of the variance in affective prosody recognition for sentences with neutral linguistic content could be explained by the listener’s ability to recognize pitch direction in gliding tones. Another study (Mitchell & Kingston, 2014) tested the relationship between auditory processing abilities (amplitude, duration, and pitch discrimination of nonspeech stimuli) and a listener’s ability to discriminate whether two sentences were spoken to portray the same or different emotions (happiness or sadness) in younger and older adults with clinically normal hearing. In the psychoacoustic discrimination tasks, participants heard pairs of tones and had to identify which tone (first or second) was louder, longer, or higher. Accuracy scores on the psychoacoustic tests accounted for 93% of the variance in the older adults’ performance and 45% of the variance in the younger adults’ performance. For both age groups, pitch discrimination scores had the largest influence on how accurately vocal emotions were judged to be the same or different. These two studies provide preliminary evidence that suprathreshold auditory processing abilities may affect the identification and discrimination of vocal emotions. However, the identification of vocal emotions could be studied more extensively by using a larger set of emotions, including all six of the basic emotions (for a detailed review, see Ekman, 1999). Furthermore, to isolate the contributions of auditory processing abilities to vocal emotion identification, further research should be conducted using stimuli whose semantic, syntactic, and phonemic properties are controlled. In addition, controlling the intensity levels used to present test stimuli could minimize the possible influence of reduced hearing thresholds in older adults on their accuracy in identifying vocal emotions (Globerson et al., 2013; Mitchell & Kingston, 2014). Last, neither prior study used measures of auditory processing for which there is well-established evidence of age-related differences. Thus, it would be worthwhile to examine how such measures relate to age-related differences in emotion identification. With these considerations in mind, in the present study we used a novel set of stimuli that included more emotions and controlled for possibly confounding acoustic and linguistic factors, and in Experiment 2 we tested suprathreshold auditory processing using measures for which age-related differences have been demonstrated previously.


Toronto Emotional Speech Set

The Toronto Emotional Speech Set (TESS; Dupuis & Pichora-Fuller, 2010a) consists of 200 items recorded by both a younger female talker and an older female talker to portray seven emotion conditions (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise), for a total of 2,800 test stimuli. The TESS stimuli were based on the Northwestern University Auditory Test No. 6 (NU-6; Tillman & Carhart, 1966), one of the most common speech audiometry tests used extensively by clinicians and researchers (Gelfand, 2009). The NU-6 includes 200 items that are divided into four phonemically balanced lists. Each item begins with a standard carrier phrase followed by a unique monosyllabic consonant–vowel nucleus–consonant word (e.g., "Say the word bean"). Therefore, each item has the same syntactic structure and the same number of words and syllables, with the phonemic properties of the items balanced across lists.

The TESS stimuli were recorded by two female actors (aged 26 and 64 years) who had clinically normal hearing and who were monolingual speakers of Canadian English and in good self-reported health. In separate recording sessions, each actor recorded each of the 200 items from the NU-6 to portray one of six emotions (anger, disgust, fear, sadness, happiness, and pleasant surprise) as well as a neutral voice. The actors recorded numerous tokens of each item (e.g., "Say the word bean" spoken in a sad voice), and a group of four listeners then chose the token that seemed to be the most representative of each emotion. In this way, a total of 2,800 stimuli (200 NU-6 items × 2 actors × 7 emotion conditions) were created. (A more detailed description of the stimulus recording can be found in Dupuis & Pichora-Fuller, 2011, and analysis of the acoustical characteristics of the TESS stimuli can be found in Dupuis & Pichora-Fuller, 2014.) The semantic content of the preceding words was equivalent across all items because a standard carrier phrase was used. The semantic content of the sentence-terminating words of the 200 NU-6 items was determined in an earlier study in which emotional valence ratings (1 = extremely negative, 9 = extremely positive) were obtained from 48 younger and 16 older participants (Dupuis & Pichora-Fuller, 2008).

Current Work

Experiment 1

This was the first attempt to gather emotion identification accuracy data for the TESS stimuli from both younger and older listeners. It was hypothesized that both age groups would be able to accurately identify the vocal emotions at rates exceeding chance but that the older adults would be less accurate overall compared with the younger adults. Given previous research, it was not expected that identification accuracy would be related to audiometric thresholds in either age group as long as the stimuli remained audible, but we did expect that age-related reductions in suprathreshold auditory processing would influence the older adults' ability to identify vocal emotions. We expected older adults to have greater difficulty identifying emotions produced with a higher pitch or at faster rates (i.e., anger, fear) compared with those produced with a lower pitch or at slower rates (i.e., neutral, sadness).

Experiment 2

Experiment 2 replicated Experiment 1 and extended it by adding measures of suprathreshold auditory processing, including F0 DL, gap detection in speech markers, and tonal intensity discrimination, so that we were able to test relationships between the suprathreshold auditory processing measures and the accuracy of vocal emotion identification.

Experiment 1

Method

Participants

Participants were 56 university students (M = 19.7 years, SD = 2.7, 70% women and 30% men) and 28 older adults (M = 69.8 years, SD = 4.5, 71% women and 29% men) from an existing volunteer pool. All participants had acquired English by the age of 5 years and had completed at least Grade 10. The majority of the older adults (75%) had undertaken postsecondary education (see Table 1 for participant characteristics and Table 2 for the audiometric thresholds of participants). On average, compared with the younger adults, the older adults had completed a greater number of years of education and had higher scores on the Mill Hill Vocabulary Scale (Raven, 1982), a predictor of overall intellectual function. All participants met the audiometric criteria for inclusion in the study: They had clinically normal pure-tone air-conduction thresholds of no greater than 25 dB HL in the frequency range most important for speech (250–3000 Hz) in the better ear and no clinically significant interaural threshold asymmetry (no more than 15 dB at more than two adjacent standard octave test frequencies). Participants provided informed consent and were tested individually. The testing session lasted approximately 1 hr. Younger participants received one course credit, and older participants received $10 for their time. Both experiments described in this article were conducted in accordance with the human ethics standards and received approval from the research ethics board of the University of Toronto.

Stimuli

The stimuli used in the present study were taken from the TESS. The 140 semantically neutral NU-6 items (Mvalence = 5.2, SD = 0.7, range = 3.8–6.4) were selected for use in the current study for a total of 1,960 stimuli (140 items × 7 emotions × 2 talkers). The 980 stimuli recorded by each talker were divided into seven test lists of 140 items each (i.e., 20 stimuli spoken to portray each of the seven emotion conditions). The stimuli in each list were pseudorandomized, with no more than three stimuli spoken in the same vocal emotion presented in a row.


Table 1. Summary of participant characteristics.

| Variable | Experiment 1: Younger adults | Experiment 1: Older adults | Experiment 2: Younger adults | Experiment 2: Older adults |
| Years of education | 13.5 (0.28) | 15.3 (0.53)a | 16.2 (0.47) | 16.1 (0.61) |
| Vocabulary score (0–20) | 12.4 (0.36) | 14.9 (0.47)a | 13.4 (0.44) | 14.9 (0.47)a |
| Health rating score (1–4)b | 3.5 (0.08) | 3.3 (0.13) | 3.4 (0.12) | 3.0 (0.10)a |
| F0 DL (Hz) | — | — | 1.1 (0.18) | 5.9 (1.4)a |
| Gap detection thresholds (ms) | — | — | 12.8 (2.2) | 26.1 (3.9)a |
| Intensity DL (dB) | — | — | 2.5 (0.24) | 3.0 (0.21) |

Note. Means (SEs). F0 DL, gap detection thresholds, and intensity DL measures are the average of the two best runs. Em dashes indicate data not obtained. F0 DL = voice fundamental frequency difference limen; DL = difference limen.
a Significant difference between scores for the two age groups. b All health ratings are self-reported.

Each participant heard one of the lists (i.e., all 140 words with an equal number spoken to portray each of the emotions by one of the talkers). The seven test lists for each talker were allocated to different participants so that each was heard an equal number of times. A practice list of 14 items (2 items × 7 emotions) was also created and was pseudorandomized in the same way for each talker.

MATLAB software was used to equate root mean square (RMS) values across all stimuli to ensure that the average intensity level of each item was the same. This was done to minimize potential differences across emotions in the basic audibility of the stimuli (see Globerson et al., 2013, for a similar approach to studying emotion in speech and Gingras, Marin, & Fitch, 2013, for a similar approach to studying emotion in music).1 By equalizing the RMS level of the stimuli so that all stimuli were similarly audible, we attempted to minimize the possible confounds between audibility and emotion identification. Note that this method does not equalize variability in intensity, so variability in intensity remained available as a cue that could influence emotion identification.

1 Globerson et al. (2013) describes this method as follows: "The root mean square (RMS) value of the samples in each sound file was calculated. The sound file was normalized to achieve an equal RMS value for all stimuli" (p. 1802).
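The RMS equalization step can be illustrated with a short script. This is a minimal sketch in Python rather than the authors' actual MATLAB code; the file names, the target RMS value, and the use of the soundfile package are placeholders and assumptions, and calibration to the 70 dBA playback level is outside its scope.

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library; any WAV reader/writer would do

def rms(x):
    """Root mean square of a mono signal."""
    return np.sqrt(np.mean(np.square(x)))

def normalize_to_rms(infile, outfile, target_rms=0.05):
    """Rescale a recording so that its RMS equals a common target value.

    Equalizing RMS across stimuli equates their average intensity while
    leaving moment-to-moment intensity variation (a prosodic cue) intact.
    """
    signal, sample_rate = sf.read(infile)
    gain = target_rms / rms(signal)
    sf.write(outfile, signal * gain, sample_rate)

# Example call (hypothetical file names):
# normalize_to_rms("bean_sad_raw.wav", "bean_sad_eq.wav")
```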


Apparatus

Participants sat in a chair in the center of a 3.3 m × 3.3 m double-walled sound-attenuating booth (Industrial Acoustics Co., New York, NY) facing a loudspeaker (ElectroMedical Instrument Co., Mississauga, Ontario, Canada) located in a corner of the booth. The loudspeaker was located approximately 180 cm from the participant at a height of 107 cm (i.e., about head height), and the stimuli were presented from the loudspeaker at an average sound level of 70 dBA. This presentation level was chosen to be comparable to the level of conversational speech. Presentation of the stimuli was controlled by a Dell (Round Rock, TX) computer. Stimuli were routed through a Tucker Davis Technologies (Gainesville, FL) System III Hardware and a Harman/Kardon (Stamford, CT) HK 3380 amplifier. Participants responded using a Compar (Minnetonka, MN) Multisync LCD 1760 NX 17 touch screen mounted below the height of their head on a 45-cm-high table approximately 30 cm in front of them.

Table 2. Audiometric thresholds (in dB HL), standard pure-tone averages (PTAB), and high-frequency pure-tone averages (HFPTAB) for the better ear for younger and older listeners.

| Variable | Experiment 1: Younger adults | Experiment 1: Older adults | Experiment 2: Younger adults | Experiment 2: Older adults |
| 250 Hz | 3.8 (0.7) | 9.8 (1.5) | 1.4 (1.0) | 7.3 (1.2) |
| 500 Hz | 0.3 (0.7) | 9.8 (1.4) | −1.3 (0.9) | 5.7 (1.1) |
| 1000 Hz | −2.0 (0.7) | 10.9 (1.2) | −2.1 (0.7) | 5.5 (1.0) |
| 1500 Hz | −0.1 (0.8) | 10.9 (1.3) | −1.6 (0.9) | 8.8 (1.6) |
| 2000 Hz | −1.1 (0.8) | 11.6 (1.5) | −3.9 (0.7) | 9.3 (1.8) |
| 3000 Hz | 0.64 (0.9) | 17.0 (1.4) | 0.4 (0.9) | 15.5 (1.5) |
| 4000 Hz | 1.1 (0.9) | 21.6 (2.4) | 0.2 (1.0) | 20.5 (2.1) |
| 8000 Hz | 4.4 (1.2) | 39.0 (3.5) | 2.9 (1.2) | 38.4 (3.2) |
| PTAB | −0.4 (0.5) | 10.5 (1.2) | −2.4 (0.6) | 6.8 (1.0) |
| HFPTAB | −1.2 (0.6) | 14.0 (1.1) | −2.4 (0.6) | 11.0 (1.1) |

Note. Means (SEs). PTAB was calculated using thresholds at 500, 1000, and 2000 Hz and HFPTAB was calculated using thresholds at 500, 1000, 2000, and 4000 Hz for the better ear. Thresholds, PTABs, and HFPTABs are all significantly lower for the younger adults compared with the older adults (ps < .01).

Procedure

Counterbalancing was used to assign each participant in the two age groups to one of the seven test lists recorded by either the younger or the older talker. Before beginning the task, the experimenter used a picture of the response screen to explain the instructions to each participant. The participant then completed 14 practice trials using the touch screen. Participants were instructed to listen carefully to each stimulus and make their responses on the basis of how the speaker was talking rather than what she was saying. They were asked to respond to the following statement:


“Please indicate how you think the talker feels.” The possible responses were angry, disgusted, fearful, happy, neutral, pleasantly surprised, and sad. After the participant responded, a next button appeared at the bottom of the screen, and by pressing this button the participant started the next trial.

Results

Identification Accuracy

The effects of emotion and listener age on identification accuracy were examined using a repeated-measures analysis of variance (ANOVA) with vocal emotion (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) as the within-subject factor and listener age group (younger, older), talker (younger, older), and test list (1 through 7) as the between-subjects factors (see Figure 1). Mauchly's test indicated that the assumption of sphericity had been violated, χ²(20) = 65.5, p < .001; therefore, degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ɛ = 1.0). Post hoc analyses were conducted using t tests with Tukey's least significant difference (LSD) adjustments for multiple comparisons and alpha levels set at p = .05. Results indicate main effects of vocal emotion, F(6, 336) = 14.24, p < .001, ηp² = .20, and listener age group, F(1, 56) = 22.61, p < .001, ηp² = .29. There was no main effect of talker (p = .34), but there was a significant Vocal Emotion × Talker interaction, F(6, 336) = 7.41, p < .001, ηp² = .12. There was no main effect of test list (p = .66), nor any significant interaction of any factor with test list (ps > .05), suggesting that the lists were appropriately balanced. It is important to note that there were no significant three-way (emotion by listener age group by talker, p = .14; emotion by listener age group by test list, p = .57; listener age group by talker by test list, p = .82; emotion by talker by test list, p = .89) or four-way (emotion, listener age group, talker, test list, p = .28) interactions.
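For readers who want to reproduce this style of analysis, the sketch below shows a simplified version of the mixed-design ANOVA in Python using the pingouin package (an assumption; the authors' analyses were not run in Python). It models only the within-subject factor (emotion) and one between-subjects factor (age group); the talker and test list factors and the sphericity correction reported above are omitted for brevity, and the long-format data frame with columns subject, age_group, emotion, and accuracy is hypothetical.

```python
import pandas as pd
import pingouin as pg

# One row per participant x emotion, with mean identification accuracy (%).
# Columns (hypothetical): subject, age_group, emotion, accuracy
df = pd.read_csv("emotion_identification_long.csv")  # placeholder file name

# Mixed ANOVA: emotion is the repeated (within-subject) factor,
# age group is the between-subjects factor.
aov = pg.mixed_anova(data=df, dv="accuracy", within="emotion",
                     subject="subject", between="age_group")
print(aov.round(3))

# Follow-up pairwise comparisons, analogous to the post hoc t tests above.
posthoc = pg.pairwise_tests(data=df, dv="accuracy", within="emotion",
                            subject="subject", between="age_group")
print(posthoc.round(3))
```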

Figure 1. Mean identification accuracy for each age group of listeners plotted by vocal emotion and experiment. Dark bars correspond to the data of the younger listeners; light bars correspond to the data of the older listeners (± SE). Solid bars correspond to the data from Experiment 1; striped bars correspond to the data from Experiment 2. The emotions are ordered according to identification accuracy in Experiment 1; the only difference in the order for Experiment 2 is that accuracy is slightly higher for neutral than for fear.

Main effects of vocal emotion and listener age group. As seen in Figure 1, overall identification accuracy was 76.6%, far exceeding chance (14.3%). Anger and sadness were the easiest emotions to identify (86.9% and 81.2% accuracy, respectively), whereas disgust and pleasant surprise were the most difficult to identify (64.9% and 61.9% accuracy, respectively), and the other emotions were intermediate in difficulty. Overall, the younger adults had significantly higher identification accuracy (M = 82.1%, SD = 11.4) compared with the older adults (M = 65.8%, SD = 19.4). Note that there was no interaction between vocal emotion and listener age group ( p = .19), indicating that the pattern of identification accuracy across emotions was similar for both listener age groups. Vocal emotion by talker interaction. Although there was no main effect of talker, there was a significant interaction of vocal emotion by talker, suggesting that the two talkers did not produce all emotions equivalently. Post hoc analyses indicated that the identification rate for stimuli spoken to portray anger was better for items produced by the older talker (M = 90.1%) compared with items produced by the younger talker (M = 83.7%). Identification accuracy for stimuli spoken to portray happiness and sadness was better when the younger talker produced these emotions (Mhappiness = 85.3%, Msadness = 86.5%) than when the older talker produced them (Mhappiness = 54.4%, Msadness = 76.0%). There were no significant differences in identification accuracy for the stimuli produced by the two talkers in the other four vocal emotion conditions (fear, pleasant surprise, disgust, or neutral).

Influence of Auditory Acuity on Identification Accuracy

Two sets of pure-tone averages were calculated for the better ear: a standard pure-tone average (PTAB) was calculated using thresholds at 500, 1000, and 2000 Hz, and a high-frequency pure-tone average (HFPTAB) was calculated using thresholds at 500, 1000, 2000, and 4000 Hz. The older adults had significantly poorer PTABs, t(81) = 8.15, p < .001, and HFPTABs, t(81) = 11.97, p < .001, compared with the younger adults (see Table 2). In order to investigate whether the lower vocal emotion identification rates of the older adults could be attributed to their poorer auditory acuity, correlations between overall identification accuracy and pure-tone averages were conducted for the two listener age groups. The correlation with PTAB did not reach significance for either the younger (r = −.12, p = .38) or the older (r = −.24, p = .22) listener age groups, nor did the correlations with HFPTAB (ryounger = −.13, p = .33; rolder = −.19, p = .32). For each age group separately, follow-up correlations were conducted between identification accuracy for each emotion and PTAB and HFPTAB. Due to the high number of correlations conducted, Bonferroni corrections were applied. There were no significant correlations with any emotion and either PTAB or HFPTAB for either age group. Furthermore, for neither age group were there significant correlations between thresholds at 1000, 2000, 3000, 4000, or 8000 Hz and either overall accuracy or the identification scores for the individual emotions.
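A compact illustration of these acuity analyses is sketched below (a Python sketch under assumptions, not the authors' analysis code): PTAB is the mean better-ear threshold at 500, 1000, and 2000 Hz, HFPTAB adds 4000 Hz, and each average is correlated with overall identification accuracy within an age group, with a Bonferroni-adjusted alpha for the follow-up per-emotion correlations. The arrays shown are hypothetical toy data.

```python
import numpy as np
from scipy.stats import pearsonr

def pure_tone_average(thresholds_by_freq, freqs):
    """Average the better-ear thresholds (dB HL) over the given frequencies."""
    return np.mean([thresholds_by_freq[f] for f in freqs], axis=0)

# Hypothetical data for one age group: better-ear thresholds per frequency
# (one value per participant) and overall identification accuracy (%).
thresholds = {500: np.array([5, 10, 0]), 1000: np.array([5, 10, 5]),
              2000: np.array([10, 15, 5]), 4000: np.array([15, 20, 10])}
accuracy = np.array([81.0, 66.0, 74.0])

ptab = pure_tone_average(thresholds, [500, 1000, 2000])
hfptab = pure_tone_average(thresholds, [500, 1000, 2000, 4000])

for label, pta in [("PTAB", ptab), ("HFPTAB", hfptab)]:
    r, p = pearsonr(pta, accuracy)
    print(f"{label}: r = {r:.2f}, p = {p:.3f}")

# Bonferroni correction for follow-up tests (e.g., 7 per-emotion correlations):
alpha_corrected = .05 / 7
```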

Discussion

This is the first study to examine the accuracy of vocal emotion identification for the TESS stimuli in younger and older adults. Results suggest that all emotions were identified at rates at least three times higher than chance for both listener age groups. Although the accuracy of emotion identification differed between talkers for stimuli spoken to portray three of the emotions, the two actors produced the rest of the emotions in a manner that yielded comparable emotion identification accuracy. It is important to note that the absence of a Vocal Emotion × Listener Age Group × Talker interaction suggests that the differences in production of emotions across the two talkers influenced the two listener age groups in a similar manner.

The finding of age-related differences in vocal emotion identification using the TESS stimuli is consistent with previous work (e.g., Dupuis & Pichora-Fuller, 2010b; M. Ryan et al., 2010). Contrary to our hypothesis, although the older adults had lower identification rates overall, the pattern of responding to the different vocal emotions did not differ between the two listener age groups. The absence of a Vocal Emotion × Listener Age Group interaction suggests that potential age-related differences in auditory processing do not appear to explain the lower vocal emotion identification performance of the older adults in the current experiment. Also, consistent with previous findings (e.g., Mitchell, 2007; Orbelo et al., 2005), the overall age-related differences in emotion identification could not be explained by the pure-tone hearing thresholds of participants whose audiometric thresholds were considered to be normal for their age (International Organization for Standardization, 2000).

Experiment 2

The goal of Experiment 2 was to replicate the emotion identification results of Experiment 1 in a new sample of participants and to directly test the relationship of suprathreshold auditory processing abilities to vocal emotion identification accuracy.

Method

Participants

Participants were 28 university students (M = 21.6 years, SD = 3.3, 61% women and 39% men) and 28 older adults (M = 70.7 years, SD = 4.0, 43% women and 57% men) from an existing volunteer pool. None of these participants had been tested in Experiment 1. All participants had acquired English by the age of 5 years and had completed at least Grade 10. The majority of the older adults (92%) had undertaken postsecondary education. On average, compared with the younger adults, the older adults had completed a similar number of years of education and had higher scores on the Mill Hill Vocabulary Scale. All participants in Experiment 2 met the same audiometric eligibility criteria as did participants in Experiment 1. Participants provided informed consent and were tested individually. Due to the additional tests administered, Experiment 2 consisted of two sessions of approximately 1 hr each. Younger participants received two course credits, whereas older participants again received $10 per hour of their time.

There were some differences in participant characteristics between the two experiments. Compared with the younger participants in Experiment 1, the younger adults in Experiment 2 were slightly older (1.9 years), t(81) = 2.84, p = .006; had completed more education (2.7 years), t(81) = 5.11, p < .001; and had better PTABs (2 dB HL), t(81) = 2.45, p = .017. Older adults in Experiment 2 had better PTABs (3.7 dB HL), t(54) = 2.35, p = .022, than older participants in Experiment 1. Although these differences are statistically significant, they are likely of little practical relevance.

Stimuli, Apparatus, and Procedures

The TESS stimuli and presentation apparatus were the same as those used in Experiment 1 for the emotion identification task. Participants completed audiometry and the emotion identification task in the first testing session, with each participant in the two age groups being assigned using counterbalancing to one of the seven test lists and to one of the two talkers. Three tests of suprathreshold auditory processing were added to the test battery that had been used in Experiment 1. The three new tests were conducted following the same procedures as had been used in previous published studies. All testing took place in the same sound-attenuating booth using the same presentation equipment as was used for the emotion identification task. An F0 DL task was completed at the end of the first testing session. In a second testing session, participants completed a gap detection task and an intensity DL task.

Additional Measures

Vowel F0 DLs. The F0 DL task described in Vongpaisal and Pichora-Fuller (2007, pp. 1143–1144) was administered. Tokens of the vowel [a] were synthesized using five fixed formant frequencies. F0 varied from 120.0 to 150.0 Hz in increments of 0.1 Hz. Stimuli were prepared at a sampling rate of 11025 Hz. In each trial, three 260-ms tokens of the vowel [a] were presented with an interstimulus interval of 150 ms; the standard token (F0 = 120 Hz) was followed by two comparison tokens. One of the comparison tokens matched the F0 of the standard token, whereas the F0 of the other comparison token differed from the standard. Stimuli were presented at a level of 80 dB SPL. The order of presentation of the comparison tokens in a trial was random. Listeners indicated which of the two comparison tokens was different from the standard by pressing the corresponding button on a button box. Participants received feedback after each trial. Prior to the test phase, a practice block of 100 trials was administered in which listeners


compared the standard tone (F0 = 120 Hz) with a comparison tone having an F0 of 145 Hz. In the test phase, F0 thresholds were determined for each participant using a three-up, one-down adaptive procedure to find the 79.9% correct threshold. The initial difference between the standard tone and the comparison tone was 30 Hz, and this difference was halved on subsequent trials following a correct response or doubled following three incorrect responses. After five reversals, the scaling of the increments was reduced from 2.00 to 1.25 and the scaling of the decrements was increased from 0.5 to 0.8. F0 DL was determined from the mean of the last 10 reversals. Each participant completed three runs of the adaptive task, and more were completed if necessary until performance stabilized. The final F0 DL (in Hertz) was calculated as the average of the two best runs. Gap detection in speech. Participants’ ability to detect gaps in 40-ms speech markers was measured using the same procedure described in previous studies (Besser, Festen, Goverts, Kramer, & Pichora-Fuller, 2015, p. 28; Pichora-Fuller et al., 2006, p. 1148). Speech stimuli were constructed from recorded samples of [su] spoken by an adult female. A gap of varying duration was inserted between the consonant and the vowel at the zero crossing following the consonant in [su]. A two-interval, two-alternative forced-choice staircase procedure was used. In each trial, listeners heard a gap stimulus in one interval and a nongap stimulus in the other interval, with random assignment of the test gap stimulus and control nongap stimulus to the intervals. Intervals were separated by 1 s. Each interval was marked by the illumination of a light corresponding to one of two buttons on a button box. Participants responded by pressing the button they believed corresponded to the interval (i.e., 1 vs. 2) in which the gap stimulus had been heard. Following each response, feedback was provided by illumination of the light above the correct interval. A threedown, one-up procedure was used to find the 79.9% point on the psychometric function, with an initial gap duration of 350 ms. The gap size decreased after three correct responses and increased after one incorrect response; the step size of the increase or decrease was changed by a factor of 0.5 with each reversal until a minimum step size of 1 ms was reached. Stimuli were presented binaurally at 75 dB SPL. Test runs were completed after 12 reversals, and the average of the last eight reversals represented the threshold for each run. All participants completed at least three runs. If a participant’s thresholds continued to improve on the third run, then up to three more runs were completed until a plateau in performance was observed. The gap detection threshold (in milliseconds) was calculated as the average of the two best runs. Intensity DLs. As described in MacDonald et al. (2007), intensity DLs were measured using a two-interval, two-alternative forced-choice paradigm. A 500-Hz pure tone was presented at 70 dB SPL binaurally in one interval, and a 500-Hz pure tone was presented at 70 + DL dB binaurally in the other interval. For each run of 100 trials, accuracy scores were determined at each of four comparison

intensity levels (25 trials at each level) and the 79% correct threshold was interpolated. The duration of each tone was 500 ms, and an interstimulus interval of 250 ms was used. Stimuli were presented in broadband noise; an independent broadband Gaussian noise (0–10 kHz) was generated for each trial interval. The spectrum level of the broadband noise was 30 dB SPL/Hz. The listener was asked to choose the interval with the louder and/or purer tone and to press the corresponding button on a button box. Feedback was provided by an illuminated light-emitting diode above the correct response button. The intensity DL (in dB) was calculated as the average threshold of the two best runs. Note that different presentation levels were used for the three psychoacoustic measures described above. The original presentation levels used by the developers of the tasks were used so that direct comparisons could be made to prior studies in which age-related differences were found in the same test conditions using the same stimuli. The levels used in the three tasks were all well above the hearing thresholds of the participants in the current study. There is prior research establishing that, at least for the gap detection task, results are not affected by the level of presentation as long as the level is not near threshold (Schneider, Speranza, & Pichora-Fuller, 1998).
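The adaptive logic of the gap detection task described above (three-down, one-up tracking, step size halved at each reversal down to a 1-ms floor, run ended after 12 reversals, threshold taken as the mean of the last eight reversals) can be summarized in a short sketch. This is an illustrative Python sketch, not the authors' test software; the initial step size is an assumed placeholder, and present_trial is a hypothetical stand-in for playing the gap and no-gap intervals and collecting the listener's two-alternative response.

```python
def run_gap_detection(present_trial, start_gap_ms=350.0,
                      start_step_ms=100.0, min_step_ms=1.0,
                      max_reversals=12):
    """Three-down, one-up staircase tracking the ~79% correct point.

    present_trial(gap_ms) -> True if the listener picked the gap interval.
    The step size is halved at each reversal until it reaches 1 ms.
    Returns the mean gap of the last eight reversals (the run threshold).
    """
    gap, step = start_gap_ms, start_step_ms
    consecutive_correct = 0
    direction = None          # 'down' (getting harder) or 'up' (getting easier)
    reversal_gaps = []

    while len(reversal_gaps) < max_reversals:
        if present_trial(gap):
            consecutive_correct += 1
            if consecutive_correct == 3:          # three correct -> smaller gap
                consecutive_correct = 0
                if direction == 'up':             # change of direction = reversal
                    reversal_gaps.append(gap)
                    step = max(step * 0.5, min_step_ms)
                direction = 'down'
                gap = max(gap - step, 0.0)
        else:                                      # one error -> larger gap
            consecutive_correct = 0
            if direction == 'down':
                reversal_gaps.append(gap)
                step = max(step * 0.5, min_step_ms)
            direction = 'up'
            gap += step

    return sum(reversal_gaps[-8:]) / 8.0
```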

Results

Identification Accuracy

The effects of emotion and listener age on identification accuracy were again examined using a repeated-measures ANOVA with vocal emotion (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) as the within-subject factor and listener age group (younger, older), talker (younger, older), and test list (1 through 7) as the between-subjects factors. As in Experiment 1, Mauchly's test indicated that the assumption of sphericity had been violated, χ²(20) = 37.5, p < .001; therefore, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ɛ = .72). Post hoc analyses were conducted using t tests with Tukey's LSD adjustments for multiple comparisons and alpha levels set at p = .05. There were main effects of vocal emotion, F(4, 168) = 18.73, p < .001, ηp² = .40; listener age group, F(1, 28) = 23.69, p < .001, ηp² = .46; and talker, F(1, 28) = 7.27, p = .012, ηp² = .21. There were also significant two-way interactions of vocal emotion by talker, F(4, 168) = 6.29, p < .001, ηp² = .18, and vocal emotion by listener age group, F(4, 168) = 4.09, p = .003, ηp² = .13. There was no main effect of test list (p = .48), nor any interactions with test list. There was no significant two-way interaction between listener age group and talker (p = .23), nor a significant three-way interaction among vocal emotion, listener age group, and talker (p = .09).

Main effects of vocal emotion, listener age, and talker. As in Experiment 1, overall identification accuracy in Experiment 2 (74.1%) exceeded chance. Post hoc comparisons for the main effect of vocal emotion indicated that anger


and sadness were again the easiest emotions to identify, closely followed by fear (87.5%, 85.9%, and 81.6% accuracy, respectively), and identification accuracy did not differ for these three emotions. Disgust and pleasant surprise were again the most difficult vocal emotions to identify (61.9% and 61.7% accuracy, respectively), and the other emotions were intermediate in difficulty. Again, the younger adults had higher emotion identification accuracy (M = 83.7%, SD = 14.4) than the older adults (M = 64.5%, SD = 14.1). In contrast to Experiment 1, emotion identification accuracy differed for the two talkers, with higher identification rates (M = 79.4%, SD = 17.0) for stimuli spoken by the younger talker compared with stimuli spoken by the older talker (M = 68.8%, SD = 15.8). Vocal Emotion × Listener Age interaction. In Experiment 2 (but not in Experiment 1), there was a significant two-way interaction between vocal emotion and listener age. Post hoc examination of this interaction revealed that the younger adults outperformed the older adults for all emotions except fear ( p = .09) and pleasant surprise ( p = .12), for which their performance did not differ significantly. Vocal Emotion × Talker interaction. Similar to the findings in Experiment 1, there was a significant Vocal Emotion × Talker interaction, suggesting that the two talkers did not produce all emotions equivalently; however, the specific nature of the interaction was not identical in the two experiments. In both experiments, the identification of stimuli did not differ across talkers for those stimuli portraying the vocal emotions of disgust (Experiment 1, p = .85; Experiment 2, p = .66), fear (Experiment 1, p = .35; Experiment 2, p = .77), or neutral (Experiment 1, p = .44; Experiment 2, p = .15). In Experiment 2, there was also no difference between talkers for the vocal emotions anger ( p = .51) and sadness ( p = .36). As in Experiment 1, accuracy in Experiment 2 was greater for stimuli spoken to portray happiness by the younger talker (Mhappiness = 86.1%) compared with those spoken by the older talker (Mhappiness = 53.8%; p < .001). In Experiment 2, accuracy was also greater for stimuli spoken to portray pleasant surprise by the younger talker (Mps = 73.0%) compared with those spoken by the older talker (Mps = 50.4%; p < .001). It is important to note that, as in Experiment 1, the absence of a three-way Vocal Emotion × Listener Age Group × Talker interaction indicates that the two listener age groups were influenced by the differences in production of these emotions across the two talkers in a similar manner. Across-experiment comparisons. To examine whether there were any overall differences in identification between the two experiments, a final set of analyses was conducted using a repeated-measures ANOVA with vocal emotion (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) as the within-subject factor and experiment (1, 2), listener age group (younger, older), talker (younger, older), and test list (1 through 7) as the between-subjects factors. Mauchly’s test indicated that the assumption of sphericity had been violated, c2(20) = 86.3, p < .001; therefore,


degrees of freedom were corrected using Huynh-Feldt estimates of sphericity (ɛ = 1.0).2 This analysis failed to reveal a main effect of experiment, F(1, 84) < 1, p = .96, hp2 < .01, suggesting that the overall patterns of results were similar across the two experiments. There were significant main effects of emotion, F(6, 504) = 30.34, p < .001, hp2 = .27; listener age group, F(1, 84) = 46.11, p < .001, hp2 = .35; and talker, F(1, 84) = 10.44, p = .002, hp2 = .11. As in the experiment-specific analyses described above, the older adults had lower overall emotion identification rates (65.1%) compared with the younger adults (82.9%), and identification rates were lower for stimuli spoken by the older talker (69.8%) compared with those spoken by the younger talker (78.2%). Similar to the results found in Experiment 2, in the omnibus analysis there was a Vocal Emotion × Listener Age Group interaction, F(6, 504) = 3.22, p = .009, hp2 = .04. Post hoc comparisons indicated that older and younger listeners differed on all emotion conditions except for pleasant surprise, which was the least accurately identified emotion, with no significant difference in the performance of the two age groups. Of note, there were no Vocal Emotion × Experiment ( p = .14), Listener Age Group × Experiment ( p = .58), or Talker × Experiment ( p = .40) interactions. Indeed, the only significant interaction involving experiment was the threeway interaction of vocal emotion, listener age group, and experiment, F(6, 504) = 2.32, p = .032, hp2 = .03. When comparing within age groups, there was no difference in identification accuracy between the two experiments for any of the emotions. When comparing across age groups, there was once again no difference in identification accuracy between the two age groups in Experiment 2 for stimuli spoken to portray fear or pleasant surprise. Results for the TESS stimuli collapsed across experiments (Nyounger = 84, Nolder = 56) can be found in Table 3.3 Analysis of confusions during vocal emotion identification. We examined the nature of the confusions between emotions for both listener groups collapsed across experiments (see Table 4). A number of interesting patterns are worth mentioning. First, stimuli portraying happiness were most frequently confused with stimuli portraying pleasant surprise, and this pattern of confusion was symmetrical for 2

The effect of gender on emotion identification was examined in separate analyses for Experiment 1 and Experiment 2 and the two experiments combined. We failed to find a main effect of gender in any of these new analyses (Experiment 1: p = .95; Experiment 2: p = .64; omnibus analysis: p = .47). In Experiment 1, there was an interaction of emotion by gender, F(6, 234) = 2.75, p = .026, hp2 = .026. Post hoc analyses indicated that the only difference between the two genders was that men were significantly less accurate in identifying the neutral voice (72.4%) than women (87.6%). There was no emotion by gender interaction in Experiment 2 ( p = .35) or in the omnibus analysis ( p = .70). Given the mainly null findings, gender was not included as a variable of interest in the current article. 3 A post hoc power analysis for the null significant effect of experiment conducted with G*Power statistical software (Faul, Erdfelder, Lang, & Buchner, 2007) indicated a power of 0.95 to detect this effect, suggesting that the lack of a main effect of experiment was not due to low statistical power.


both age groups. Stimuli spoken to portray sadness were most frequently confused with stimuli spoken to portray neutral, and this pattern of confusion was also symmetrical for both age groups. In addition, both groups of listeners confused fear most frequently with stimuli spoken to portray sadness. All participants confused angry voices with disgusted voices, although the pattern was not symmetrical, and confusions in identifying disgusted voices differed between listener age groups. Younger listeners confused stimuli spoken to portray disgust with stimuli spoken to portray pleasant surprise, whereas older listeners confused stimuli spoken to portray disgust with stimuli spoken to portray neutral. Influence of Auditory Acuity on Emotion Identification Accuracy As in Experiment 1, the older adults in Experiment 2 had significantly poorer PTABs, t(54) = 8.26, p < .001, and HFPTABs, t(54) = 10.76, p < .001, compared with the younger adults (see Table 1). In order to investigate whether the poorer vocal emotion identification accuracy of the older adults could be linked to their poorer auditory acuity, correlations between overall identification accuracy and PTAB and HRPTAB were conducted for each of the two listener age groups separately. The correlations with PTAB did not reach significance for either the younger (r = .01, p = .98) or the older (r = .18, p = .36) listener age group, nor did the correlations with HFPTAB (ryounger = −.24, p = .22; rolder = .01, p = .97). For each age group, follow-up correlations were conducted between identification accuracy for each of the vocal emotions and PTAB and HFPTAB, with Bonferroni corrections applied. As found in Experiment 1, there were no significant correlations between the accuracy of emotion identification and either PTAB or HFPTAB for either age group. Furthermore, considering pure-tone thresholds at 1000, 2000, 3000, 4000, and 8000 Hz, there were no significant correlations with either overall accuracy or the identification scores for the individual emotions for either age group once the corrections for multiple correlations were applied to the p values. A final set of analyses was conducted to examine potential relationships between vocal identification accuracy and auditory acuity. We tested for the correlations between hearing thresholds and emotion identification within each age group (collapsed across experiments).4 Once Bonferroni 4

The correlational analyses were conducted within age groups rather than across age groups due to issues with multicollinearity between hearing loss and age; that is, hearing thresholds were highly correlated with age when the two age groups were combined (r = .70, p < .001). To disentangle the effects of hearing loss and age, a hierarchical logistic regression analysis was conducted with age and PTAB as explanatory factors and overall emotion identification accuracy as the outcome factor. Age was entered into the model first, followed by PTAB. Both the model with age alone and the model with age and PTAB were significant for the two age groups combined ( ps < .001); however, when looking at the coefficients individually, age was the only significant contributor to the model ( p < .001) and PTAB was not ( p = .289). Thus, the significance of this model is led by age rather than any connection between vocal emotion identification and hearing thresholds.

corrections for multiple comparisons had been applied, there were no significant correlations between hearing thresholds at any of the frequencies tested and vocal emotion identification accuracy (either the overall score or individual scores for each of the emotion conditions). Influence of Auditory Temporal Processing on Emotion Identification Accuracy Prior to conducting these analyses, an examination for outliers revealed one younger adult whose overall emotion identification accuracy was greater than 3 SD below the mean (M = 83.7, SD = 14.4). This participant was eliminated from subsequent analyses. The performance of the two age groups was compared for the three new tests of suprathreshold auditory processing (see Table 1). The younger adults obtained significantly smaller F0 DLs, t(53) = 3.5, p = .002, and gap detection thresholds, t(53) = 2.9, p = .005, compared with the older adults, and the age-related difference between the two age groups approached significance for intensity DLs, t(53) = 1.8, p = .074. In order to determine whether participants’ suprathreshold auditory processing abilities were related to how accurately they could identify vocal emotions, correlations were conducted between the accuracy of emotion identification collapsed across emotions and the F0 DL, intensity DL, and gap detection thresholds for the two listener age groups both together and separately. Correlations were significant when both age groups were combined for F0 DL (r = −.41, p = .002), but not for gap detection (r = −.21, p = .13) or intensity DL (r = −.24, p = .08). None of the correlations reached significance when the younger and older adult data were analyzed separately ( ps > .05).5

Discussion

Experiment 2 replicated the emotion identification results of Experiment 1. There was no main effect of experiment on emotion identification accuracy, suggesting that the two samples of participants performed similarly. In both experiments, younger and older listeners with good audiometric thresholds identified the vocal emotions portrayed in the TESS stimuli with a high degree of accuracy,

5 The correlational analyses were conducted within age groups rather than across age groups due to issues with multicollinearity between auditory processing and age; that is, F0 DL (r = .37, p = .005) and gap detection threshold (r = .28, p = .04) but not intensity DL (r = .23, p = .01) were highly correlated with age when the two age groups were combined. To disentangle the effects of age and the psychoacoustic abilities, a hierarchical logistic regression analysis was conducted with age and F0 DL, gap detection threshold, and intensity DL as explanatory factors and overall emotion identification accuracy as the outcome factor. When age was entered into the model first, followed by the three psychoacoustic measures, we found that only age had a significant influence on the model (p = .002). Indeed, the addition of the three psychoacoustic measures to the model only improved the predictive power of the model from R² = .302 to R² = .398. This small increase was attributable to the nearly significant influence of F0 DL (p = .058), with the other two psychoacoustic measures having no significant effect on the model.


Table 3. Reference data for Toronto Emotional Speech Set vocal emotion identification accuracy for younger and older listeners, collapsed across the two experiments.

| Emotion | Younger adults | Older adults |
| Anger | 96.5 (9.7) | 78.0 (17.9) |
| Sadness | 90.8 (17.4) | 76.3 (19.2) |
| Fear | 87.4 (19.8) | 69.6 (29.1) |
| Neutral | 83.5 (23.6) | 64.3 (33.8) |
| Happiness | 79.0 (24.5) | 60.0 (28.2) |
| Disgust | 76.2 (27.6) | 49.4 (30.7) |
| Pleasant surprise | 64.9 (25.5) | 58.4 (24.2) |

Note. Means (SDs).

but with younger listeners outperforming older listeners. Accuracy varied across emotions, with a similar pattern of results across experiments and across listener age groups. Although there were differences in identification for stimuli spoken by the younger compared with the older talker for two of the emotions, the two actors produced the rest of the emotions in a manner that yielded comparable emotion identification accuracy, with similar responses to the two talkers by both listener age groups. As in Experiment 1, participants’ auditory acuity levels did not predict their overall emotion identification accuracy or their accuracy in identifying specific emotions. In order to further examine the potential influence of suprathreshold auditory processing abilities on vocal emotion identification, three measures that have previously yielded evidence of age-related differences were administered in Experiment 2. There were significant age-related differences on the measures of F0 DL and gap detection, and the age-related difference approached significance for the intensity DL task, but the results from these tests were not significantly correlated with vocal emotion identification accuracy for either age group.

General Discussion

Younger and older listeners were able to identify the vocal emotions portrayed in the TESS stimuli at rates far exceeding chance. In keeping with previous studies, older adults were significantly less accurate than younger adults in identifying vocal emotions. The absence of significant correlations between overall vocal emotion identification accuracy and measures of auditory acuity and suprathreshold auditory processing provides no evidence that age-related differences in hearing abilities as measured in the current study explain the age-related differences observed in the accuracy of identifying vocal emotions. Below we summarize the current findings, discuss potential contributors to the age-related differences in emotion identification accuracy observed in the current study, examine how listener and talker age interact to influence vocal emotion identification, and explore the contributions of auditory abilities to the understanding of vocal emotions.

Accuracy of Vocal Emotion Identification for the TESS

The overall accuracy of vocal emotion identification by younger and older listeners tested with the TESS was 82.6% and 65.1%, respectively. In reviews of the literature on vocal emotion identification, overall accuracy rates are reported to reach up to 65% in studies using different response paradigms and varying numbers of vocal emotion alternatives (Scherer, 2003; Scherer, Johnstone, & Klasmeyer, 2003). Similar accuracy rates have been reported in studies that used the same seven emotion conditions as those portrayed in the TESS. For example, one study of younger and middle-aged German participants demonstrated accuracy scores of 73.2% and 61.8%, respectively, when participants were asked to identify the emotion being portrayed in German sentences (Paulmann et al., 2008).

Table 4. Confusion matrices for younger (A; n = 84) and older (B; n = 56) listeners combined across the two experiments, represented as absolute numbers of responses with percentages in parentheses.

A. Younger listeners

Stimulus              Anger           Sadness         Fear            Neutral         Happiness       Disgust         Pleasant surprise
Anger                 1,621 (96.49)   2 (0.12)        3 (0.18)        17 (1.01)       9 (0.54)        23 (1.37)       5 (0.30)
Sadness               5 (0.30)        1,525 (90.77)   8 (0.48)        88 (5.24)       7 (0.42)        44 (2.62)       3 (0.18)
Fear                  7 (0.43)        73 (4.35)       1,468 (87.38)   14 (0.83)       45 (2.68)       24 (1.43)       29 (2.92)
Neutral               24 (1.43)       182 (10.83)     6 (0.36)        1,402 (83.45)   17 (1.01)       44 (2.62)       5 (0.30)
Happiness             1 (0.06)        14 (0.83)       7 (0.42)        45 (2.68)       1,328 (79.05)   7 (0.42)        278 (16.55)
Disgust               47 (2.80)       18 (1.07)       11 (0.65)       96 (5.71)       119 (7.08)      1,280 (76.19)   109 (6.49)
Pleasant surprise     20 (1.19)       5 (0.30)        30 (1.79)       57 (3.39)       260 (15.48)     218 (12.98)     1,090 (64.88)

B. Older listeners

Stimulus              Anger           Sadness         Fear            Neutral         Happiness       Disgust         Pleasant surprise
Anger                 874 (78.04)     2 (0.18)        10 (0.89)       29 (2.59)       11 (0.98)       188 (16.79)     6 (0.54)
Sadness               1 (0.09)        854 (76.25)     51 (4.55)       117 (10.45)     4 (0.36)        93 (8.30)       0 (0.00)
Fear                  27 (2.41)       192 (17.14)     779 (69.55)     56 (5.00)       1 (0.09)        136 (12.14)     4 (0.36)
Neutral               1 (0.09)        139 (12.41)     40 (3.57)       720 (64.29)     69 (6.16)       18 (1.61)       58 (5.18)
Happiness             9 (0.80)        7 (0.63)        8 (0.71)        88 (7.86)       672 (60.00)     31 (2.77)       305 (27.23)
Disgust               45 (4.02)       24 (2.14)       36 (3.21)       248 (22.14)     91 (8.13)       553 (49.38)     123 (10.98)
Pleasant surprise     28 (2.50)       0 (0.00)        42 (3.75)       117 (10.45)     195 (17.41)     84 (7.50)       654 (58.39)

Note. Rows indicate the emotion portrayed in the stimulus and columns indicate the response given; each entry is the number of trials, with the percentage of that stimulus's trials in parentheses, so percentages within a row sum to 100%. Diagonal entries represent correct identifications.
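As a worked illustration of how the per-emotion accuracies in Table 3 follow from confusion-matrix counts like those in Table 4, the short Python sketch below row-normalizes a count matrix and reads the diagonal as percent-correct identification. The three-emotion matrix shown is hypothetical and is used only to demonstrate the arithmetic.

    import numpy as np

    emotions = ["anger", "sadness", "fear"]

    # Hypothetical counts: rows = emotion portrayed, columns = response given.
    counts = np.array([
        [190,   4,   6],   # anger stimuli
        [  2, 181,  17],   # sadness stimuli
        [  5,  20, 175],   # fear stimuli
    ])

    # Convert each row to percentages of that emotion's trials.
    percent = 100 * counts / counts.sum(axis=1, keepdims=True)

    # The diagonal gives percent-correct identification per emotion.
    for emotion, accuracy in zip(emotions, np.diag(percent)):
        print(f"{emotion}: {accuracy:.1f}% correct")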


Castro and Lima (2010) found an overall emotion identification accuracy rate of 75% in younger listeners tested with short Portuguese sentences portraying these same emotions. The emotion identification accuracy rates for younger listeners in these two prior studies approached the overall accuracy rate of 82.6% achieved by the younger listeners in the current study. In agreement with previous findings (e.g., Globerson et al., 2013; Scherer, Banse, Wallbott, & Goldbeck, 1991; see Juslin & Laukka, 2003, for a review), the highest identification rates for the TESS were found for stimuli spoken to portray anger and sadness.

Effect of Listener Age on the Accuracy of Vocal Emotion Identification

Overall, younger listeners were more accurate than older listeners in identifying vocal emotions, consistent with our hypothesis and with previous studies (e.g., Kiss & Ennis, 2001; Mitchell et al., 2011; Orbelo et al., 2005). In Experiment 1, older adults were less accurate than younger adults in identifying all emotions; in Experiment 2 they were less accurate in identifying all emotions except fear and pleasant surprise; and in the omnibus analysis (where data from the experiments were combined), the only emotion condition that did not differ between the two age groups was pleasant surprise. The absence of an age-related difference in identification accuracy for pleasant surprise agrees with previous findings (e.g., Paulmann et al., 2008; Ruffman, Henry, Livingstone, & Phillips, 2008; Ruffman, Sullivan, & Dittrich, 2009; Wong, Cronin-Golomb, & Neargarder, 2005). The preserved ability of the older group to identify pleasant surprise and fear is also compatible with our recent finding that word recognition in noise by both younger and older listeners is best when the TESS stimuli portray fear or pleasant surprise (Dupuis & Pichora-Fuller, 2014). If these two emotions are more effective than other emotions in capturing listeners' attention and orienting them to important threatening or beneficial stimuli in their environment, then these emotions may also confer a compensatory advantage that helps older adults maintain identification accuracy at levels similar to those of younger listeners. Note that we had hypothesized, on the basis of previous findings (e.g., Lima & Castro, 2011), that the older adults would be less accurate than younger adults in identifying emotions produced with a higher pitch or at faster rates (e.g., anger, fear) but that there would not be age-related differences in the accuracy of identifying emotions produced with a lower pitch or at slower rates (e.g., sadness, neutral); however, this hypothesis was not supported in either experiment. The discrepancies between the pattern of age-related differences found in the current study and in the prior study (Lima & Castro, 2011) may be related to the age of the participants; the older adults in the current study were considerably older (mean age = 70 years) than the two older groups (musicians: mean age = 48.4 years; controls: mean age = 47 years) tested by Lima and Castro (2011).

The more emotion-specific difficulties of middle-aged listeners may reflect the influence of auditory declines and the ability to use particular acoustical cues, whereas the more general poorer performance of the older listeners may reflect other nonauditory, age-related changes.

Potential Contributors to Age-Related Differences in Vocal Emotion Identification

There are a number of mechanisms that could explain the age-related differences in identifying vocal emotions that were found in the present study, including age-related changes in cognition and emotion regulation. The finding that older adults had lower rates of vocal emotion identification than their younger counterparts is consistent with research on age-related differences in emotion identification in other domains, including facial and bodily expressions (Ruffman et al., 2009) and music (Laukka & Juslin, 2007). Taken together, these results suggest a general reduction in the ability of older adults to process and understand emotional information in nonspeech and nonauditory stimuli. It is interesting to note that impaired processing of receptive and expressive emotional prosody has been shown in older adults with Alzheimer's disease (e.g., Roberts, Ingram, Lamar, & Green, 1996; Taler, Baum, Chertkow, & Saumier, 2008; but see Horley, Reid, & Burnham, 2010, for a discussion of inconsistencies in these findings), and the severity of impairment in understanding emotion has been shown to increase systematically with the severity of the disease (Testa, Beatty, Gleason, Orbelo, & Ross, 2001). Given that emotion identification deficits are seen quite early in the disease process of patients with Alzheimer's disease (Testa et al., 2001), it may be that difficulties identifying vocal emotions in samples of healthy older adults (e.g., those tested in the current experiments) are related to earlier stages of cognitive decline. Longitudinal studies would allow for further elucidation of this hypothesis. Related research with visual and auditory stimuli, such as pictures or nonverbal vocalizations, indicates that the ratings assigned by older adults to the emotional intensity or arousal conveyed by a stimulus are lower than those assigned by their younger counterparts (e.g., Fecteau, Armony, Joanette, & Belin, 2005; Thompson, Aidinejad, & Ponte, 2001), possibly due to controlled downregulation mechanisms (e.g., Gross et al., 1997). The reductions in older adults' abilities to recognize vocal emotions may be related to their lower ratings of emotional arousal; however, this potential relationship needs further exploration (see Footnote 6). There may also be interactions between age-related cognitive declines and emotion regulation. Some aspects of cognitive processing that decline with age (e.g., memory and attention) can be influenced by the emotional content of stimuli (e.g., Dolan, 2002; Hamann, 2001). Conversely, age-related declines in cognitive processing may also influence the recognition of affective prosody.

Footnote 6. Participants in Experiment 1 also rated the intensity with which the talker portrayed each sentence. Older adults had lower intensity ratings overall compared with the younger adults, F(1, 82) = 5.6, p = .021, consistent with previous findings. There was no interaction between listener age and vocal emotion, indicating a similar pattern of age differences in intensity ratings across all seven emotion conditions.


For example, changes in attentional control and working memory capacity (e.g., Gazzaley, Cooney, Rissman, & D'Esposito, 2005; Sambataro et al., 2010; for reviews, see Park & Schwarz, 2000; Schneider, Pichora-Fuller, & Daneman, 2010) might influence older listeners' abilities to simultaneously keep in mind potentially conflicting representations of the semantic and emotional information being conveyed. In everyday conversation, reductions in processing speed (e.g., Kennedy & Raz, 2009; Wingfield & Tun, 2001) may make it difficult for older listeners to keep up with a rapid flow of linguistic information in speech while simultaneously recognizing vocal emotions. The potential influence of cognitive factors on the recognition of vocal emotion is thus likely to be minimal in tasks imposing a very low cognitive load but greater in tasks involving dividing attention, memorizing words, or resolving inconsistencies between linguistic and vocal emotional information. However, the cognitive demands in the current study were low and seem unlikely to explain the age-related differences that were observed.

Interactions Between Listener Age and Talker Age

Using both a younger and an older talker to record the TESS stimuli allowed for the examination of potential own-age listener biases in these experiments. In the current study, there was no interaction between the age of the listener and the age of the talker. Note that research in the visual domain has suggested that during facial processing, viewers show biased processing for faces from their own age group (e.g., Anastasi & Rhodes, 2006; Bäckman, 1991). On the basis of the findings from research using visual stimuli, it might have been expected that younger adults would have found it easier to identify emotions in speech spoken by the younger talker compared with speech spoken by the older talker, with the pattern being reversed for the older listeners. However, the data from the current study did not support this hypothesis. The absence of a group-level own-age talker preference in the current study may be due to the fact that participants were not encouraged to process the identity or age of the talker in order to correctly identify the emotion being portrayed. Indeed, in previous studies where this bias was shown, participants processed stimuli at a deeper semantic level than in the current study. For example, Anastasi and Rhodes (2006) had participants rate photos of faces from different age groups for attractiveness and sort the photos into specific age categories (e.g., 18–25 years, 55–75 years), whereas the stimuli used by Bäckman (1991) were photos of famous Swedish people who would have been familiar to the participants. In the current work, it is unlikely that participants would have been familiar with the voice of either talker. Alternative explanations for the absence of an own-age response bias include the possibility that the bias exists only for visually represented information or that it may be easier to determine general age group (i.e., younger, middle-aged, older) from facial information than from voice information (for a review of the visual literature, see Rhodes, 2009).


Indeed, studies have suggested that older adults (Huntley, Hollien, & Shipp, 1987) have difficulty perceiving the age of a talker from voice cues, whereas younger adults are conservative in their age estimations and tend to underestimate the highest ages and overestimate the youngest ages of talkers (E. B. Ryan & Capadano, 1978).

Contributions of Auditory Abilities to Age-Related Differences in Emotion Identification

The constellation of findings in the two experiments does not seem to be attributable to age-related hearing problems in terms of either audiometric thresholds or the three selected measures of suprathreshold auditory processing. With the possible exception of two emotions that may be more effective in capturing attention (fear and pleasant surprise), significant age-related differences in emotion identification accuracy were observed across all emotions. However, the three-way interaction of experiment, listener age, and vocal emotion was driven by the similar and poor accuracy of both groups in identifying pleasant surprise only in Experiment 2, and it seems unlikely that these differences between the two experiments are meaningful. It is important to note that because age-related differences were not isolated to specific emotions, it seems unlikely that the poorer performance of older listeners can be explained by their difficulty hearing particular acoustical cues relevant to distinguishing particular emotions. Moreover, the ordering of emotions from easiest to hardest to identify was nearly identical for both age groups. The only difference between age groups in the ordering of difficulty across emotions was that stimuli spoken to portray disgust were identified more accurately than stimuli spoken to portray pleasant surprise by the younger listeners, whereas the opposite was true for the older listeners; these differences in ordering were not significant and were likely due to variability among the participants tested in each experiment. Furthermore, correlational analyses suggest that neither age-related declines in audiometric thresholds nor declines in suprathreshold auditory processing abilities account for older adults' lower overall identification accuracy rates. Recall that the RMS intensity of the stimuli was equalized prior to their presentation and that all participants had clinically normal or near-normal hearing thresholds in the frequency range most important for speech perception. Therefore, any age-related differences in the accuracy of emotion identification were not expected to be the result of the stimuli simply not being heard or being only partially heard by the older group, even for emotions such as sadness that are typically characterized by lower intensity. Although the older group had audiometric thresholds up to and including 3 kHz that would be considered clinically normal (World Health Organization, 2014) and their audiometric profiles were normal for their age (ISO, 2000), statistically significant age-related differences were found for audiometric pure-tone thresholds in both experiments. However, audiometric thresholds were not associated with the accuracy of emotion identification for either age group.


The lack of a significant correlation between pure-tone thresholds and the accuracy of emotion identification is consistent with previous findings (Mitchell, 2007; Orbelo et al., 2005). Nevertheless, it remains possible that emotion identification abilities may be compromised by more significant degrees of audiometric hearing loss, especially if the vocal emotion of the talker renders speech less audible (see Footnote 7). Even though equalizing the intensity level of the stimuli should have compensated for potential age-related differences due to audiometric threshold elevations, vocal emotion identification could still have been undermined by age-related differences in the auditory processing of suprathreshold speech cues. For example, age-related differences in the discrimination of pitch, intensity, and duration cues could compromise the use of these specific acoustical dimensions relevant to the identification of vocal emotions. In Experiment 2, selected tests of suprathreshold auditory processing revealed significant age-related differences on measures of vowel F0 DLs and gap detection in speech and nearly significant differences on a measure of intensity DLs. The results of the older group on the F0 DL task were somewhat higher than those found previously by Besser et al. (2015; median for younger adults = 0.9, median for older adults = 2.7) and Vongpaisal and Pichora-Fuller (2007; mean for younger adults = 0.6, mean for older adults = 1.8), whereas the results of the younger group were more similar to previous results. On the intensity DL task, both the younger and the older participants had higher thresholds than previously found by MacDonald et al. (2007; mean for younger adults = 1.3, mean for older adults with good performance in noise = 1.3); however, the older participants in Experiment 2 had thresholds similar to those found in MacDonald et al.'s (2007) subsample of older adults with poorer performance (mean for older adults with poor performance in noise = 3.3). For the gap detection task, participants had scores similar to those found in Pichora-Fuller et al.'s (2006) data for the condition in which a gap was inserted into a short-duration recording of [su]. It is interesting to note that the older adults in the current study had somewhat bimodal distributions on the measures of intensity DL and gap detection, suggesting that, although all participants had good audiometric thresholds, some of the older participants demonstrated problems on the auditory processing tasks. Last, despite the finding of age-related differences on these measures of suprathreshold auditory processing, they were not associated with vocal emotion identification in either age group. Contrary to the findings of the present study that neither threshold nor suprathreshold measures of hearing were associated with the overall accuracy of vocal emotion identification, two recent studies reported an association between emotion identification accuracy and psychoacoustic measures.

Footnote 7. It could be argued that the emotion identification results presented in the current article could have been different had we not equated for intensity across the stimuli. However, a previous analysis of the TESS stimuli indicated that mean intensity of stimuli accounts for less than 10% of the variance in categorizing the stimuli by emotion and that the majority of variance in categorization is related to pitch attributes of the stimuli (Dupuis, 2011). Therefore, even had intensity been allowed to vary across stimuli, this would likely not have had a large effect on participants' identification of the TESS stimuli.
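For readers unfamiliar with what equating stimuli for intensity involves, the following minimal Python sketch shows one common approach: scaling each waveform to a common root-mean-square (RMS) level. The waveforms, target level, and function names here are hypothetical; this is not the authors' actual processing pipeline.

    import numpy as np

    def rms(x):
        """Root-mean-square amplitude of a waveform."""
        return np.sqrt(np.mean(np.square(x)))

    def equalize_rms(signals, target_rms=0.05):
        """Scale each waveform so that all have the same RMS level."""
        return [x * (target_rms / rms(x)) for x in signals]

    # Two synthetic "utterances" differing in level (e.g., a quiet sad token
    # and a loud angry token), each one second long at a 16-kHz sampling rate.
    rng = np.random.default_rng(0)
    quiet = 0.01 * rng.standard_normal(16000)
    loud = 0.20 * rng.standard_normal(16000)

    quiet_eq, loud_eq = equalize_rms([quiet, loud])
    print(rms(quiet_eq), rms(loud_eq))   # both approximately 0.05 after equalization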

The apparent discrepancy between studies might be explained by methodological differences related to the selection of participants, the particular measures of auditory processing abilities, and the properties of the materials used to test emotion identification. In one study, only younger adults with normal hearing were tested (Globerson et al., 2013). In the other study (Mitchell & Kingston, 2014), both younger and older adults were tested; however, because the audiometric criteria (< 25 dB HL at 0.5, 1.0, and 2.0 kHz) used in that study were less stringent than the criteria we used, it is possible that our older listeners had milder age-related auditory declines or that age and hearing loss were somewhat confounded in their study. The studies also differed in the selection of measures used to test suprathreshold auditory processing abilities. The three measures used in the current study were chosen because age-related differences on these measures have been documented previously, whereas there is no prior literature showing age-related differences on the specific measures used in the other studies. In the current study, a two-alternative, forced-choice adaptive procedure was used to establish gap detection thresholds and DLs for F0 and for the intensity of a tone in noise. Globerson et al. (2013) used similar procedures to establish DLs for steady and gliding tones, but only tasks requiring pitch direction recognition were significant predictors of prosody recognition scores. Mitchell and Kingston (2014) reported percentage correct accuracy but did not establish thresholds for their measures. Furthermore, in terms of the stimuli used to test emotion identification, the studies differed in the number of emotions tested, with the current study testing a larger number of emotions. One study tested only two emotions (happiness and sadness; Mitchell & Kingston, 2014) and the other tested four emotions (happiness, sadness, anger, and fear; Globerson et al., 2013), whereas the TESS includes a neutral voice and items portraying the six basic emotions that have been shown to exist in faces across cultures (Ekman, 1999). In addition, the TESS stimuli are controlled in terms of semantic emotion and syntactic structure, and we equalized the average intensity at which the stimuli were presented to offset the possible effects of reduced audibility on emotion identification. Another difference in the test materials concerns the characteristics of the talkers who produced the stimuli. The two studies in which associations between auditory processing abilities and vocal emotion identification were found used both male and female talkers (one of each gender in Mitchell & Kingston, 2014, and two of each gender in Globerson et al., 2013), whereas both TESS talkers were women. It is possible that gender and other intertalker differences in vocal emotion production influenced the results. In our study, intertalker differences were observed for some of the emotions, suggesting that idiosyncratic differences in how our younger and older female talkers produced speech to portray some emotions affected the listeners' ability to identify the emotions accurately, but both listener age groups were similarly affected by these talker variations in the production of some vocal emotions.
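The adaptive two-alternative forced-choice procedure mentioned above can be illustrated with a minimal Python sketch of a 2-down/1-up staircase, which targets approximately 70.7% correct. The starting value, step size, stopping rule, and simulated listener below are all hypothetical and are not the parameters used in the experiments.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulated_listener(delta, true_threshold=3.0, slope=1.5):
        """Return True for a correct 2AFC response given a cue difference `delta`
        (hypothetical logistic psychometric function; chance performance = .5)."""
        p_correct = 0.5 + 0.5 / (1.0 + np.exp(-(delta - true_threshold) * slope))
        return rng.random() < p_correct

    def two_down_one_up(start=8.0, step=1.0, n_reversals=8):
        """Track the cue difference with a 2-down/1-up rule and estimate the
        threshold as the mean of the final reversal points."""
        delta, n_correct, direction, reversals = start, 0, None, []
        while len(reversals) < n_reversals:
            if simulated_listener(delta):
                n_correct += 1
                if n_correct == 2:                 # two correct in a row: make harder
                    n_correct = 0
                    if direction == "up":
                        reversals.append(delta)
                    direction = "down"
                    delta = max(delta - step, 0.1)
            else:                                   # one error: make easier
                n_correct = 0
                if direction == "down":
                    reversals.append(delta)
                direction = "up"
                delta += step
        return float(np.mean(reversals[-6:]))

    print(round(two_down_one_up(), 2))              # threshold estimate from the staircase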


Taken together, age and hearing loss were less likely to have been confounded in the current study because we used stricter screening criteria and equalized the intensity level of the stimuli used to test emotion identification. Our measures of suprathreshold auditory processing are also easier to interpret because a prior literature exists for these measures and because they may be more representative of the auditory processing abilities needed when listening to speech. Finally, the TESS may offer a more demanding test of vocal emotion identification because more emotions are tested, semantic emotion and syntactic properties of the material are controlled, and intertalker differences due to gender are more restricted.

Conclusions and Future Directions

Overall, these results extend the literature on age-related differences in vocal emotion identification through the use of a new set of carefully controlled stimuli recorded by a younger and an older talker. We confirmed that older adults were less accurate than younger adults in identifying vocal emotions, and we showed that the pattern of responses across emotions was generally similar for younger and older listeners. It is important to note that we found no association between emotion identification accuracy and results on auditory measures despite our strict criteria for participant selection, our selection of previously published measures of suprathreshold auditory abilities, and our use of the TESS to test six emotions and a neutral condition while controlling for semantic emotion, syntax, the audibility of the materials, and intertalker variations in speech production. In light of the lack of convincing evidence that auditory aging underlies the reduced ability of older adults to identify vocal emotions, future studies must investigate other potential explanations, such as age-related declines in cognition and/or emotion regulation. The explanation for the poorer emotion identification of older adults remains unknown. Nevertheless, the finding that older adults, even those with relatively good audiograms, have difficulty identifying vocal emotions is important for audiologists, speech-language pathologists, and other health professionals to consider because of its likely effects on everyday communication.

Acknowledgments

This research was made possible by support from the Canadian Institutes of Health Research Grants MOP-15359 and STP-53875 and the Natural Sciences and Engineering Research Council of Canada Grant RGPIN 138472-11 to M. Kathleen Pichora-Fuller and a Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada to Kate Dupuis. Preliminary results from this study were presented at Acoustics Week in Canada, the annual conference of the Canadian Acoustical Association, held in Quebec City, Quebec, in 2011; the Cognitive Aging Conference held in Atlanta, Georgia, in 2010; and the Aging and Speech Communication Conference held in Bloomington, Indiana, in 2009. Data from Experiment 1 for the younger adults were published in the proceedings of the Canadian Acoustical Association (2011).


We thank Pascal van Lieshout for his advice on the study and Huiwen Goy for Praat programming support during the creation and analysis of the stimuli used in this study.

References

Abel, S. M., Krever, E. M., & Alberti, P. W. (1990). Auditory detection, discrimination and speech processing in aging, noise sensitive and hearing-impaired listeners. Scandinavian Audiology, 19, 43–54. doi:10.3109/01050399009070751

Anastasi, J. S., & Rhodes, M. G. (2006). Evidence for an own-age bias in face recognition. North American Journal of Psychology, 8, 237–252.

Bäckman, L. (1991). Recognition memory across the adult life span: The role of prior knowledge. Memory & Cognition, 19, 63–71. doi:10.3758/BF03198496

Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614–636. doi:10.1037/0022-3514.70.3.614

Besser, J., Festen, J. M., Goverts, S. T., Kramer, S. E., & Pichora-Fuller, M. K. (2015). Speech-in-speech listening on the LiSN-S test by older adults with good audiograms depends on cognition and hearing acuity at high frequencies. Ear and Hearing, 36, 24–41. doi:10.1097/AUD.0000000000000096

Castro, S. L., & Lima, C. F. (2010). Recognizing emotions in spoken language: A validated set of Portuguese sentences and pseudosentences for research on emotional prosody. Behavior Research Methods, 42, 74–81. doi:10.3758/BRM.42.1.74

Dolan, R. J. (2002). Emotion, cognition, and behavior. Science, 298, 1191–1194. doi:10.1126/science.1076358

Dupuis, K. (2011). Emotion in speech: Recognition by younger and older adults and effects on intelligibility (Unpublished doctoral dissertation). University of Toronto, Toronto, Ontario, Canada.

Dupuis, K., & Pichora-Fuller, K. (2008). Effects of emotional content and emotional valence on speech intelligibility in younger and older adults. Canadian Acoustics, 26, 114–115.

Dupuis, K., & Pichora-Fuller, M. K. (2010a). Toronto Emotional Speech Set. Retrieved from https://tspace.library.utoronto.ca/handle/1807/24487

Dupuis, K., & Pichora-Fuller, M. K. (2010b). Use of affective prosody by younger and older adults. Psychology and Aging, 25, 16–29. doi:10.1037/a0018777

Dupuis, K., & Pichora-Fuller, M. K. (2011). Recognition of emotional speech for younger and older talkers: Behavioural findings from the Toronto Emotional Speech Set. Canadian Acoustics, 39, 182–183.

Dupuis, K., & Pichora-Fuller, M. K. (2014). Intelligibility of emotional speech in younger and older adults. Ear and Hearing, 35, 695–707. doi:10.1097/AUD.0000000000000082

Ekman, P. (1999). Facial expressions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 301–320). Sussex, United Kingdom: Wiley.

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

Fecteau, S., Armony, J. L., Joanette, Y., & Belin, P. (2005). Judgment of emotional nonlinguistic vocalizations: Age-related differences. Applied Neuropsychology, 12, 40–48. doi:10.1207/s15324826an1201_7

Fitzgibbons, P. J., & Gordon-Salant, S. (1994). Age effects on measures of auditory duration discrimination. Journal of Speech, Language, and Hearing Research, 37, 662–670. doi:10.1044/jshr.3703.662


Fitzgibbons, P. J., & Gordon-Salant, S. (2010). Behavioral studies with aging humans: Hearing sensitivity and psychoacoustics. In S. Gordon-Salant, R. D. Frisina, A. N. Popper, & R. R. Fay (Eds.), The aging auditory system (pp. 111–134). New York, NY: Springer. doi:10.1007/978-1-4419-0993-0_5

Gazzaley, A., Cooney, J. W., Rissman, J., & D'Esposito, M. (2005). Top-down suppression deficit underlies working memory impairment in normal aging. Nature Neuroscience, 8, 1298–1300. doi:10.1038/nn1543

Gelfand, S. A. (2009). Essentials of audiology (3rd ed.). New York, NY: Thieme.

Gingras, B., Marin, M. M., & Fitch, W. T. (2013). Beyond intensity: Spectral features effectively predict music-induced subjective arousal. Quarterly Journal of Experimental Psychology, 67, 1428–1446. doi:10.1080/17470218.2013.863954

Globerson, E., Amir, N., Golan, O., Kishon-Rabin, L., & Lavidor, M. (2013). Psychoacoustic abilities as predictors of vocal emotion recognition. Attention, Perception, & Psychophysics, 75, 1799–1810. doi:10.3758/s13414-013-0518-x

Gordon-Salant, S., Yeni-Komshian, G. H., Fitzgibbons, P. J., & Barrett, J. (2006). Age-related differences in identification and discrimination of temporal cues in speech segments. The Journal of the Acoustical Society of America, 119, 2455–2466. doi:10.1121/1.2171527

Gross, J. J., Carstensen, L. L., Pasupathi, M., Tsai, J., Götestam Skorpen, C., & Hsu, A. Y. C. (1997). Emotion and aging: Experience, expression, and control. Psychology and Aging, 12, 590–599. doi:10.1037/0882-7974.12.4.590

Hamann, S. (2001). Cognitive and neural mechanisms of emotional memory. Trends in Cognitive Sciences, 5, 394–400.

He, N.-J., Dubno, J. R., & Mills, J. H. (1998). Frequency and intensity discrimination measured in a maximum-likelihood procedure from young and aged normal-hearing subjects. The Journal of the Acoustical Society of America, 103, 553–565. doi:10.1121/1.421127

Horley, K., Reid, A., & Burnham, D. (2010). Emotional prosody perception and production in dementia of the Alzheimer's type. Journal of Speech, Language, and Hearing Research, 53, 1132–1146. doi:10.1044/1092-4388(2010/09-0030)

Huntley, R., Hollien, H., & Shipp, T. (1987). Influences of listener characteristics on perceived age estimations. Journal of Voice, 1, 49–52. doi:10.1016/S0892-1997(87)80024-3

International Organization for Standardization. (2000). ISO 7029–2000: Acoustics—Statistical distribution of hearing thresholds as a function of age (2nd ed.). Geneva, Switzerland: Author.

Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129, 770–814. doi:10.1037/0033-2909.129.5.770

Kennedy, K. M., & Raz, N. (2009). Aging white matter and cognition: Differential effects of regional variations in diffusion properties on memory, executive functions, and speed. Neuropsychologia, 47, 916–927. doi:10.1016/j.neuropsychologia.2009.01.001

Kiss, I., & Ennis, T. (2001). Age-related decline in perception of prosodic affect. Applied Neuropsychology, 8, 251–254. doi:10.1207/09084280152829110

Lambrecht, L., Kreifelts, B., & Wildgruber, D. (2012). Age-related decrease in recognition of emotional facial and prosodic expressions. Emotion, 12, 529–539. doi:10.1037/a0026827

Laukka, P., & Juslin, P. N. (2007). Similar patterns of age-related differences in emotion recognition from speech and music. Motivation and Emotion, 31, 182–191. doi:10.1007/s11031-007-9063-z

Lima, C. F., & Castro, S. L. (2011). Speaking to the trained ear: Musical expertise enhances the recognition of emotions in speech prosody. Emotion, 11, 1021–1031. doi:10.1037/a0024521

MacDonald, E., Pichora-Fuller, M. K., & Schneider, B. A. (2007). Intensity discrimination in noise: Effect of aging. Proceedings of the 23rd Annual Meeting of the International Society for Psychophysics (pp. 135–140). Tokyo, Japan: International Society for Psychophysics.

Mill, A., Allik, J., Realo, A., & Valk, R. (2009). Age-related differences in emotion recognition ability: A cross-sectional study. Emotion, 9, 619–630. doi:10.1037/a0016562

Mitchell, R. L., & Kingston, R. A. (2014). Age-related decline in emotional prosody discrimination: Acoustic correlates. Experimental Psychology, 61, 215–223. doi:10.1027/1618-3169/a000241

Mitchell, R. L. C. (2007). Age-related declines in the ability to decode emotional prosody: Primary or secondary phenomenon? Cognition and Emotion, 21, 1435–1454. doi:10.1080/02699930601133994

Mitchell, R. L. C., Kingston, R. A., & Barbosa Bouças, S. L. (2011). The specificity of age-related decline in prosodic emotion interpretation. Psychology and Aging, 26, 406–414. doi:10.1037/a0021861

Orbelo, D. M., Grim, M. A., Talbott, R. E., & Ross, E. D. (2005). Impaired comprehension of affective prosody in elderly subjects is not predicted by age-related hearing loss or age-related cognitive decline. Journal of Geriatric Psychiatry and Neurology, 18, 25–32. doi:10.1177/0891988704272214

Orbelo, D. M., Testa, J. A., & Ross, E. D. (2003). Age-related impairments in comprehending affective prosody with comparison to brain-damaged subjects. Journal of Geriatric Psychiatry and Neurology, 16, 44–52. doi:10.1177/0891988702250565

Park, D. C., & Schwarz, N. (2000). Cognitive aging: A primer. Philadelphia, PA: Psychology Press.

Paulmann, S., Pell, M. D., & Kotz, S. A. (2008). How aging affects the recognition of emotional speech. Brain and Language, 104, 262–269. doi:10.1016/j.bandl.2007.03.002

Pichora-Fuller, M. K., Schneider, B., Benson, N., Hamstra, S., & Storzer, E. (2006). Effect of age on gap detection in speech and non-speech stimuli varying in marker duration and spectral symmetry. The Journal of the Acoustical Society of America, 119, 1143–1155. doi:10.1121/1.2149837

Pittam, J., & Scherer, K. R. (1993). Vocal expression and communication of emotion. In M. Lewis & J. M. Haviland (Eds.), Handbook of emotions (pp. 185–197). New York, NY: Guilford.

Raven, J. C. (1982). Revised manual for Raven's Progressive Matrices and Vocabulary Scale. Windsor, United Kingdom: NFER Nelson.

Rhodes, M. G. (2009). Age estimation of faces: A review. Applied Cognitive Psychology, 23, 1–12. doi:10.1002/acp.1442

Roberts, V. J., Ingram, S. M., Lamar, M., & Green, R. C. (1996). Prosody impairment and associated affective and behavioral disturbances in Alzheimer's disease. Neurology, 47, 1482–1488. doi:10.1212/WNL.47.6.1482

Ruffman, T., Henry, J. D., Livingstone, V., & Phillips, L. H. (2008). A meta-analytic review of emotion recognition and aging: Implications for neuropsychological models of aging. Neuroscience & Biobehavioral Reviews, 32, 863–881. doi:10.1016/j.neubiorev.2008.01.001

Ruffman, T., Sullivan, S., & Dittrich, W. (2009). Older adults' recognition of bodily and auditory expressions of emotion. Psychology and Aging, 24, 614–622. doi:10.1037/a0016356


Russo, F. A., Ives, T., Goy, H., Pichora-Fuller, M. K., & Patterson, R. (2012). Age-related difference in melodic pitch perception is probably mediated by temporal processing: Empirical and computational evidence. Ear and Hearing, 32, 177–186.

Ryan, E. B., & Capadano, H. L. (1978). Age perceptions and evaluative reactions toward adult speakers. Journal of Gerontology, 33, 98–102. doi:10.1093/geronj/33.1.98

Ryan, M., Murray, J., & Ruffman, T. (2010). Aging and the perception of emotion: Processing vocal expressions alone and with faces. Experimental Aging Research, 36, 1–22. doi:10.1080/03610730903418372

Sambataro, F., Murty, V. P., Callicott, J. H., Tan, H.-Y., Das, S., Weinberger, D. R., & Mattay, V. S. (2010). Age-related alterations in default mode network: Impact on working memory performance. Neurobiology of Aging, 31, 839–852. doi:10.1016/j.neurobiolaging.2008.05.022

Sauter, D. A., Eisner, F., Calder, A. J., & Scott, S. K. (2010). Perceptual cues in nonverbal vocal expressions of emotion. The Quarterly Journal of Experimental Psychology, 63, 2251–2272. doi:10.1080/17470211003721642

Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40, 227–256. doi:10.1016/S0167-6393(02)00084-5

Scherer, K. R., Banse, R., Wallbott, H. G., & Goldbeck, T. (1991). Vocal cues in emotion encoding and decoding. Motivation and Emotion, 15, 123–148. doi:10.1007/BF00995674

Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 433–456). Oxford, United Kingdom: Oxford University Press.

Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10, 24–30. doi:10.1016/j.tics.2005.11.009

Schneider, B. A., Pichora-Fuller, M. K., & Daneman, M. (2010). The effects of senescent changes in audition and cognition on spoken language comprehension. In S. Gordon-Salant, R. D. Frisina, A. N. Popper, & R. R. Fay (Eds.), The aging auditory system (pp. 167–210). New York, NY: Springer.

Schneider, B. A., Pichora-Fuller, M. K., Kowalchuk, D., & Lamb, M. (1994). Gap detection and the precedence effect in young and old adults. The Journal of the Acoustical Society of America, 95, 980–991. doi:10.1121/1.408403


Schneider, B. A., Speranza, F., & Pichora-Fuller, M. K. (1998). Age-related changes in temporal resolution: Envelope and intensity effects. Canadian Journal of Experimental Psychology, 52, 184–190. doi:10.1037/h0087291

Scott, S. K., Sauter, D., & McGettigan, C. (2010). Brain mechanisms for processing perceived emotional vocalizations in humans. In S. M. Brudzynski (Ed.), Handbook of mammalian vocalization (pp. 187–197). London, United Kingdom: Academic Press.

Sobin, C., & Alpert, M. (1999). Emotion in speech: The acoustic attributes of fear, anger, sadness, and joy. Journal of Psycholinguistic Research, 28, 347–365. doi:10.1023/A:1023237014909

Taler, V., Baum, S. R., Chertkow, H., & Saumier, S. (2008). Comprehension of grammatical and emotional prosody is impaired in Alzheimer's disease. Neuropsychology, 22, 188–195. doi:10.1037/0894-4105.22.2.188

Testa, J. A., Beatty, W. W., Gleason, A. C., Orbelo, D. M., & Ross, E. E. (2001). Impaired affective prosody in AD: Relationship to aphasic deficits and emotional behaviors. Neurology, 57, 1474–1481. doi:10.1212/WNL.57.8.1474

Thompson, L. A., Aidinejad, M. R., & Ponte, J. (2001). Aging and the effects of facial and prosodic cues on emotional intensity ratings and memory reconstructions. Journal of Nonverbal Behavior, 25, 101–125. doi:10.1023/A:1010749711863

Tillman, T. W., & Carhart, R. (1966). An expanded test for speech discrimination utilizing CNC monosyllabic words: Northwestern University Auditory Test No. 6 (Technical Report No. SAM-TR-66-135). San Antonio, TX: USAF School of Aerospace Medicine, Brooks Air Force Base.

Vongpaisal, T., & Pichora-Fuller, M. K. (2007). Effect of age on F0 difference limen and concurrent vowel identification. Journal of Speech, Language, and Hearing Research, 50, 1139–1156. doi:10.1044/1092-4388(2007/079)

Wingfield, A., & Tun, P. A. (2001). Spoken language comprehension in older adults: Interactions between sensory and cognitive change in normal aging. Seminars in Hearing, 22, 287–301.

Wong, B., Cronin-Golomb, A., & Neargarder, S. (2005). Patterns of visual scanning as predictors of emotion identification in normal aging. Neuropsychology, 19, 739–749. doi:10.1037/0894-4105.19.6.739

World Health Organization. (2014). Prevention of blindness and deafness: Grades of hearing impairment. Retrieved from http://www.who.int/pbd/deafness/hearing_impairment_grades/en/#

