Noise Reduction Improves Memory for Target Language Speech in Competing Native but Not Foreign Language Speech
Elaine Hoi Ning Ng,1 Mary Rudner,1 Thomas Lunner,1,2,3 and Jerker Rönnberg1
Objectives: A hearing aid noise reduction (NR) algorithm reduces the adverse effect of competing speech on memory for target speech for individuals with hearing impairment with high working memory capacity. In the present study, we investigated whether the positive effect of NR could be extended to individuals with low working memory capacity, as well as how NR influences recall performance for target native speech when the masker language is non-native.
2005; Van Engen & Bradlow 2007; Brouwer et al. 2012). To examine whether noise reduction (NR) is effective in reducing the semantic interference arising from the background speech, the current study investigated the beneficial effect for hearing aid users of NR on memory for target native speech heard in competing native and foreign speech.
Design: A sentence-final word identification and recall (SWIR) test was administered to 26 experienced hearing aid users. In this test, target spoken native language (Swedish) sentence lists were presented in competing native (Swedish) or foreign (Cantonese) speech with or without a binary masking NR algorithm. After each sentence list, free recall of sentence-final words was prompted. Working memory capacity was measured using a reading span (RS) test.
Competing Speech as Background Noise and Its Impact on Recall Performance The presence of any type of background noise, such as environmental noise and human speech sounds in real-life situations, may impede speech understanding if the target speech signals are masked and thus become less audible. Understanding speech in noise involves bottom-up driven auditory stream segregation of the target speech signal from background noise. The ease of segregation is fundamentally dependent on the differences in the acoustic characteristics of the target and masker. When background noise contains speech information, the semantic information in competing speech may also interfere with speech perception. In other words, segregation between speech and masker can also be driven by semantic and linguistic differences (see Mattys et al. 2009 for review). Listening in noise occupies cognitive capacity, and the remaining cognitive resources can be deployed in different ways that may differ depending on age, hearing impairment, and individual differences in a range of cognitive abilities. When the background contains competing speech, semantic information in the masker may engage cognitive resources, leaving fewer resources for any higher-order processes such as memory (Rabbitt 1990; Murphy et al. 2000; Tun et al. 2002; McCoy et al. 2005; Wingfield et al. 2005; Heinrich et al. 2008; Sarampalis et al. 2009; Heinrich & Schneider 2011; Zekveld et al. 2013). Thus, background speech would have an adverse effect on cognitive processes. The masking effect of competing speech on speech perception depends on several factors. One factor is the number of talkers in the competing speech. Speech reception in single-talker competing noise is easier than in steady state noise because of the presence of gaps or fluctuations in the noise (e.g., Festen & Plomp 1990). As the number of talkers in the competing noise increases, the masking effect increases and becomes stronger than that of steady state noise.
This is because multitalker babble contains semantic information, which competes more with the target speech signal than single-talker competing noise. Moreover, there are fewer temporal gaps and fluctuations in multitalker babble, which further impairs speech perception. The masking effect of multitalker babble is strongest when there are three or four talkers (e.g., Hall et al. 2002; Freyman et al. 2004). This effect becomes less prominent as the number of talkers further
Results: Recall performance was associated with RS. However, the benefit obtained from NR was not associated with RS. Recall performance was more disrupted by native than foreign speech babble and NR improved recall performance in native but not foreign competing speech.
Conclusions: Noise reduction improved memory for speech heard in competing speech for hearing aid users. Memory for native speech was more disrupted by native babble than foreign babble, but the disruptive effect of native speech babble was reduced to that of foreign babble when there was NR.
Key words: Competing speech, Free recall, Masker language, Noise reduction, Working memory.
(Ear & Hearing 2015;36;82–91)
1Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden; 2Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; and 3Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden.
INTRODUCTION Speech understanding is more challenging in noise than in quiet (Rönnberg et al. 2010). This applies particularly to persons with hearing impairment. Noise also has an adverse effect on memory for speech, even when speech recognition is successfully achieved (Rabbitt 1990; Tun et al. 2002, 2009; McCoy et al. 2005; Wingfield et al. 2005; Sarampalis et al. 2009; Ng et al. 2013). Hearing aids are designed to improve audibility and often include advanced signal processing algorithms to reduce background noise. Recent studies (e.g., Sarampalis et al. 2009; Ng et al. 2013) have therefore examined whether such algorithms could reduce the adverse effects of noise on memory and showed that under certain circumstances, including native speech maskers, this may be the case irrespective of hearing status. Native language speech perception is disrupted less by foreign than native competing speech (e.g., Rhebergen et al.
0196/0202/2015/361-0082/0 • Ear & Hearing • Copyright © 2015 Wolters Kluwer Health, Inc. All rights reserved • Printed in the U.S.A. 82
NG ET AL. / EAR & HEARING, VOL. 36, NO. 1, 82–91
increases and saturates when there are twelve or more talkers (Souza & Turner 1994) because the talkers in the babble are masking each other, which then reduces the semantic interference on perception of target speech. Another factor that influences degree of masking is the language of the competing speech. Native language as a speech masker is the most detrimental because perception of the target speech can be disrupted at different linguistic levels, including phonemic, phonological, syntactic, prosodic, and semantic levels (Van Engen & Bradlow 2007). This has been shown to apply in relation to any non-native language (e.g., Rhebergen et al. 2005; Van Engen & Bradlow 2007; Brouwer et al. 2012). Tun et al. (2002) demonstrated that the ability to recall target native language speech was disrupted when background competing speech was present for persons with normal hearing. In particular, older adults recalled significantly less in a competing speech background in their native tongue than in a competing unfamiliar language, while younger adults did not show any significant difference in recall performance between the two types of competing speech. For competing speech in a non-native language, both familiarity and linguistic similarity affect the effectiveness of masking. The linguistic information of a familiar non-native language is intelligible and hence more disrupting than that of an unfamiliar language as a masker (see e.g., Ben-David et al. 2012). The linguistic similarity between target and competing speech also influences speech recognition performance. For instance, compared to English, Dutch is more linguistically similar than Chinese and Korean (Bradlow et al. 2010), and the greater the linguistic similarity, the stronger the masking effect (Brouwer et al. 2012). Calandruccio et al. (2010, 2013) suggested that the spectral and/or temporal properties of speech masker may explain the linguistic similarity effect. 
Taken together, competing native speech has a stronger masking effect than competing foreign speech on native speech recognition accuracy and a greater detrimental effect on subsequent recall. The masking effect of foreign speech on native speech increases with familiarity and linguistic similarity to the native speech, and it is likely that similar effects will be found for recall.
Signal Processing and the Role of Working Memory Capacity When listening conditions are adverse, such as in the presence of background noise and when the speech signal is distorted, cognitive abilities play an important role in speech recognition (Rönnberg 2003; Rönnberg et al. 2008, 2013). This relationship between cognition and speech recognition is also found in hearing aid users. In particular, individuals with good cognitive abilities seem to benefit more from advanced signal processing in hearing aids than individuals with poorer abilities (Gatehouse et al. 2003; Foo et al. 2007; Moore 2008; Rudner et al. 2011; Ng et al. 2013). One possible explanation is that extra cognitive resources are needed when listening to a processed signal (Lunner et al. 2009). Since hearing aid signal processing may introduce artifacts and distort the speech signal, the input speech signal cannot be matched readily with the phonological representations in the mental lexicon. This requires explicit cognitive processing, which is effortful and consumes cognitive resources, in order to make sense of the suboptimal input signal. Thus, there may be
fewer cognitive resources remaining for speech processing. For individuals with limited cognitive capacity, there may not be enough remaining resources to process the speech. This could explain why these individuals usually do not show as much benefit from advanced signal processing as the individuals with good cognitive capacity. If fewer resources are needed to overcome the extra demand in listening to processed signals, there will be more resources remaining for speech processing. For example, when the input is less distorted or when background noise is reduced, it is possible for persons with limited capacity to benefit from signal processing. Francis and Nusbaum (2009) pointed out that the availability of working memory capacity limits task performance. In their synthesized speech perception study, the benefits of acoustic cues could be demonstrated when the task demand was low but not high. They argued that sufficient remaining working memory capacity was crucial in making use of cues available in speech stimuli. Thus, if fewer cognitive resources are required to accomplish a task or if task demand is reduced, the remaining resources could be allocated to other cognitive tasks and so the benefit of NR or cues could be better illustrated.
Effects of Noise Reduction on Memory for Heard Speech in Hearing Aid Users The positive effect of a hearing aid NR algorithm on memory for speech in hearing aid users was first reported in Ng et al. (2013). However, the effect was found only for individuals with high working memory capacity. The test used in that study was a sentence-final word identification and recall (SWIR) test, which consisted of 35 eight-sentence lists. The task was to recall the final word of every sentence in a list when prompted to do so at the end of each list. Speech intelligibility close to 100% was ensured by individually adapting SNR and asking participants to report back each sentence-final word as it occurred. By providing a favorable SNR, good intelligibility was retained. We also ensured that cognitive resources would be engaged in speech recognition while still ensuring that to-be-remembered items were audible and thus available for memory encoding. This is important because the benefit of signal processing on memory for speech cannot be observed when speech is presented at such low intelligibility levels that it is not available for encoding. The favorable SNRs used in the SWIR test also closely resemble the noise levels in daily listening situations (between 0 and 15 dB SNR) (Smeds et al. 2012). While most speech-in-noise tests are insensitive at positive SNRs, the SWIR test can serve as a tool to assess aided performance at realistic SNRs. A binary masking NR algorithm was used in Ng et al. (2013). It is effective in attenuating interfering noise containing linguistic information (Brungart et al. 2006). The results of Ng et al. (2013) showed that noise impaired recall performance and that competing speech was more detrimental to recall performance than stationary noise. This is probably because the semantic information in competing speech is more distracting (Sörqvist & Rönnberg 2012) and is harder to segregate from target speech (Mattys et al. 2009).
This is particularly true for older people with hearing impairment. Ng et al. (2013) also showed that binary masking NR reduced the adverse effect of native language (Swedish) competing speech on memory. In particular, memory for sentence-final words occurring in late list positions in a sentence list was improved
in the presence of NR for listeners with good working memory capacity only. By introducing NR, the background noise is attenuated and hence segregation of target speech from noise becomes easier (Heinrich & Schneider 2011). Word identification may consequently be speeded up and encoding of words into working memory can be facilitated. Retrieval from the short-term storage component could therefore be enhanced, and this is reflected in the improvement of memory for the late list speech items. However, no effect of NR was found in listeners with low working memory capacity.
Aims of the Study The present study was designed to replicate and extend the findings obtained in our previous study (Ng et al. 2013) and had three aims. The first aim was to investigate whether the positive effect of NR on memory for speech for hearing aid users (Ng et al. 2013) could be extended from individuals with high working memory capacity to individuals with low working memory capacity by using a less cognitively demanding test. Using the original SWIR test, effects of NR signal processing were found to be significant only in individuals with good working memory capacity. We predicted that when the SWIR test is modified to be less cognitively demanding, individuals with limited working memory capacity would also benefit from signal processing. We also expected a similar result to the earlier study, namely that NR would improve memory for speech in late list positions. The second aim was to investigate whether the effect of NR on recall of native speech masked by native speech (Ng et al. 2013) could be extended to recall of native speech masked by non-native speech. The effects of NR were found in the competing speech background in our previous study. This study further examined the role of semantic interference of the masker language and how that interacted with memory and NR. Maskers in two languages were chosen: one in the native language of the participants (Swedish), which was also the language of the target speech stimuli, and one in a foreign language (Cantonese). To optimize the masking effect of competing speech, four-talker babble was chosen. Cantonese was chosen for the present study because this language is distinctively different from Swedish in terms of phonemic, rhythmic, syllabic, and prosodic properties. In other words, this language is unintelligible to the participants and linguistically dissimilar from the native Swedish language.
Thus, differences in semantic and phonological interference between these competing talker languages would be maximized (Van Engen & Bradlow 2007). Because we expected the non-native masker to be less effective than the native masker, we predicted that NR would not improve recall as much for the non-native masker as for the native masker. The third aim of the study was to study the role of working memory capacity in the effect of NR. In a review of 20 studies on speech recognition and cognitive abilities, Akeroyd (2008) concluded that hearing loss was the primary predictor of speech recognition performance, while individual cognitive ability emerged as the secondary factor. Among all the cognitive tests used in these studies, the most effective predictor was working memory capacity measured by the reading span (RS) test (Daneman & Carpenter 1980). Besser et al. (2013) also found a positive association between working memory capacity, measured using the RS test, and speech recognition performance
in both speech babble and stationary noise across studies. In summary, these reviews indicated that the RS test is a promising predictor of speech recognition performance. Thus, this test was used in this study.
MATERIALS AND METHODS Participants Twenty-six native Swedish speakers (13 women and 13 men) with symmetrical moderate to moderately severe acquired sensorineural hearing loss were recruited from the audiology clinic of the University Hospital of Linköping, Sweden. Their average age was 62.4 years (SD = 2.3, range: 56–65 years) and their average pure-tone threshold at 0.5, 1, 2, and 4 kHz across both ears was 51.43 dB HL (SD = 4.86). Figure 1 shows the configuration of hearing loss for each of the participants. All were hearing aid users, who reported an average daily usage of 12 hr (SD = 4.49) over 9 years, and they used digital hearing aids with common features such as wide dynamic range compression, NR, and directional microphones. None were familiar with any dialect of the Chinese language. No history of otological problems or psychological disorders was reported. The study was approved by the regional ethics committee in Linköping and informed consent was obtained from all participants.
Reading Span Test This test measures working memory capacity by tapping the ability to process and store verbal information simultaneously. The test material consisted of lists of 3, 4, and 5 three-word sentences, and there were two lists of each list length (i.e., 24 sentences in total). The lists were presented in ascending order of list length. This test consisted of two tasks. The first task was to judge whether the sentences shown on the center of a computer screen, at a rate of 800 msec per word with an interstimulus interval of 75 msec, were sensible or absurd (Baddeley et al. 1985). An example of a sensible sentence is “Prästen läste bibeln,” meaning the priest read the bible, and an example of an absurd sentence is “Spindeln cyklade hem,” meaning the spider cycled home. After each list of sentences, the participants were prompted to recall either the first or the final word of the sentences in the list in correct serial order, which was the second task. They were encouraged to respond as accurately as possible. The test was scored by the total number of items correctly recalled irrespective of serial order. This scoring method optimizes the individual variation in response and was used in other studies (e.g., Lunner 2003; Foo et al. 2007; Ng et al. 2013). Two practice lists of two sentences were administered.
Sentence-Final Word Identification and Recall Test There were two tasks in the original SWIR test: identification task (repeat the final word after listening to each sentence) and free recall task (recall, in any order, all the words that were previously repeated). In the present study, the free recall task remained unchanged. To check whether there is an effect of immediate repetition of final word on recall performance, the identification task was performed on only half of the sentence list stimuli (see Procedure for details). The sentence stimuli used in the present study (e.g., “Pappa ska laga min fåtölj,” meaning father will fix my armchair) were
Speech Babble Speech babble in two different languages, Swedish (Swe) and Cantonese (Can), was used. The speech babble in Swedish was the same babble used in estimating the individualized SNR. Both were four-talker babble and consisted of recordings of two male and two female native speakers of the corresponding language reading different paragraphs of a newspaper text. The recording of each speaker lasted approximately 3 min, and the recordings were equalized for root mean square amplitude before mixing. The four-talker babble was post-filtered to resemble the long-term average spectrum of the HINT sentences. The speech babble was introduced 3 sec before the onset of the sentence stimuli and was terminated 1 sec after sentence offset. For each sentence, different portions of the babble were used.
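The equalize-then-mix step can be illustrated with a minimal sketch. It is not the authors' processing chain: the function names, the target RMS value, and the toy sample values are assumptions made only for illustration, and the post-filtering to the HINT long-term spectrum is not covered.

```python
import math

def rms(signal):
    """Root mean square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def equalize_rms(signal, target_rms):
    """Scale a signal so that its RMS amplitude equals target_rms."""
    current = rms(signal)
    if current == 0:
        return list(signal)  # silent input: nothing to scale
    gain = target_rms / current
    return [s * gain for s in signal]

def mix_babble(talkers, target_rms=0.1):
    """Equalize every talker to the same RMS, then sum sample by sample."""
    equalized = [equalize_rms(t, target_rms) for t in talkers]
    length = min(len(t) for t in equalized)  # truncate to the shortest recording
    return [sum(t[i] for t in equalized) for i in range(length)]

# Four hypothetical talker recordings (toy numbers, not real speech).
talkers = [
    [0.5, -0.5, 0.5, -0.5],
    [0.1, 0.2, -0.1, -0.2],
    [1.0, 0.0, -1.0, 0.0],
    [0.3, 0.3, -0.3, -0.3],
]
babble = mix_babble(talkers)
```

Equalizing before summing keeps any single talker from dominating the mixture, which is presumably why the recordings were level-matched first.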
Processing The binary masking NR algorithm was used in the present study. This signal processing algorithm reduces the masking effect of interfering speech noise by removing noise-dominant spectrotemporal regions in the speech-in-noise mixture (Wang et al. 2009). A 64-channel gammatone filterbank followed by time-windowing was applied to speech-in-noise mixtures to form time-frequency units. When the local SNR of a time-frequency unit is less than 0 dB (i.e., the energy of the noise exceeds the energy of the target speech), that unit is attenuated by 10 dB. Otherwise, the unit is retained in the binary matrix. This is done to ensure that the SNR gain with the binary masks is optimized (Li & Wang 2009). There were two processing conditions in this study: (1) binary masking NR (see Boldt et al. 2008 for details), which is a non-ideal estimation of NR, and (2) unprocessed (NoP).
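The masking rule described above can be sketched as follows. This is an illustration only: it assumes the speech and noise energies of each time-frequency unit are known separately (an ideal mask), whereas the study used a non-ideal estimate (Boldt et al. 2008). The gammatone filterbank and windowing are not reproduced, and all names and toy values are hypothetical.

```python
ATTENUATION_DB = 10.0  # attenuation of noise-dominated units, as stated in the text

def apply_binary_mask(speech_energy, noise_energy, mixture):
    """Attenuate time-frequency units whose local SNR is below 0 dB.

    All arguments are 2D lists indexed as [channel][frame]. The energies
    are assumed to be known per unit, which is an idealization.
    """
    gain = 10.0 ** (-ATTENUATION_DB / 20.0)  # amplitude gain for -10 dB
    masked = []
    for s_row, n_row, m_row in zip(speech_energy, noise_energy, mixture):
        out_row = []
        for s, n, m in zip(s_row, n_row, m_row):
            # Local SNR < 0 dB is equivalent to speech energy < noise energy.
            keep = s >= n
            out_row.append(m if keep else m * gain)
        masked.append(out_row)
    return masked

# Toy example: 2 channels x 3 frames.
speech = [[4.0, 1.0, 9.0], [0.5, 2.0, 1.0]]
noise = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
mixture = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
masked = apply_binary_mask(speech, noise, mixture)
```

Comparing energies directly avoids taking a logarithm of zero when a unit contains no speech or no noise.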
Procedure Fig. 1. Configuration of hearing loss for each participant in the low (upper panel) and the high (lower panel) reading span (RS) groups.
a subset of the Swedish Hearing In Noise Test (HINT) sentences (Hällgren et al. 2006) and were identical to those used in Ng et al. (2013), but the test was modified to reduce the task demand. Each list consisted of seven instead of eight sentences, and speech stimuli were presented at more favorable SNRs to minimize resources devoted to speech perception. This was achieved by estimating the individualized SNR (see Procedure for details) using a four-talker babble in Swedish, instead of the stationary noise used in Ng et al. (2013). With the modified test, fewer cognitive resources would be required to process input speech signals with artifacts (or when NR is applied), and presumably individuals with limited working memory capacity can also benefit from NR signal processing.
Test Conditions The present study had a 2 × 2 × 2 design. There were two conditions of speech babble and two conditions of processing. On top of having these four conditions, repetition of final word was either required or not required. Therefore, there were eight test conditions in total.
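As a quick check of the design, the eight cells can be enumerated programmatically. The labels "repeat" and "no-repeat" are hypothetical names for the two levels of final word repetition; the other labels follow the abbreviations used in this article.

```python
from itertools import product

babble = ["Swe", "Can"]                # masker language
processing = ["NoP", "NR"]             # unprocessed vs. binary masking NR
repetition = ["repeat", "no-repeat"]   # hypothetical labels for final word repetition

# Full factorial crossing of the three two-level factors.
conditions = list(product(babble, processing, repetition))

for cond in conditions:
    print("/".join(cond))
```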
Each participant took part in two 2-hr sessions. Audiometric measurements together with the RS test were performed in the first session. In the second session, an individualized SNR estimated to give 95% speech reception was obtained for each participant before the administration of the SWIR test and was applied to all test conditions in the SWIR test. To estimate the individualized SNR, an SNR which yielded 84% speech intelligibility in four-talker babble in Swedish was first obtained using HINT with a modified adaptive procedure (4-up-1-down; Levitt 1971). Then, a psychometric function was plotted for each participant using the data points obtained from HINT with the modified procedure. The individualized SNR predicting 95% speech perception was estimated from this function. This resulted in a mean individualized SNR of 7.5 dB (SD = 1.9), which is higher than the SNR estimated using stationary noise in Ng et al. (2013) (mean = 4.1 dB SNR, SD = 1.9). All sentence-in-speech babble stimuli were preprocessed using MATLAB (2012). Auditory signals were presented using a 24-bit external PC soundcard at a sampling rate of 22.05 kHz and transmitted to the microphone of an Oticon Epoq XW behind-the-ear hearing aid in an anechoic chamber. An ear simulator was coupled to the receiver of the hearing aid, and the auditory signals were transmitted to a pair of ER3A insert earphones through an equalizer and a measuring amplifier. The tests were carried out in a double-walled sound booth.
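One way to read the 95% point off a psychometric function is sketched below. The article does not state the functional form or the fitting method, so the logistic shape, the logit-linear least-squares fit, and the data points are all assumptions made for illustration.

```python
import math

def fit_logistic(snrs, proportions):
    """Fit logit(p) = slope * SNR + intercept by ordinary least squares.

    Proportions must lie strictly between 0 and 1.
    """
    logits = [math.log(p / (1.0 - p)) for p in proportions]
    n = len(snrs)
    mean_x = sum(snrs) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in snrs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snrs, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def snr_for_target(slope, intercept, target=0.95):
    """Invert the fitted function to find the SNR giving the target proportion."""
    return (math.log(target / (1.0 - target)) - intercept) / slope

# Hypothetical data generated from a known logistic function, for demonstration.
snrs = [0.0, 2.0, 4.0, 6.0]
proportions = [1.0 / (1.0 + math.exp(-(0.8 * x - 2.0))) for x in snrs]

slope, intercept = fit_logistic(snrs, proportions)
snr_95 = snr_for_target(slope, intercept, 0.95)
```

With real adaptive-track data, a maximum-likelihood fit would usually be preferred over this linearized version, which weights all points equally.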
The stimuli were composed of 140 sentences, forming 20 seven-sentence lists. There were four background noise conditions (two conditions of processing and speech babble in two languages), and therefore each background condition was tested with five lists for each participant. Each of the 20 sentence lists was presented twice in the same test condition, yielding 40 lists in total. To test the effect of repetition, for half of the participants, the identification task was performed (i.e., repetition of final words was required) on two of the five lists and was not performed (i.e., repetition of final words was not required) on three of the five lists for each of the four conditions. For the other half of the participants, the number of lists tested with and without the identification task was reversed, so that the number of repetitions was balanced across the participants. The participants were told before a list began whether the final words were to be identified. The free recall task remained unchanged. The order of presentation was randomized for sentences within each list, and the order of the lists was randomized for each participant. All lists were presented in all test conditions across participants in a counterbalanced manner. Four practice sentence lists were administered.
Scoring Method The identification task performance was scored as the percentage of responses given and was obtained only in conditions where final words were repeated. The free recall task was scored as the percentage of correct recall of the responses in the identification task (i.e., a correct recall could be an incorrectly identified but correctly recalled word) for conditions where final words were repeated. For conditions where final words were not repeated, recall performance was scored based on the percentage of correct recall. In these conditions, mispronunciations of the recalled words occurred only in a couple of test trials, and these words were scored as correct recall. Scoring was done online, and all responses were recorded. The serial list position of the recalled words was also analyzed. The probability of recall of pre-recency items (primacy and asymptote items) is assumed to measure retrieval from a long-term storage component, and the probability of recall of recency items is assumed to reflect retrieval from a short-term storage component. The primacy, asymptote, and recency serial list positions correspond to the first to second, third to fifth, and sixth to seventh items, respectively, in each sentence list, and the calculations of the performance in each list position were based on the average recall performance in the corresponding serial items. The sequence of the final words being recalled, or the recall order, was also noted.
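The serial-position scoring can be sketched as below for a single seven-item list. The function name and the word list are hypothetical (only "fåtölj" appears in the article), and the sketch uses the simplest scoring rule, correct recall out of the total number of items in a list.

```python
# Serial positions (1-based) belonging to each part of the list, as defined in the text.
POSITIONS = {
    "primacy": (1, 2),
    "asymptote": (3, 4, 5),
    "recency": (6, 7),
}

def score_list(final_words, recalled_words):
    """Percentage of final words recalled, overall and per serial-position band.

    Recall order is ignored because the task is free recall.
    """
    recalled = {w.lower() for w in recalled_words}
    hits = [w.lower() in recalled for w in final_words]
    scores = {}
    for name, positions in POSITIONS.items():
        n_hit = sum(hits[p - 1] for p in positions)
        scores[name] = 100.0 * n_hit / len(positions)
    scores["overall"] = 100.0 * sum(hits) / len(hits)
    return scores

# Hypothetical list of sentence-final words and one participant's recall.
final_words = ["fåtölj", "bil", "hund", "bok", "stol", "sol", "hav"]
recalled_words = ["hav", "sol", "fåtölj"]
scores = score_list(final_words, recalled_words)
```

In this toy trial the participant recalls both recency items, one primacy item, and no asymptote items.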
RESULTS Reading Span Test Out of 24 items, the mean number correctly recalled was 10.0 (SD = 3.0). RS performance did not correlate with either age (r = 0.12, p = 0.57) or PTA (r = −0.08, p = 0.71). Two subgroups of participants were formed based on the RS performance to investigate the interaction between RS and recall performance. The median RS score was 10.5. One group with RS score greater than the median value (high RS, n = 13) had mean score of 12.5 (SD = 1.9), and the other group with RS score lower than the median value (low RS, n = 13) had mean score of 7.7
(SD = 1.8). The mean RS scores in both high and low RS groups are comparable to Ng et al. (2013) (mean = 12.8 and 7.8, SD = 2.5 and 2.2, respectively). Age, PTA, and word identification performance did not differ between these two groups, t(24) = 1.41, p = 0.17; t(24) = 0.75, p = 0.46; and t(24) = −1.30, p = 0.21, respectively.
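The median split used to form the two RS groups can be illustrated with a short sketch. The participant labels and scores are invented. Scores exactly equal to the median would fall into neither group under this rule, which cannot happen when the median is 10.5 and scores are whole numbers.

```python
from statistics import median

def median_split(scores):
    """Split participants into low and high groups around the median score.

    scores maps a participant label to a reading span score.
    Returns (cutoff, low_group, high_group).
    """
    cutoff = median(scores.values())
    low = sorted(p for p, s in scores.items() if s < cutoff)
    high = sorted(p for p, s in scores.items() if s > cutoff)
    return cutoff, low, high

# Hypothetical reading span scores (number of items recalled out of 24).
rs_scores = {"P1": 7, "P2": 9, "P3": 10, "P4": 11, "P5": 12, "P6": 14}
cutoff, low_rs, high_rs = median_split(rs_scores)
```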
SWIR TEST Analyses Regarding Control Conditions/Manipulations This section describes (1) the results of the identification task, (2) whether the identification task was done along with the free recall task, and (3) whether presenting the sentence stimuli twice would have an impact on the analyses of the free recall task. Analyses reported in (1) and (2) concern only the data collected in test conditions where the identification task was performed.
1. Identification task The identification task was performed on half of the stimuli and the mean performance was 93.3% in Swe/NoP (Swedish babble/unprocessed), 98.2% in Can/NoP (Cantonese babble/unprocessed), 99.51% in Swe/NR (Swedish babble/NR), and 99.18% in Can/NR (Cantonese babble/NR). Thus, final word identification exceeded 93% in all conditions, which was expected as the individualized SNR predicting 95% speech intelligibility in four-talker Swedish babble was used.
2. Manipulations of the analyses of the free recall task The identification task performance approached 100% in all test conditions except Swe/NoP. To make a fair comparison, the differences in the identification task performance should be controlled for in calculating the free recall task performance. Percentage of correct recall should be calculated based on the total number of responses given (cf. Ng et al. 2013). However, the identification task was not performed in half of the test conditions. Differences in identification performance were therefore not known in these conditions. To unify the calculation of the free recall task in all test conditions in the present study, we examined whether the identification task had actually affected recall performance in the different test conditions. Two separate analyses of variance (ANOVAs) with three within-subject factors (processing, speech babble, and serial position) and one between-subject factor (RS group) were performed.
Both ANOVAs (when identification performance was controlled and not controlled for) gave highly similar patterns of results. There were main effects of processing, serial position, and RS group. There was a significant two-way interaction: serial position × RS. There was a tendency towards a significant interaction between processing and serial position. Because the patterns of results were the same for both calculation methods and because identification task performance was not obtained in four out of the eight test conditions, scoring of the free recall task was based on the number of correctly recalled items out of the total number of items in a list for all test conditions. The results of the free recall task are shown in Table 1.
3. Effect of presenting the stimuli twice on free recall performance To examine the impact of listening to each sentence twice on recall performance, an ANOVA was performed to compare
recall performance obtained from the first and the second presentation of the stimuli in the eight test conditions. This ANOVA had two between-subject factors: two levels of presentation (first, second) and eight levels of test condition (four conditions Swe/NoP, Swe/NR, Can/NoP, and Can/NR where final words were identified; and these four conditions where final words were not identified). Mean recall performance was higher at the second presentation, F(1, 175) = 17.40, p = 0.00, ηp² = 0.41. Crucially, however, this effect did not interact with test condition, F(7, 175) = 0.29, p = 0.96, ηp² = 0.11.
Analyses Addressing the Aims of the Study An ANOVA with four within-subject factors (processing, speech babble, final word repetition, and serial position) and one between-subject factor (RS group) was performed. There were four significant main effects: processing, F(1, 24) = 31.28, p = 0.00, ηp2 = 0.57 (recall performance with NR was better than that with NoP); speech babble, F(1, 24) = 5.09, p = 0.03, ηp2 = 0.18 (recall performance in Cantonese babble was better than that in Swedish babble); RS group, F(1, 24) = 16.38, p = 0.00, ηp2 = 0.41 (high RS performed better than low RS); and serial position, F(2, 48) = 35.95, p = 0.00, ηp2 = 0.60. The post hoc t-tests (Bonferroni adjusted for multiple comparisons at the 0.05 level) showed that performance in recency was better than that in primacy, t(144) = 4.97, p = 0.00, which was also better than that in asymptote, t(144) = 3.80, p = 0.00). There was no main effect of final word repetition. Three two-way interactions were significant: (1) processing × speech babble, F(1, 24) = 5.52, p = 0.03, ηp2 = 0.19 (Fig. 2A), such that recall performance in Cantonese speech babble with and without NR did not differ statistically, and that in Swedish babble without NR performance was worse than with NR, t(144) = 2.08, p = 0.04, which did not differ from recall performance in Cantonese speech babble either with or without NR; (2) serial position × RS group, F(2, 48) = 4.19, p = 0.02, ηp2 = 0.15 (Fig. 2B), indicating that the high RS group performed better than the low RS group in primacy only, t(48) = 6.72, p = 0.00; and (3) final word repetition × serial position, F(2, 48) = 3.47, p = 0.04, ηp2 = 0.13. However, no significant simple effects were emerged from this interaction (p-values ranging from 0.07 to 0.10). The processing × serial position interaction was marginally significant, F(2, 48) = 3.03, p = 0.057, ηp2 = 0.11 (Fig. 2C), suggesting
that recall performance was better in recency with NR than without, t(144) = 2.36, p = 0.02. Two three-way interactions were also significant: (1) processing × serial position × RS group, F(2, 48) = 3.32, p = 0.05, ηp2 = 0.12 (Fig. 3); and (2) processing × final word repetition × serial position, F(2, 48) = 3.27, p = 0.05, ηp2 = 0.12. Since the processing × serial position interaction showed an effect in the recency position, which is in line with our prediction, we investigated the simple main effects in the processing × serial position × RS interaction using separate ANOVAs for the high and low RS groups. Recall performance was significantly better when there was NR than when there was no NR in the low RS group in recency only, t(144) = 2.64, p = 0.01. The ANOVA on the high RS group also demonstrated a main effect of processing, F(1, 12) = 29.83, p = 0.00, ηp2 = 0.56, but there was no interaction between processing and serial position, suggesting that NR had a similar effect on recall performance across all serial positions.
To further investigate how working memory capacity affects memory, the recall order of final words in the free recall task was analyzed. Table 2 shows the output order analysis, which is the proportion of words recalled for each serial list position (input position) by serial recall order (output position). For example, input position 7 with output position 1 refers to the probability that the final word of the seventh sentence in a list is the first word recalled in the recall sequence. To study the participants’ recall preference, we examined the list position of the first sentence-final word being recalled (output position = 1 only). This analysis reflects recall strategy. An ANOVA was performed to examine the original list position of the first word recalled by RS group. One within-subject factor, serial list position (seven levels), and one between-subject factor, RS group, were entered.
There was a significant main effect of serial list position, F(6, 144) = 36.86, p = 0.00, ηp2 = 0.61, and a significant interaction between serial list position and RS group, F(6, 144) = 4.67, p = 0.00, ηp2 = 0.16. The post hoc t-tests indicated that the participants in the low RS group were more likely to recall the last item in a list (input position 7) than the first item in a list (input position 1), t(25) = 3.62, p = 0.01. For the high RS group, no such difference was found, t(25) = 1.22, p = 0.23. The input–output pattern showed that the two groups used different strategies in recalling the sentence-final words. The low RS group emphasized recall of words in late serial positions, whereas the high RS group seemed to start with early positions to a larger extent.
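As a consistency check on the effect sizes reported for the main ANOVA above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it to three of the reported main effects. The function name is illustrative, and the snippet is not the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    scaled_effect = f_value * df_effect  # equals SS_effect / MS_error
    return scaled_effect / (scaled_effect + df_error)

# F values and degrees of freedom copied from the main ANOVA reported above.
reported = {
    "processing": (31.28, 1, 24),
    "RS group": (16.38, 1, 24),
    "serial position": (35.95, 2, 48),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    print(f"{effect}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Rounded to two decimals, these three values come out at 0.57, 0.41, and 0.60, matching the reported effect sizes. Small discrepancies are possible for other effects because the published F values are themselves rounded.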
TABLE 1. Means (in percentage) and standard deviations of the free recall task (the test conditions where the identification task was and was not performed were pooled)

Speech babble    Unprocessed (NoP)                 Noise reduction (NR)
                 High RS         Low RS            High RS         Low RS
                 M      SD       M      SD         M      SD       M      SD
                 65.06  23.11    38.12  20.12      74.20  19.79    48.43  22.56
                 47.97  15.44    37.47   8.90      57.59  17.70    32.91  15.68
                 71.79  14.17    64.40  18.12      78.69  13.85    78.33  16.19
                 73.05  18.05    44.55  22.72      72.76  19.94    44.13  28.84
                 48.82  13.28    35.79   8.91      50.53  18.26    37.93  11.33
                 79.78  12.65    70.35  14.16      83.65  11.78    78.72  11.68
NG ET AL. / EAR & HEARING, VOL. 36, NO. 1, 82–91
Fig. 2. Significant two-way interactions between (A) processing (unprocessed [NoP] and with noise reduction [NR]) and speech babble (Swedish [Swe] and Cantonese [Can]); (B) serial position (primacy, asymptote, and recency) and reading span (RS) group (high RS and low RS); and (C) processing and serial position. The y-axis represents the percentage of words correctly recalled in the sentence-final word identification and recall test. Error bars represent standard deviation.
DISCUSSION
The results of the present study showed that binary masking reduced the adverse effect of noise on speech memory for hearing aid users. This effect did not interact with RS skill, demonstrating that binary masking improved speech recall irrespective of working memory capacity. Thus, the positive effect of NR on memory for speech for hearing aid users (Ng et al. 2013) could be extended from individuals with high working memory capacity to individuals with low working memory capacity by using a less cognitively demanding test. Further, the effect of NR on speech recall was mainly attributable to words heard in late list positions. Recall performance was more disrupted by native than foreign speech babble and NR improved recall performance in native but not foreign competing speech.
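To make the idea of binary masking concrete, the following Python sketch builds an ideal binary mask over a small grid of time-frequency units: a unit is kept when the local target-to-masker ratio exceeds a criterion and is attenuated otherwise. This is an illustration only. It assumes separate access to target and masker power, which a hearing aid does not have, and the names `binary_mask` and `apply_mask`, the 0 dB criterion, and the 10 dB attenuation are hypothetical choices rather than the settings of the algorithm tested here.

```python
import math

def binary_mask(target_power, masker_power, criterion_db=0.0):
    """Return 1 for units whose local SNR (dB) exceeds the criterion, else 0."""
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        row = []
        for t, m in zip(t_row, m_row):
            local_snr_db = 10.0 * math.log10(t / m)  # powers must be positive
            row.append(1 if local_snr_db > criterion_db else 0)
        mask.append(row)
    return mask

def apply_mask(mixture_power, mask, attenuation_db=10.0):
    """Attenuate the power of masker-dominated units (mask == 0)."""
    gain = 10.0 ** (-attenuation_db / 10.0)
    return [[p if keep else p * gain for p, keep in zip(p_row, k_row)]
            for p_row, k_row in zip(mixture_power, mask)]

# Hypothetical power values; rows are frequency channels, columns are time frames.
target = [[4.0, 1.0, 0.5], [2.0, 8.0, 0.1]]
masker = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0]]
# Mixture power is approximated as the sum, assuming uncorrelated signals.
mixture = [[t + m for t, m in zip(t_row, m_row)]
           for t_row, m_row in zip(target, masker)]

mask = binary_mask(target, masker)
processed = apply_mask(mixture, mask)
print(mask)
```

In this toy grid only the two target-dominated units are passed unchanged; every other unit, including the one where target and masker power are equal, is reduced by the assumed attenuation.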
Fig. 3. Significant three-way interaction between processing (unprocessed [NoP] and with noise reduction [NR]), serial position (primacy, asymptote, and recency), and reading span (RS) group (high RS and low RS). Error bars represent standard deviation.
Effects of Noise Reduction and Recall Strategies on Memory
Competing speech has a disruptive effect on memory for speech, and NR reduced that effect. In Ng et al. (2013), where the memory task was more demanding than in the present study, only the individuals with better working memory capacity showed effects of NR on memory and those with lower capacity showed no statistically significant benefit from NR in terms of recall performance. In the present study, the effect of NR on memory for speech heard against a competing talker was found for all participants regardless of working memory capacity, which is also in line with our prediction. We argue that when the memory task is less demanding (fewer items to be remembered and more favorable individualized SNRs as compared to the original SWIR test), the impact of working memory capacity on task performance becomes less important because individuals with comparatively lower working memory capacity had sufficient cognitive capacity to accomplish the task and were hence more likely to benefit from signal processing. Therefore, the results of the present study show that NR improves recall of speech heard in noise for hearing aid users, irrespective of working memory capacity. Interestingly, the statistical analysis showed a three-way interaction between NR, serial position, and working memory, which indicates that the pattern of improvement is dependent on working memory capacity. The analysis of serial order of recall showed that while participants with lower working memory capacity tended to recall words in late list positions first, the tendency to begin a recall with words in early list positions did not differ from that with words in late list positions for participants with better capacity. This is in line with previous studies that have demonstrated different recall strategies in people with different cognitive capacity. For example, Unsworth et al. (2011) showed that individuals who focused on both primacy and recency items had higher working memory capacity than those who focused on the recency items only. Unsworth and Engle (2007) also demonstrated that individuals with low working memory capacity have a greater tendency to start recalling late list items. Since NR enhances recall of words in late list positions in particular, it may have a more prominent effect for those who tended to recall words in late list positions first.
This may explain why the effect of NR became more pronounced in recall of late list positions for participants with lower working memory capacity. The present study suggests that higher working memory capacity improves word recall. It is also worth noting that the main effect of the RS group and its interaction with serial position suggested that in general (both with and without NR), the participants with better working memory capacity showed better recall performance than those with lower capacity, particularly in the early list positions. Even though all participants had reasonably good capacity to perform the modified SWIR test in the present study, the participants who had better working memory capacity showed better recall performance than those with lower capacity. Better recall performance in early list positions, which represents better encoding of words into long-term storage, has also been shown to be associated with better cognitive capacity (Unsworth 2007). Sarampalis et al. (2009) showed that the magnitude of the primacy effect is related to task difficulty (with and without contextual cues) and SNR. In other words, as more cognitive resources are devoted to the task and fewer resources are spared, the primacy effect diminishes. On the other hand, when the task is less cognitively demanding and more resources remain, the primacy effect becomes more pronounced because more resources are available to encode words from working memory into long-term storage. Again, this pattern of findings explains why in the present study individuals with better cognitive capacity showed better memory for words in early list positions than those with lower capacity in the modified SWIR test. This explanation is also consistent with the previous argument that the participants with better working memory capacity had their cognitive resources distributed more evenly in remembering items across the entire list, and those with lower capacity devoted more cognitive resources to the late list items.

TABLE 2. Proportion of words recalled for each serial list position (input position) by serial recall order (output position) and reading span group

High RS
Input position    Output position
                  1     2     3     4     5     6     7
1                 0.50  0.07  0.13  0.19  0.08  0.04  0.00
2                 0.10  0.45  0.13  0.16  0.11  0.05  0.00
3                 0.03  0.14  0.44  0.18  0.13  0.05  0.02
4                 0.11  0.14  0.30  0.27  0.12  0.06  0.00
5                 0.06  0.27  0.32  0.17  0.14  0.04  0.01
6                 0.09  0.46  0.17  0.16  0.10  0.03  0.00
7                 0.46  0.12  0.12  0.16  0.09  0.05  0.00

Low RS
Input position    Output position
                  1     2     3     4     5     6     7
1                 0.36  0.19  0.23  0.17  0.05  0.01  0.00
2                 0.13  0.36  0.21  0.19  0.10  0.02  0.00
3                 0.10  0.29  0.32  0.16  0.09  0.03  0.00
4                 0.15  0.24  0.37  0.17  0.03  0.03  0.01
5                 0.15  0.34  0.29  0.15  0.05  0.02  0.00
6                 0.13  0.52  0.20  0.09  0.06  0.00  0.00
7                 0.62  0.12  0.12  0.09  0.03  0.01  0.00

RS, reading span.
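The output order analysis summarized in Table 2 can be sketched in a few lines of Python. For each input position, the code tallies the output positions at which the word was recalled and divides by the number of times that word was recalled, so that each input-position row sums to 1. The recall sequences below are invented for illustration, and the function name and the row-wise normalization are assumptions about how such a table could be built rather than a description of the authors' scripts.

```python
def output_order_table(recall_sequences, list_length=7):
    """Proportion of recalls of each input position that fell at each output position.

    recall_sequences: one list per trial holding the input positions (1-based)
    of the correctly recalled words, in the order they were recalled.
    """
    counts = [[0] * list_length for _ in range(list_length)]
    for sequence in recall_sequences:
        for output_index, input_position in enumerate(sequence):
            counts[input_position - 1][output_index] += 1
    table = []
    for row in counts:
        total = sum(row)
        # Rows for input positions that were never recalled stay at zero.
        table.append([c / total if total else 0.0 for c in row])
    return table

# Invented trials: two start with the last word, one starts with the first word.
trials = [[7, 6, 1], [7, 1, 2], [1, 2, 7, 6]]
table = output_order_table(trials)
print(table[6][0])  # share of recalls of input position 7 that came first
```

Comparing the first column of such a table between input positions 1 and 7 corresponds to the recall-initiation comparison reported for the two RS groups.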
Effects of Masker Language on Memory
Memory for speech was more disrupted when the competing speech was in the same native language as the target than when it was in a foreign language. This is in agreement with the results reported in Tun et al. (2002), who showed that memory for words is more disrupted when the target words are heard in competing speech in the native language of older adults. It has been well documented that recognition of speech masked by competing speech is impaired more when the competing speech is in an intelligible language than in an unintelligible or unfamiliar language (see Mattys et al. 2009 for review). In the present study, one of the reasons why the Swedish language background had a stronger masking effect is that segregation of target speech from a competing background is harder when they are in the same language (e.g., Brouwer et al. 2012). The masking effect of Cantonese
competing speech, on the other hand, was not as strong as that of Swedish. Since the linguistic features of Cantonese (such as phonemes, words and prosodic characteristics, and syllabic and morphological structures) are substantially different from those of Swedish, and this language is foreign and unintelligible, the target speech in Swedish could be segregated relatively easily from the Cantonese speech background (Van Engen & Bradlow 2007), regardless of the presence of NR. In addition, the differences in the spectral properties of the babbles could offer another possible explanation as to why Cantonese was a less effective masker. Although the long-term average speech spectra have been found to be highly similar across languages (Byrne et al. 1994), the Cantonese babble used in the present study has less energy above approximately 4000 Hz than the Swedish babble (Fig. 4). This is comparable to what is reported by Calandruccio et al. (2013), who compared the long-term average speech spectra of speech maskers in English, Dutch, and Mandarin. The Cantonese babble has less spectral energy at higher frequencies and hence is a less effective masker than the Swedish babble. Taken together, competing speech in Swedish demands more attentional and processing resources from native Swedish speakers, which consequently impairs recall performance.
Noise reduction improves recall performance when the target and competing background are in the native language. The competing background is attenuated by NR, which makes the target speech more audible. The bottom-up spectral-temporal stream segregation would become easier and more efficient. Moreover, NR could possibly reduce the masking effect caused by the lexical-semantic information in the competing speech background.
The additional attentional and processing resources involved due to the presence of competing background in the native language would therefore decrease, and hence more resources can be freed for other cognitive tasks.
The results did not demonstrate any effect of NR on memory when the competing speech was in a different language from the target speech that was both unfamiliar and dissimilar. The Cantonese babble is a less effective masker than the Swedish babble, in terms of spectral differences, and therefore the effect of NR in this masker might have been restrained. Future studies should focus on how NR and memory for target speech are affected by the language familiarity and spectral-temporal differences of the competing maskers. The present study suggests that foreign babble is a less effective masker. However, the experimental design of this study cannot answer the question of whether foreign babble has an adverse effect on memory. A no-noise condition should be included in the experimental design of future studies to compare recall performance in foreign babble against recall performance in quiet. If foreign babble has no adverse effect on memory, it may also explain why NR has no effect on memory in this kind of background noise.
Recall performance did not differ between masker languages in the presence of NR. In other words, NR removed the differential effect of masker language in competing speech on memory for words. Our findings are in agreement with the results reported in Van Engen and Bradlow (2007), in that speech recognition was interfered with more when the competing speech was in the native language than in a foreign language at a more adverse SNR (−5 dB), and the background noise in both languages was equally interfering at a less adverse SNR (0 dB and above). These results suggest that at a low SNR, the difference in performance between native and foreign language could not be solely explained by the energy of competing speech (Brungart 2001). The linguistic context of competing speech in the native language had an additional masking effect when the competing speech was more audible. The effect of masker language is minimized when competing speech becomes relatively less audible at better SNRs.
We argue that with NR, competing speech is attenuated and therefore auditory stream segregation is facilitated and the distraction due to the presence of irrelevant linguistic information is also diminished. This is particularly true for competing speech in Swedish, where segregation was more difficult than that in Cantonese when there was no NR.
Fig. 4. Long-term average speech spectra for four-talker babble in Swedish (solid line) and Cantonese (dotted line).
CONCLUSIONS
Noise reduction improved memory for speech heard in competing speech for hearing aid users. This improvement was mainly attributable to words that occurred in the late list position and was found irrespective of working memory capacity. However, the pattern across list positions of the effect of NR on memory was modulated by both cognitive capacity and differences in recall strategy. Participants with better working memory showed more consistent recall benefit of NR across list positions, while participants with low working memory capacity showed such a benefit mainly in late list positions. Memory for native speech was more disrupted by native babble than foreign babble, but the disruptive effect of native speech babble was reduced to that of foreign babble when there was NR. We argue that NR facilitated segregation of target native speech from native speech babble, and hence memory for the target words was improved.
ACKNOWLEDGMENTS
The authors thank Michael Syskind Pedersen and Ulrik Kjems from Oticon A/S, Smørum, Denmark, for their contribution of ideas regarding the noise reduction signal processing, René Burmand Johannesson from Eriksholm Research Centre, Oticon A/S, Denmark, and Mathias Hällgren from the Department of Technical Audiology, Linköping University, Sweden, for their technical support. This research was funded by the Swedish Research Council. The authors declare no other conflict of interest. Address for correspondence: Elaine H. N. Ng, Department of Behavioural Sciences and Learning, Linköping University, SE-581 83 Linköping, Sweden. E-mail: [email protected]
Received June 11, 2013; accepted May 29, 2014.
REFERENCES Akeroyd, M. A. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. Int J Audiol, 47 Suppl 2, S53–S71. Baddeley, A. D., Logie, R., Nimmo-Smith, I. (1985). Components of fluent reading. J Mem Lang, 24, 119–131. Ben-David, B. M., Tse, V. Y., Schneider, B. A. (2012). Does it take older adults longer than younger adults to perceptually segregate a speech target from a background masker? Hear Res, 290, 55–63. Besser, J., Koelewijn, T., Zekveld, A. A., et al. (2013). How linguistic closure and verbal working memory relate to speech recognition in noise–A review. Trends Amplif, 17, 75–93. Boldt, J. B., Kjems, U., Pedersen, M. S., et al. (2008, 14–17 September). Estimation of the ideal binary mask using directional systems. Proceedings of the 11th International Workshop on Acoustic Echo and Noise Control, URL (consulted April 2013): http://www.iwaenc.org/proceedings/2008/contents/papers/9062.pdf. Bradlow, A., Clopper, C., Smiljanic, R., et al. (2010). A Perceptual phonetic similarity space for languages: Evidence from Five Native Language Listener Groups. Speech Commun, 52, 930–942. Brouwer, S., Van Engen, K. J., Calandruccio, L., et al. (2012). Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content. J Acoust Soc Am, 131, 1449–1464. Brungart, D. S. (2001). Informational and energetic masking effects in the perception of two simultaneous talkers. J Acoust Soc Am, 109, 1101–1109. Brungart, D. S., Chang, P. S., Simpson, B. D., et al. (2006). Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation. J Acoust Soc Am, 120, 4007–4018. Byrne, D., Dillon, H., Tran, K., et al. (1994). An international comparison of long-term average speech spectra. J Acoust Soc Am, 96(4), 2108–2120. 
Calandruccio, L., Brouwer, S., Van Engen, K. J., et al. (2013). Masking release due to linguistic and phonetic dissimilarity between the target and masker speech. Am J Audiol, 22, 157–164. Calandruccio, L., Dhar, S., Bradlow, A. R. (2010). Speech-on-speech masking with variable access to the linguistic content of the masker speech. J Acoust Soc Am, 128, 860–869. Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. J Verb Learn Verb Behav, 19(4), 450–466. Festen, J. M., & Plomp, R. (1990). Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing. J Acoust Soc Am, 88, 1725–1736. Foo, C., Rudner, M., Rönnberg, J., et al. (2007). Recognition of speech in noise with new hearing instrument compression release settings requires
explicit cognitive storage and processing capacity. J Am Acad Audiol, 18, 618–631. Francis, A. L., & Nusbaum, H. C. (2009). Effects of intelligibility on working memory demand for speech perception. Atten Percept Psychophys, 71, 1360–1374. Freyman, R. L., Balakrishnan, U., Helfer, K. S. (2004). Effect of number of masking talkers and auditory priming on informational masking in speech recognition. J Acoust Soc Am, 115(5 Pt 1), 2246–2256. Gatehouse, S., Naylor, G., Elberling, C. (2003). Benefits from hearing aids in relation to the interaction between the user and the environment. Int J Audiol, 42 Suppl 1, S77–S85. Hall, J. W., III, Grose, J. H., Buss, E., et al. (2002). Spondee recognition in a two-talker masker and a speech-shaped noise masker in adults and children. Ear Hear, 23, 159–165. Heinrich, A., & Schneider, B. A. (2011). Elucidating the effects of ageing on remembering perceptually distorted word pairs. Q J Exp Psychol (Hove), 64, 186–205. Heinrich, A., Schneider, B. A., Craik, F. I. (2008). Investigating the influence of continuous babble on auditory short-term memory performance. Q J Exp Psychol (Hove), 61, 735–751. Hällgren, M., Larsby, B., Arlinger, S. (2006). A Swedish version of the Hearing In Noise Test (HINT) for measurement of speech recognition. Int J Audiol, 45, 227–237. Levitt, H. (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am, 49 Suppl 2, 467. Li, Y., & Wang, D. (2009). On the optimality of ideal binary time-frequency masks. Speech Comm, 51, 230–239. Lunner, T. (2003). Cognitive function in relation to hearing aid use. Int J Audiol, 42 Suppl 1, S49–S58. Lunner, T., Rudner, M., Rönnberg, J. (2009). Cognition and hearing aids. Scand J Psychol, 50, 395–403. MATLAB and Statistics Toolbox Release (2012). The MathWorks, Inc., Natick, Massachusetts, USA. Mattys, S. L., Brooks, J., Cooke, M. (2009). Recognizing speech under a processing load: Dissociating energetic from informational factors. Cogn Psychol, 59, 203–243. 
McCoy, S. L., Tun, P. A., Cox, L. C., et al. (2005). Hearing loss and perceptual effort: Downstream effects on older adults’ memory for speech. Q J Exp Psychol A, 58, 22–33. Moore, B. C. (2008). The choice of compression speed in hearing aids: Theoretical and practical considerations and the role of individual differences. Trends Amplif, 12, 103–112. Murphy, D. R., Craik, F. I., Li, K. Z., et al. (2000). Comparing the effects of aging and background noise on short-term memory performance. Psychol Aging, 15, 323–334. Ng, E. H., Rudner, M., Lunner, T., et al. (2013). Effects of noise and working memory capacity on memory processing of speech for hearing-aid users. Int J Audiol, 52, 433–441. Rabbitt, P. (1990). Mild hearing loss can cause apparent memory failures which increase with age and reduce with IQ. Acta Otolaryngol Suppl, 476, 167–175; discussion 176.
Rhebergen, K. S., Versfeld, N. J., Dreschler, W. A. (2005). Release from informational masking by time reversal of native and non-native interfering speech. J Acoust Soc Am, 118(3 Pt 1), 1274–1277. Rudner, M., Rönnberg, J., Lunner, T. (2011). Working memory supports listening in noise for persons with hearing impairment. J Am Acad Audiol, 22, 156–167. Rönnberg, J. (2003). Cognition in the hearing impaired and deaf as a bridge between signal and dialogue: A framework and a model. Int J Audiol, 42 Suppl 1, S68–S76. Rönnberg, J., Lunner, T., Zekveld, A., et al. (2013). The ease of language understanding (ELU) model: Theoretical, empirical and clinical advances. Front Syst Neurosci, 7, 13. doi:10.3389/fnsys.2013.00031. Rönnberg, J., Rudner, M., Foo, C., et al. (2008). Cognition counts: A working memory system for ease of language understanding (ELU). Int J Audiol, 47 Suppl 2, S99–S105. Rönnberg, J., Rudner, M., Lunner, T., et al. (2010). When cognition kicks in: Working memory and speech understanding in noise. Noise Health, 12, 263–269. Sarampalis, A., Kalluri, S., Edwards, B., et al. (2009). Objective measures of listening effort: Effects of background noise and noise reduction. J Speech Lang Hear Res, 52, 1230–1240. Smeds, K., Wolters, F., Rung, M. (2012, August). Estimation of Realistic Signal-to-Noise Ratios. Unpublished paper presented at the International Hearing Aid Research Conference (IHCON) in Lake Tahoe, CA, USA. Souza, P. E., & Turner, C. W. (1994). Masking of speech in young and elderly listeners with hearing loss. J Speech Hear Res, 37, 655–661. Sörqvist, P., & Rönnberg, J. (2012). Episodic long-term memory of spoken discourse masked by speech: What is the role for working memory capacity? J Speech Lang Hear Res, 55, 210–218. Tun, P. A., McCoy, S., Wingfield, A. (2009). Aging, hearing acuity, and the attentional costs of effortful listening. Psychol Aging, 24, 761–766. Tun, P. A., O’Kane, G., Wingfield, A. (2002). 
Distraction by competing speech in young and older adult listeners. Psychol Aging, 17, 453–467. Unsworth, N. (2007). Individual differences in working memory capacity and episodic retrieval: Examining the dynamics of delayed and continuous distractor free recall. J Exp Psychol Learn Mem Cogn, 33, 1020–1034. Unsworth, N., Brewer, G. A., Spillers, G. J. (2011). Inter- and intra-individual variation in immediate free recall: An examination of serial position functions and recall initiation strategies. Memory, 19, 67–82. Unsworth, N., & Engle, R. W. (2007). The nature of individual differences in working memory capacity: Active maintenance in primary memory and controlled search from secondary memory. Psychol Rev, 114, 104–132. Van Engen, K. J., & Bradlow, A. R. (2007). Sentence recognition in native- and foreign-language multi-talker background noise. J Acoust Soc Am, 121, 519–526. Wang, D., Kjems, U., Pedersen, M. S., et al. (2009). Speech intelligibility in background noise with ideal binary time-frequency masking. J Acoust Soc Am, 125, 2336–2347. Wingfield, A., Tun, P. A., & McCoy, S. L. (2005). Hearing loss in older adulthood: What it is and how it interacts with cognitive performance. Curr Dir Psychol Sci, 14, 144–148. Zekveld, A. A., Rudner, M., Johnsrude, I. S., et al. (2013). The effects of working memory capacity and semantic cues on the intelligibility of speech in noise. J Acoust Soc Am, 134, 2225–2234.