Hearing Research 310 (2014) 36–47



Research paper

Evaluation of the sparse coding shrinkage noise reduction algorithm in normal hearing and hearing impaired listeners

Jinqiu Sang a, Hongmei Hu a, Chengshi Zheng b, Guoping Li a, Mark E. Lutman a, Stefan Bleeck a,*

a Institute of Sound and Vibration Research, University of Southampton, SO17 1BJ, UK
b Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

Article history: Received 20 August 2013; received in revised form 15 January 2014; accepted 24 January 2014; available online 2 February 2014.

Abstract

Although there are numerous single-channel noise reduction strategies to improve speech perception in noise, most of them improve speech quality but do not improve speech intelligibility in circumstances where the noise and speech have similar frequency spectra. Current exceptions that may improve speech intelligibility are those that require a priori knowledge of the speech or noise statistics, which limits practical application. Hearing impaired (HI) listeners suffer more in speech intelligibility than normal hearing (NH) listeners in the same noisy environment, so developing better single-channel noise reduction algorithms for HI listeners is justified. Our model-based "sparse coding shrinkage" (SCS) algorithm extracts key speech information from noisy speech. We evaluate it by comparison with a state-of-the-art Wiener filtering approach using speech intelligibility tests with NH and HI listeners. The model-based SCS algorithm relies only on statistical signal information, without prior information. Results show that the SCS algorithm improves speech intelligibility in stationary noise and is comparable to the Wiener filtering algorithm. Both algorithms improve intelligibility for HI listeners but not for NH listeners. Improvement is less in fluctuating (babble) noise than in stationary noise. Both noise reduction algorithms perform better at higher input signal-to-noise ratios (SNR), where HI listeners can benefit but where NH listeners have already reached ceiling performance. The difference between NH and HI subjects in intelligibility gain depends fundamentally on the input SNR rather than on the hearing loss level. We conclude that HI listeners need different signal processing algorithms from NH subjects and that the SCS algorithm offers a promising alternative to Wiener filtering. Performance of all noise reduction algorithms is likely to vary with the extent of hearing loss, and algorithms that show little benefit for listeners with moderate hearing loss may be more beneficial for listeners with more severe hearing loss.

© 2014 Elsevier B.V. All rights reserved.

Abbreviations: BKB, Bamford–Kowal–Bench sentence; CI, cochlear implant; CS-WF, a Wiener filtering approach with cepstral smoothing; HA, hearing aid; HI, hearing impaired; MAP, maximum a posteriori; NAL, National Acoustics Laboratory procedure; NH, normal hearing; NPSD, noise power spectral density; SCS, sparse coding shrinkage; SNR, signal-to-noise ratio; SPP, speech presence probability; SRT, speech reception threshold; SSN, speech shaped noise.

* Corresponding author. Tel.: +44 (0)2380596682; fax: +44 (0)2380593190. E-mail addresses: [email protected], [email protected] (S. Bleeck).

http://dx.doi.org/10.1016/j.heares.2014.01.006

1. Introduction

For people with mild to severe hearing losses, current advanced hearing aids can help improve speech perception in quiet environments. However, one important reason why hearing-aid users often do not like to use their hearing aids is that current hearing aids do not work well in background noise (Alcantara et al., 2003; Dillon, 2001). Hearing-impaired (HI) people typically require a speech-to-noise ratio 3–6 dB higher than normal-hearing (NH) people to achieve the same degree of speech intelligibility (Alcantara et al., 2003; Plomp, 1994). Noise reduction strategies in hearing aids are therefore one critical factor in improving the quality of life of HA users.

The most effective speech enhancement method for improving speech intelligibility today is beamforming using microphone arrays (Kates and Weiss, 1996; Levitt, 2001; Schum, 2003). However, beamformers work best with large microphone arrays, and they are only effective when the target speech and the interfering sounds come from different directions. Because of the small size of a HA, usually only one or two microphones can be fitted and it is not possible to create large enough arrays. There is therefore still a need to develop better single-channel noise reduction schemes. Currently, most HAs are equipped with a combination of single-channel noise reduction algorithms and beamforming strategies (Widrow and Luo, 2003), and together these determine the final noise reduction performance of a HA.


There are also situations in which only single-channel strategies can be used, for example in telephone speech or in HAs that are placed entirely in the ear canal.

The scope of the present work is limited to the effects of single-channel noise reduction algorithms on speech intelligibility in noise, more specifically in situations where the noise has the same long-term frequency spectrum as the speech signal; this is one of the most challenging situations for noise reduction algorithms. The scope does not include assessment of speech quality or trading off speech intelligibility against speech quality improvement.

Previous research (Dahlquist et al., 2005; Levitt, 1993; Levitt et al., 1993; Weiss and Neuman, 1993) has shown that a higher signal-to-noise ratio (SNR) in HAs does improve speech quality, but does not necessarily lead to benefits in understanding speech for a hearing-impaired listener. Noise reduction algorithms lower the noise level, thereby reducing the loudness and annoyance of the interfering noise; the lower noise level is less distracting and contributes to the reported improvements in sound quality. Noise reduction strategies often do not improve speech intelligibility, because the processing can remove essential parts of the signal or distort the speech in a way that reduces intelligibility. The main exception to this generalization is when speech and noise have different frequency spectra, so that simple filtering can reduce the remote masking effect of the noise without adversely reducing the speech signal.

Reviews of single-channel noise reduction algorithms evaluated with NH listeners have concluded that no speech intelligibility improvement occurs (for example Hu and Loizou, 2007), except for algorithms that have a priori knowledge of the statistics of the speech and/or background noise (Kim and Loizou, 2010); such algorithms, however, are neither robust nor practical in real acoustic environments. With HI listeners, noise reduction algorithms have shown mixed positive and negative effects (Arehart et al., 2003; Dahlquist et al., 2005; Elberling et al., 1993; Harlander et al., 2012; Jamieson et al., 1995; Levitt, 2001; Levitt et al., 1993). Where studies have shown positive effects on intelligibility for HI listeners, this may have been because the noise had a different spectrum from the speech and was thus easier to reduce (Arehart et al., 2003). When the noise spectrum is similar to the speech spectrum, the general finding is that algorithms improve speech quality without improving speech intelligibility. For example, Elberling et al. (1993) evaluated three spectral subtraction algorithms in babble noise and reported that the algorithms reduced the noise level but did not improve speech intelligibility in either NH or HI listeners. Most recently, Harlander et al. (2012) evaluated model-based versus non-parametric monaural noise reduction approaches with HI listeners in stationary noises, non-stationary noises and one quasi-stationary noise; none of the algorithms improved speech intelligibility, although the two model-based noise reduction algorithms improved speech quality (Harlander et al., 2012).

The main exception to the generalization that single-channel noise reduction algorithms do not improve speech intelligibility occurs in studies of cochlear implant (CI) users. For example, a nonlinear spectral subtraction algorithm (Lockwood and Boudy, 1992) that did not show speech intelligibility improvement for HA users (Dahlquist et al., 2005) showed intelligibility improvement for CI users for the same speech shaped noise and the same sentence tests (Verschuur et al., 2006). A very similar effect was shown in an independent study (Yang and Fu, 2005).

When developing noise reduction algorithms for HI listeners, all hearing loss factors should be taken into account, and compensated for, where possible.


For people with sensorineural hearing losses, these factors include threshold elevation, loudness recruitment, reduced frequency selectivity and reduced temporal resolution. Automatic gain control can compensate for threshold elevation and loudness recruitment, but there are currently no adequate means of compensating for reduced frequency selectivity and reduced temporal resolution. Some researchers have attempted to compensate for reduced frequency selectivity with spectral sharpening, but this did not improve intelligibility (Baer et al., 1993). A possible way to mitigate the effects of reduced frequency and temporal resolution is to extract and preserve the key speech information while reducing both the remaining speech and the overall noise. In this way there is less self-masking and noise-masking of speech components, yet the essential speech information may be preserved after noise reduction. Of course, this begs the question of how to identify and preserve the key speech information; that is the focus of the present work.

To this end, we investigate here a sparse coding shrinkage (SCS) noise reduction algorithm to extract key information from noisy speech. The approach exploits the principle that the speech signal is highly redundant and that information is distributed sparsely in a noisy speech signal; by increasing the sparseness of a noisy speech signal, there is a large likelihood that intelligibility is improved (Li et al., 2012). The algorithm assumes a super-Gaussian (sparse) distribution of the principal components of clean speech and works by applying sparse shrinkage to the principal components. SCS was first proposed by Hyvärinen (1999) for image noise reduction (Hyvärinen et al., 1998) and later applied to speech enhancement in noise (Hu et al., 2013, 2011; Li, 2008; Li and Lutman, 2008; Potamitis et al., 2001; Sang et al., 2011a,b; Zou et al., 2008). Sparse coding has shown significant benefit for cochlear implant users (Li and Lutman, 2008), which suggests that there may be potential benefits of SCS for HA users too.

The performance of the SCS algorithm is compared with a state-of-the-art Wiener filtering approach, CS-WF (Breithaupt et al., 2008; Gerkmann and Martin, 2009; Gerkmann and Hendriks, 2012). Wiener filtering approaches can reach optimal performance when the speech and noise both have Gaussian distributions; in real environments, however, neither noise nor speech is usually Gaussian. As SCS has been developed to estimate the speech components under the assumption of a super-Gaussian distribution, we hypothesize that SCS might perform better than CS-WF, especially for HI listeners with reduced frequency and temporal resolution, who we propose would benefit from removal of redundant parts of the speech signal as well as of the noise. SCS was also compared with unprocessed speech, i.e., the baseline performance in a noisy environment without any algorithm applied. Previous research demonstrated that noise reduction algorithms might reduce speech intelligibility for HI listeners (Dahlquist et al., 2005); the comparison with unprocessed speech is used to investigate whether there is any benefit of noise reduction algorithms for HI listeners. Babble noise and speech shaped noise were chosen as the additive noise because their average long-term spectra are similar to that of the speech signal.

In much previous research, speech intelligibility has been quantified as the percentage of identified words (or syllables) correct, often measured at fixed input SNRs. With a fixed input SNR, a relatively low SNR might yield poor performance for both unprocessed and enhanced speech, while a relatively high SNR might already yield high performance for unprocessed speech, leaving no room for noise reduction algorithms to help; such intelligibility measures are therefore inherently limited by floor and ceiling effects.



An alternative measure of speech intelligibility that is less prone to floor and ceiling effects is the adaptive measurement of the speech reception threshold (SRT), the signal-to-noise ratio at which the participant scores a fixed percentage correct (Dirks et al., 1982). The adaptive ('up-down') method has been shown to be accurate and efficient (Dirks et al., 1982; Levitt, 1971). Our evaluation follows the three-up-one-down procedure in speech recognition tests to measure the SRT corresponding to a 79.4% correct recognition threshold.

The structure of this paper is as follows. The two noise reduction algorithms and the subjective intelligibility evaluation are introduced in Section 2. The speech intelligibility performance of the two algorithms is presented and analyzed in Section 3. Factors that affect the performance of noise reduction algorithms for speech intelligibility are discussed in Section 4. Conclusions are presented in Section 5.

2. Materials and methods

2.1. Signal processing

2.1.1. Principle of sparse coding shrinkage

The SCS principle is used to estimate a random variable corrupted by Gaussian noise, given a sparse distribution of the random variable. The details in this section are also described in Hyvärinen (1999). A noise reduction algorithm for speech enhancement based on the SCS principle is introduced in the next section.

We first consider only scalar random variables. Let s denote the original non-Gaussian random variable and v the Gaussian noise with zero mean and variance σ². Assume that we observe only the random variable y:

y = s + v   (1)

The maximum a posteriori (MAP) estimator is used to estimate the original variables:

\hat{s} = \arg\max_s p(s \mid y) = \arg\max_s \frac{p(y \mid s)\, p(s)}{p(y)}   (2)

where ŝ is the estimate of the original clean variable s, p denotes probability density, p(y|s) is the conditional density of the observation y given s, which is the density of the noise evaluated at y − s, i.e., p(y|s) = p_v(y − s), and p(s) is the a priori density of the original clean signal s, which for clarity of the following derivation we denote by p_s(s). Since p(y) does not depend on s, Equation (2) can be rewritten as

\hat{s} = \arg\max_s p_v(y - s)\, p_s(s)   (3)

Using the negative log density f_s = −ln p_s, then

\hat{s} = \arg\min_s \left[ \frac{1}{2\sigma^2}(y - s)^2 + f_s(s) \right]   (4)

Assuming f_s to be strictly convex and differentiable, this equation can be solved using the approximation

\hat{s} = g(y)   (5)

where g is called the 'shrinkage function'. The shrinkage function used in our research is shown in Fig. 1, with its specific expression given in Equation (15) (Hyvärinen, 1999). The effect of this function is to reduce the absolute value of its argument by a certain amount that depends on the noise variance; small arguments are suppressed to zero.

Fig. 1. The shrinkage function used in the present SCS algorithm (dash-dotted line). y is the observed signal and ŝ is the estimated clean signal. The effect of this function is to reduce the absolute value of its argument by a certain amount, which depends on the noise variance. Small arguments are suppressed to zero (Hyvärinen, 1999).
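As a contrast that shows what the sparse prior contributes (our worked example, not part of the original derivation): if s were Gaussian with variance E{s²}, then f_s(s) = s²/(2E{s²}) + const, and setting the derivative of Equation (4) to zero gives the linear estimate

\hat{s} = \frac{E\{s^2\}}{E\{s^2\} + \sigma^2}\, y

a Wiener-like scaling with no thresholding. The super-Gaussian prior of Equation (14) instead yields the soft-thresholding rule of Equation (15), which sets small, mostly-noise components exactly to zero.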

2.1.2. Implementation of the sparse coding shrinkage algorithm

Fig. 2 shows the flowchart of sparse coding shrinkage applied to noisy speech. A short description of the algorithm was presented in our preliminary research (Sang et al., 2012). The flowchart is similar to that in previous research on sparse coding shrinkage for speech enhancement (Potamitis et al., 2001; Sang et al., 2011a), except for the derivation of the transformation matrix W. The processing is applied to each speech segment of length N. The observed noisy speech is reshaped into a noisy speech matrix Z as in Equation (7). The noisy speech matrix is transformed into principal components, in which the clean signal has a sparse distribution and the noise has a Gaussian distribution. The shrinkage function g(·) is applied to suppress the noise in the noisy components and estimate the clean components. After that, the inverse transform and reconstruction are computed to obtain the estimated clean speech signal. The derivation of W is described in Fig. 3, where W⁻ᵀ denotes the inverse of Wᵀ.

The noisy speech signal z is assumed to be produced by corrupting the original speech sequence x with Gaussian noise n:

z = x + n   (6)

The noisy speech matrix is constructed by reshaping z into overlapping frames (50% overlap):

Z = \begin{bmatrix} z_1 & z_{m/2+1} & \cdots & z_{m(l-1)/2+1} \\ z_2 & z_{m/2+2} & \cdots & z_{m(l-1)/2+2} \\ \vdots & \vdots & & \vdots \\ z_m & z_{m/2+m} & \cdots & z_{m(l-1)/2+m} \end{bmatrix}   (7)

where l (l = 15) denotes the number of frames and m (m = 64) is the number of samples in the 4-ms window forming each column at a sampling rate of 16 kHz. Accordingly, the total number of points in each speech segment is N = m(l − 1)/2 + m and the duration of each speech segment is 32 ms.
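A minimal sketch of this framing, together with the matching overlap-and-add reconstruction used later in the flowchart (our illustration; the function names are assumptions, not the authors' code):

```python
import numpy as np

def frame_matrix(z, m=64, l=15):
    """Build the m-by-l matrix Z of Equation (7): each column is an
    m-sample frame and consecutive columns overlap by 50% (m/2 samples)."""
    hop = m // 2
    n_total = hop * (l - 1) + m        # N = m(l-1)/2 + m = 512 samples (32 ms)
    assert len(z) >= n_total
    return np.column_stack([z[k * hop : k * hop + m] for k in range(l)])

def overlap_add(X, m=64, l=15):
    """Inverse of frame_matrix: reshape an enhanced matrix back into a
    vector by overlap-and-add (cf. Deller et al., 2000)."""
    hop = m // 2
    x = np.zeros(hop * (l - 1) + m)
    for k in range(l):
        x[k * hop : k * hop + m] += X[:, k]
    return x * 0.5                     # interior samples are summed twice
```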


Fig. 2. Flowchart of sparse coding shrinkage algorithm in noisy speech.




Fig. 3. Flowchart of simultaneous diagonalization of the estimated speech and noise covariance matrices.

After reshaping, the original noisy speech can be written as

Z = X + N   (8)

Noise is first estimated from the noisy signal through a noise power estimation method (explained in Section 2.1.4). The estimated noise covariance matrix R̂_n and the estimated clean speech covariance matrix R̂_x are assumed to be uncorrelated. Fig. 3 shows how to derive the eigenvalue matrix Λ_x and the eigenvector matrix W through simultaneous diagonalization of the estimated clean speech and noise covariance matrices (Hu and Loizou, 2003). The transformation from noisy speech to principal components is realized with the eigenvector matrix W, as illustrated in Fig. 3. Through the implementation in Fig. 3, not only is the eigenvector matrix derived, but the noise is also pre-whitened, as shown in Equation (9):

W^T \hat{R}_x W = \Lambda_x, \qquad W^T \hat{R}_n W = I   (9)
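The simultaneous diagonalization of Fig. 3 can be realized with a generalized eigendecomposition; a minimal sketch under that assumption (function name ours):

```python
import numpy as np
from scipy.linalg import eigh

def simultaneous_diag(Rx_hat, Rn_hat):
    """Solve the generalized eigenproblem Rx w = lambda Rn w. SciPy
    normalizes the eigenvectors so that W.T @ Rn_hat @ W = I, hence
    W.T @ Rx_hat @ W = diag(lambda_x), satisfying Equation (9) and
    pre-whitening the noise (cf. Hu and Loizou, 2003)."""
    lambda_x, W = eigh(Rx_hat, Rn_hat)
    return np.diag(lambda_x), W
```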

Transforming the noisy speech matrix to principal components is realized with the matrix W as follows:

Y = W^T Z = W^T X + W^T N = S + V   (10)

where Y = [y_1; y_2; …; y_n], S = [s_1; s_2; …; s_n] and V = [v_1; v_2; …; v_n]; the clean speech components s_i have a super-Gaussian distribution and the noise components v_i have a Gaussian distribution. Therefore the sparse coding shrinkage function can be applied to each component y_i to estimate the clean components s_i:

\hat{s}_i = g(y_i)   (11)

where Ŝ = [ŝ_1; ŝ_2; …; ŝ_n] is the estimated clean speech matrix in the space of principal components. Inverse transformation of the estimated clean matrix yields

\hat{X} = W^{-T} \hat{S}   (12)

Finally, the enhanced speech x̂ is reconstructed by reshaping X̂ back into a vector by the overlap-and-add method (Deller et al., 2000). Although the above implementation operates on a short speech segment (32 ms), longer speech can be processed by dividing it into short segments and using the overlap-and-add method (Deller et al., 2000).

2.1.3. Super-Gaussian distribution and shrinkage function

Fig. 4 shows the distribution of one principal component in speech; this is an example of a moderately super-Gaussian distribution. The red line emphasizes the peaked shape of the distribution, which is different from the bell shape of a Gaussian distribution. Sparsity can also be quantified by the kurtosis of the signal (Field, 1994): the sparser the signal, the higher the kurtosis. A larger sparsity level is also visible as a more peaked distribution shape in Fig. 4. The sparsity of each component s_i in S can be estimated through the normalized kurtosis:

K(s_i) = \frac{1}{n}\sum_{j=1}^{n}\left(\frac{s_{ij} - \mu}{\sigma}\right)^4 - 3   (13)

where s_{ij} is the jth observation value in s_i, μ is the mean, σ is the standard deviation and K is the measured normalized kurtosis. The measured kurtosis is much larger than zero, the normalized kurtosis of a Gaussian distribution, which means that the distribution of each component s_i is super-Gaussian. Different super-Gaussian levels have been categorized as moderately super-Gaussian, Laplacian and strongly super-Gaussian (Hyvärinen, 1999). The distribution of the clean speech components was selected as a linear combination of Gaussian and Laplacian distributions (Hyvärinen, 1999):

f_s(s_i) = C \exp\left(-a s_i^2/2 - b|s_i|\right)   (14)

where C is an irrelevant scaling constant. Different values of a and b represent different degrees of super-Gaussianity. Through the MAP derivation (Hyvärinen, 1999), the shrinkage function corresponding to the distribution of Equation (14) is derived as:

g(y_i) = \frac{1}{1 + \sigma^2 a}\,\mathrm{sign}(y_i)\,\max\left(0,\; |y_i| - b\sigma^2\right)   (15)

where σ² is the noise variance in each noise component v_i. This shrinkage function is shown as the dash-dotted line in Fig. 1. It interpolates between the shrinkage function of the Gaussian density and that of the Laplacian density. Specifically, when the distribution of s_i is Laplacian, a is 0 and b is estimated as √(2/E{s_i²}); when the distribution of s_i is Gaussian, b is 0 and a is estimated as 1/E{s_i²}. Therefore it is reasonable to constrain the values of a and b to the intervals [0, 1/E{s_i²}] and [0, √(2/E{s_i²})], respectively.

Fig. 4. An example of a histogram showing the distribution of the coefficients of one principal component vector s_i in speech. The horizontal axis indicates the amplitude of the principal components; the vertical axis indicates the number of components in each amplitude bin. The line is a fit with a super-Gaussian function and illustrates the difference to a Gaussian distribution.
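For concreteness, a small Python sketch of Equations (13) and (15) (our illustration; the variable names are assumptions, not the authors' code):

```python
import numpy as np

def normalized_kurtosis(s):
    """Normalized kurtosis of Equation (13): zero for a Gaussian signal,
    strongly positive for sparse (super-Gaussian) components."""
    mu, sigma = np.mean(s), np.std(s)
    return np.mean(((s - mu) / sigma) ** 4) - 3.0

def shrinkage(y, sigma2, a, b):
    """Shrinkage function of Equation (15): soft-threshold at b*sigma2,
    then scale by 1/(1 + sigma2*a); small arguments are set to zero."""
    return np.sign(y) * np.maximum(0.0, np.abs(y) - b * sigma2) / (1.0 + sigma2 * a)
```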

To simplify the estimation of a and b in Equation (15), we estimate a = μ₁/E{s_i²} and b = μ₂ √(2/E{s_i²}), where μ₁ and μ₂ are coefficients to be adjusted according to the distribution of s_i. In our test, μ₁ is set to 1 and μ₂ to 0.3. E{s_i²} = E{y_i²} − σ², since the speech and noise are assumed to be uncorrelated. The choice of moderately super-Gaussian distributions is justified by the criterion that when √(E{s_i²}) f_s(0) < 1/√2, the distribution can be assumed to be described by Equation (14) (Hyvärinen, 1999).
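A hedged sketch of this parameter estimation (the names mu1 and mu2 and the non-negativity floor are our assumptions):

```python
import numpy as np

def estimate_prior_params(y_i, sigma2, mu1=1.0, mu2=0.3):
    """Estimate the shrinkage parameters a and b for one noisy component
    y_i. E{s_i^2} = E{y_i^2} - sigma2 because speech and noise are assumed
    uncorrelated; the floor guards against negative variance estimates."""
    Es2 = max(np.mean(y_i ** 2) - sigma2, 1e-12)
    return mu1 / Es2, mu2 * np.sqrt(2.0 / Es2)
```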

2.1.4. Estimation of the noise covariance matrix

Given a noisy speech signal, the noise covariance matrix R̂_n in Equation (9) and Fig. 3 must generally be estimated. Here, a state-of-the-art noise estimator proposed by Gerkmann and Hendriks (2012) is adopted to track non-stationary noise. This method estimates the noise power spectral density (NPSD) based on a speech presence probability (SPP), where the a priori SNR is a fixed value in estimating the SPP. The amplitude of the noise spectrum is taken as the square root of the noise power spectrum in each frequency bin, frame by frame, and the phase of the noise spectrum is assumed to be the same as that of the noisy speech spectrum. The noise spectrum is obtained by combining the amplitude of the noise spectrum with the phase of the noisy spectrum, and the time-domain noise waveform is then estimated by the inverse FFT of this noise spectrum. The noise covariance matrix is estimated by calculating the covariance of the noise in the temporal domain, as shown in Fig. 3. Alternatively, the noise covariance matrix can be estimated through a direct inverse Fourier transform of the noise power spectral density according to the Wiener–Khinchin relation (Deller et al., 2000):





\Phi_{nn}(e^{j\omega}) = \sum_{m=-\infty}^{+\infty} \phi_{nn}[m]\, e^{-j\omega m}, \qquad \phi_{nn}[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Phi_{nn}(e^{j\omega})\, e^{j\omega m}\, d\omega   (16)

where Φ_{nn}(e^{jω}) is the NPSD and φ_{nn}[m] is the noise autocorrelation coefficient, which can be derived through the inverse Fourier transform of the NPSD. When the mean of the noise is zero, the noise autocovariance coefficients are equal to the noise autocorrelation coefficients. If the noise covariance matrix has size M, it can be constructed as a symmetric Toeplitz matrix from the first M autocovariance coefficients.
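A minimal sketch of this alternative construction (ours; it assumes npsd contains samples of the one-sided noise power spectral density):

```python
import numpy as np
from scipy.linalg import toeplitz

def noise_covariance_from_npsd(npsd, M):
    """Equation (16): the autocorrelation sequence is the inverse Fourier
    transform of the NPSD; for zero-mean noise it equals the autocovariance,
    and the M-by-M covariance matrix is symmetric Toeplitz in those values."""
    acov = np.fft.irfft(npsd)      # autocovariance coefficients phi_nn[m]
    return toeplitz(acov[:M])      # symmetric Toeplitz covariance matrix
```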



2.1.5. Introduction of the comparison algorithm

The SCS algorithm was compared with a state-of-the-art Wiener filtering approach, for which the code was provided by Timo Gerkmann (Breithaupt et al., 2008; Gerkmann and Martin, 2009; Gerkmann and Hendriks, 2012). One characteristic of this Wiener filtering approach is that the a priori SNR is estimated by a cepstral smoothing method; we refer to this approach as 'CS-WF'. This algorithm was chosen because Wiener filters are used frequently in today's hearing aids and CS-WF is a competitive, state-of-the-art algorithm (Breithaupt et al., 2008; Gerkmann and Martin, 2009; Gerkmann and Hendriks, 2012). There are two critical techniques in CS-WF. The first is to estimate the NPSD based on an SPP, with the a priori SNR fixed when estimating the SPP (Gerkmann and Hendriks, 2012); the same NPSD estimation method, as described in Section 2.1.4, is also used in SCS. The second is to estimate the a priori SNR using temporal cepstrum smoothing with bias compensation (Breithaupt et al., 2008; Gerkmann and Martin, 2009). This algorithm can reduce musical noise and suppress non-stationary noise effectively.

2.2. Stimuli

The speech materials used in our experiments were BKB sentences (Bench et al., 1979), standard British sentences recorded by a female talker. The corpus consists of 21 lists of 16 sentences each; each sentence contains three or four key words, and each list has 50 keywords. Two noise types were used: babble noise and speech shaped noise. The speech shaped noise was generated such that its frequency spectrum matched the long-term average spectrum of the speech being used; the babble noise is a multi-talker mixture. The speech corrupted with noise was further processed with and without the noise reduction strategies to produce stimuli denoted 'noisy', 'CS-WF' and 'SCS'. The final presentation of these stimuli was adjusted to compensate for the hearing threshold elevation of the subjects (see Section 2.3).

Figs. 5 and 6 show the time-domain waveforms and spectrograms of an example BKB sentence ("she drinks from her cup") at 0 dB input SNR with the different noise reduction strategies, in SSN and babble noise respectively; clean speech and noisy speech are also shown. In each figure the waveforms and spectrograms are (a) original speech; (b) noisy speech; (c) noisy speech processed with CS-WF; (d) noisy speech processed with SCS. Visual comparison between Figs. 5 and 6 shows that the noise reduction algorithms are less effective in babble noise: babble noise is a competing multi-talker noise and is difficult to reduce because of its non-stationarity. SCS reduced the speech shaped noise more efficiently (Fig. 5), whereas CS-WF reduced babble noise more efficiently (Fig. 6). One common way to generate speech shaped noise is sketched below.
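This sketch is ours, under our own assumptions; it is not necessarily the authors' exact recipe:

```python
import numpy as np

def speech_shaped_noise(speech, n_samples, seed=0):
    """Impose the long-term average magnitude spectrum of a speech corpus
    on random phases, yielding stationary noise with a speech-like spectrum."""
    rng = np.random.default_rng(seed)
    ltas = np.abs(np.fft.rfft(speech, n=n_samples))   # long-term average spectrum
    phase = np.exp(2j * np.pi * rng.random(ltas.shape))
    noise = np.fft.irfft(ltas * phase, n=n_samples)
    return noise / np.std(noise)                      # normalize to unit RMS
```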

2.3. Subjective listening tests

Nine NH listeners and nine HI listeners with sensorineural hearing loss participated in this experiment. All subjects were native English speakers. The NH listeners had hearing thresholds at or below 20 dB HL from 250 Hz to 8 kHz, and their ages ranged from 20 to 36. Fig. 7 shows the individual hearing thresholds (measured by standard pure-tone audiometry) for the aided ears of the 9 HI subjects. The HI listeners all had mild to severe hearing losses, and most of them had sloping high-frequency hearing losses. All listeners were tested monaurally on their better ear. All the HI listeners were experienced HA users, and their ages ranged from 18 to 30. The tests were performed with their HAs taken off, and compensation was applied to each HI subject individually: a linear gain prescription was computed from each individual's audiogram through the NAL-R procedure (Dillon, 2001) and used to produce amplification with appropriate frequency-dependent shaping. Table 1 shows the age, tested ear, cause of hearing loss and HA experience of each HI participant; all of them are bilaterally hearing impaired. The experiment was approved by the ethics committee of the University of Southampton.

2.4. Equipment

All listeners were seated in a sound-isolated room and listened to the sounds through Sennheiser HDA 200 headphones driven by a Behringer UCA202 sound card and a Creek OBH-21SE headphone amplifier. The presentation level of speech was kept at 65 dB SPL for NH listeners and was adjusted individually for each HI listener to a comfortable and audible level.



Fig. 5. Examples of speech (“she drinks from her cup”) in stationary noise with/without noise reduction algorithms. The spectrograms and waveforms are shown for (a) original speech; (b) speech in speech shaped noise (0 dB input SNR); (c) processed with CS-WF; (d) processed with SCS.

2.5. Procedure in speech intelligibility tests

There were a total of six test conditions in this experiment: two noise types (SSN, babble) times three processing conditions (no noise reduction, CS-WF, SCS). The no-noise-reduction condition presents unprocessed noisy speech and provides the baseline performance. BKB lists were randomly selected from the corpus for each test condition. Subjects were instructed to repeat as many words as they could after listening to each sentence, and they were not given any feedback during the tests. Each subject practised with one randomly selected condition to become familiar with the test procedure. The order of the six conditions was balanced among the listeners. A session with the six conditions plus training took less than 1 h.

The speech recognition test used a three-up-one-down adaptive procedure as described in Dahlquist et al. (2005) to find the speech reception threshold (SRT, in dB) required for 79.4% correct recognition in each condition. A sentence was deemed to have been recognised correctly when at least two keywords were repeated correctly. Sentence order was controlled so that participants did not receive the same sentence repeatedly. The starting SNR was 5 dB for NH listeners and 15 dB for HI listeners. The initial step size of the procedure was 5 dB and the final step size was 1 dB. Only one adaptive track was run per condition per subject. A sketch of such an adaptive track is given below.
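The following hedged sketch illustrates such a track (the step schedule and the averaging of late reversals are our assumptions, not the authors' exact protocol):

```python
def adaptive_srt(run_trial, start_snr, initial_step=5.0, final_step=1.0,
                 n_trials=30):
    """Three-up-one-down track converging on the SNR for 79.4% correct
    (Levitt, 1971): lower the SNR after three consecutive correct sentences,
    raise it after one error. run_trial(snr) should return True when at
    least two keywords of the sentence are repeated correctly."""
    snr, step = start_snr, initial_step
    streak, last_dir, reversals = 0, 0, []
    for _ in range(n_trials):
        if run_trial(snr):
            streak += 1
            if streak < 3:
                continue               # need three in a row to make it harder
            streak, direction = 0, -1  # three correct: decrease SNR
        else:
            streak, direction = 0, +1  # one error: increase SNR
        if last_dir and direction != last_dir:
            reversals.append(snr)      # track reversal
            step = final_step          # assumed: step shrinks after a reversal
        last_dir = direction
        snr += direction * step
    late = reversals[-4:]
    return sum(late) / len(late) if late else snr  # SRT from late reversals
```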

Note that the algorithms operate at varying input SNRs, so different participants may be tested at different input SNRs; particular consideration is therefore needed when pooling and analyzing data across participants.

3. Results

3.1. Speech recognition results

Fig. 8 shows the speech recognition performance of all participants, i.e., the SRTs in all six test conditions (SSN-Noisy, SSN-CS-WF, SSN-SCS, Babble-Noisy, Babble-CS-WF, Babble-SCS), for the 9 NH subjects (left) and the 9 HI subjects (right). Fig. 8(a) shows the spread of SRTs in box plots. The data were normally distributed, and Fig. 8(b) shows the average SRT in each condition with a 95% confidence interval. Results of NH and HI listeners are shown next to each other to show interaction effects between hearing loss and noise reduction algorithm. The primary motivation of the experiment was to evaluate noise reduction algorithms for HI listeners, but we also tested NH listeners; these tests were designed to highlight differences between NH and HI listeners and thereby suggest how noise reduction algorithms originally developed for NH listeners can be improved for HI listeners. The spread of the SRTs in Fig. 8(a) illustrates the large inter-subject variability in HI subjects compared to NH subjects.



Fig. 6. Examples of speech (“she drinks from her cup”) in babble noise with/without noise reduction algorithms. The spectrograms and waveforms are shown for (a) original speech; (b) speech in babble noise (0 dB input SNR); (c) the same noisy speech processed with CS-WF; (d) the noisy speech processed with SCS.

This is presumably due to individual auditory deficits as well as individual experience with hearing aids.

The mean SRTs for NH listeners are shown in the left panel of Fig. 8(b) and in the second column of Table 2. The data were normally distributed, and a two-way repeated-measures ANOVA shows that for NH subjects the effect of noise reduction algorithm is not significant [F(2,16) = 2.4, p > 0.05], but the effect of noise type is significant [F(1,8) = 27.7, p < 0.05]. There is no interaction between noise type and noise reduction algorithm [F(2,16) = 2.4, p > 0.05]. The non-significant effect of algorithm indicates that these noise reduction algorithms do not improve speech intelligibility for NH subjects.

The results for HI subjects are shown in the right panel of Fig. 8(b) and in the second column of Table 3. The data were normally distributed, and a two-way repeated-measures ANOVA shows that both the noise reduction algorithm and the noise type have significant effects [F(2,16) = 9.4, p < 0.05, and F(1,8) = 5.5, p < 0.05, respectively]. There is no interaction between noise reduction algorithm and noise type [F(2,16) = 0.04, p > 0.05]. The significant effect of algorithm indicates that noise reduction algorithms can significantly improve speech intelligibility for HI subjects.

A three-way repeated-measures ANOVA including both NH and HI subjects showed that the main effects of subject type and noise type were significant [F(1,16) = 17.767, p < 0.05, and F(1,16) = 22.589, p < 0.05, respectively], but the main effect of noise reduction algorithm was not significant [F(2,32) = 3.085, p > 0.05]. There is a significant interaction between subject type and noise reduction algorithm [F(2,32) = 9.210, p < 0.05]. There are no significant interactions between subject type and noise type or between noise type and noise reduction algorithm [F(1,16) = 0.55, p > 0.05, and F(2,32) = 1.423, p > 0.05, respectively], and there is no three-way interaction between subject type, noise reduction algorithm and noise type [F(2,32) = 1.561, p > 0.05]. These results indicate that the performance of noise reduction algorithms depends on the hearing loss level and the noise type.

Fisher LSD post hoc tests were performed to detect significant differences in performance between all pairs of the six conditions, separately for NH subjects (Table 2) and HI subjects (Table 3). In the tables, the numbers in brackets identify the conditions, the second column gives the average SRT of each condition, and significant differences (p < 0.05) are marked with an asterisk. For NH listeners, the noise reduction algorithms barely changed speech intelligibility in speech shaped noise (compare (1) with (2), and (1) with (3), in Table 2) but significantly deteriorated speech intelligibility in babble noise (compare (4) with (5), and (4) with (6), in Table 2). There is no significant intelligibility difference between speech in babble noise and speech in speech shaped noise within NH subjects (compare (1) and (4) in Table 2). For HI listeners, the noise reduction algorithms significantly improved speech intelligibility in speech shaped noise (compare (1) with (2), and (1) with (3), in Table 3) but not significantly in babble noise (compare (4) with (5), and (4) with (6), in Table 3).


Fig. 7. Audiograms showing the individual hearing thresholds for the aided ears of HI subjects (N ¼ 9).

There is a significant intelligibility difference between speech in babble noise and speech in speech shaped noise within HI subjects (compare (1) and (4) in Table 3). A paired-sample t-test comparing SCS and CS-WF in speech shaped noise shows that the power is only 0.2 with the available 9 HI subjects; at least 47 subjects would have been needed to detect a within-subject between-condition difference of 1.0 dB at p < 0.05 with 80% power. The corresponding comparison in babble noise shows a power of 0.37 with the available 9 NH subjects; at least 23 subjects would have been needed to detect a within-subject between-condition difference of 1.0 dB at p < 0.05 with 80% power.


3.2. Multivariate regression of speech recognition gain

Speech recognition gain is defined as the difference between the SRT for the unprocessed noisy condition and the SRT for the noise reduction algorithm (gain = SRT_noisy − SRT_processed); a positive speech recognition gain indicates a benefit in intelligibility from the noise reduction algorithm. Single-channel noise reduction algorithms seem to have reached their limits in terms of speech intelligibility improvement for NH listeners. However, Fig. 8 shows large differences between NH and HI subjects in speech recognition gain from noise reduction algorithms: HI subjects generally obtain higher speech recognition gain than NH subjects. The hearing loss level therefore appears to be a factor in potential speech recognition gain. However, it is worth noting that the adaptive testing approach meant that HI subjects were tested at higher input SNRs, because the SRT of noisy speech was higher in HI listeners than in NH subjects. Another factor in speech recognition gain may therefore be the input SNR to the algorithm, which depends on the individual SRT of unprocessed noisy speech.

To investigate the relationship between speech recognition gain and these two factors (hearing loss level and SRT of noisy speech), multivariate regression of speech recognition gain was conducted. The hearing loss level was quantified by averaging the hearing thresholds at three frequencies (1, 2 and 4 kHz) in each subject, on the assumption that thresholds in this frequency range are most important for understanding speech (Amos and Humes, 2007). The SRT of noisy speech indicates the baseline performance with unprocessed noisy speech: the higher the SRT with noisy speech, the higher the input SNR to the algorithm during testing. Note that because individuals with greater hearing loss have higher SRTs, the two factors are correlated with one another. The scatter plots in Fig. 9 show the relationship between speech recognition gain and average hearing threshold in four conditions; the scatter plots in Fig. 10 show the relationship between speech recognition gain and baseline performance in the same four conditions. The four conditions (a–d) in Figs. 9 and 10 are, in order: CS-WF in speech shaped noise, SCS in speech shaped noise, CS-WF in babble noise, and SCS in babble noise. The larger the value of R² (squared correlation) in each plot, the greater the variance explained by the factor.

Table 1
Age, gender, tested ear, cause of hearing loss and hearing aid experience of the listeners with hearing losses.

Listener   Age   Gender   Ear   Cause of hearing loss        Hearing aid experience
HI1        20    F        R     Meningitis at 2 years old    16 years
HI2        31    F        R     Congenital                   31 years
HI3        22    F        R     Congenital                   20 years
HI4        18    F        R     Congenital                   14 years
HI5        21    F        R     Congenital                   18 years
HI6        20    F        R     Tinnitus, noise exposure     4 years
HI7        22    M        L     Congenital                   18 years
HI8        20    F        R     Congenital                   19 years
HI9        22    M        R     Congenital, hereditary       6 years

All of them are bilaterally hearing impaired.

Fig. 8. SRTs for different conditions in 9 NH and 9 HI listeners. SSN: speech shaped noise; Noisy: noisy speech without noise reduction; CS-WF: the comparison algorithm; SCS: sparse coding shrinkage. (a) Box plots of SRTs: on each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme measured data not considered outliers. (b) Mean SRTs with error bars indicating the 95% confidence intervals of the means. A more negative SRT corresponds to better performance.



Table 2
Fisher LSD post hoc tests for the interaction of noise reduction algorithm and noise type in the experiment with NH subjects. The second column gives the mean SRT (dB) of each condition; the remaining columns give p-values for pairwise comparisons.

Processing condition    Mean SRT    (1)       (2)       (3)       (4)       (5)
(1) SSN-Noisy           −3.2
(2) SSN-CS-WF           −3.3        0.896
(3) SSN-SCS             −3.5        0.317     0.598
(4) Babble-Noisy        −2.6        0.228     0.050     0.107
(5) Babble-CS-WF        −1.5        0.002*    0.007*    0.001*    0.048*
(6) Babble-SCS          −1.6        0.002*    0.011*    0.003*    0.018*    0.77

Significant effects (p < 0.05) are marked with an asterisk. The number in each bracket indicates the condition number.

Table 3
Fisher LSD post hoc tests for the interaction of noise reduction algorithm and noise type in the experiment with HI subjects. The second column gives the mean SRT (dB) of each condition; the remaining columns give p-values for pairwise comparisons.

Processing condition    Mean SRT    (1)       (2)       (3)       (4)       (5)
(1) SSN-Noisy           2.9
(2) SSN-CS-WF           2.3         0.030*
(3) SSN-SCS             1.8         0.009*    0.233
(4) Babble-Noisy        3.9         0.006*    0.001*    0.001*
(5) Babble-CS-WF        3.4         0.292     0.074     0.018*    0.169
(6) Babble-SCS          2.8         0.865     0.525     0.210     0.073     0.058

Significant effects (p < 0.05) are marked with an asterisk. The number in each bracket indicates the condition number.

The average hearing threshold or baseline performance correlates significantly (p < 0.05) with speech recognition gain in babble noise but not in speech shaped noise. However, the fact that Fig. 10(c and d) shows higher values of R² than Fig. 9(c and d) suggests that the SRT of noisy speech is the underlying explanatory factor.

Fig. 10 shows the SRT of noisy speech minus the SRT of processed speech as the dependent variable and the SRT of processed speech as the independent variable. Since both variables contain the SRT of processed speech, they are structurally correlated. The statistical solution is to analyse transformed variables instead: (SRT of noisy speech minus SRT of processed speech) as the dependent variable and (SRT of noisy speech plus SRT of processed speech) as the independent variable; these two variables are not structurally correlated. The transformed variables remain statistically correlated, with high values of R², as shown in Fig. 10(c and d). This further validates that the SRT of noisy speech is the underlying factor behind speech recognition gain.

In an attempt to separate the contributions of the two factors to the speech recognition gain in babble noise, multivariate forward stepwise linear regression was performed. Forward stepwise regression includes the most important variables (greatest explained variance) in the linear model and excludes the less important ones. Tables 4 and 5 show the results of the multivariate forward stepwise linear regression of speech recognition gain in babble noise for CS-WF and SCS respectively. In the regression model, the dependent variable is the speech recognition gain and the two independent variables are the average hearing threshold and the SRT of noisy speech. In both tables the hearing threshold was excluded in favour of the baseline SRT. This suggests that the SRT of noisy speech, rather than the average hearing threshold, is the underlying explanatory variable, and further indicates that speech recognition gain depends mainly on the input SNR of the noisy speech. This also explains why the performance of noise reduction algorithms appears to depend on the hearing loss level, and it is consistent with findings from other studies that CI users tend to benefit from single-channel noise reduction algorithms: due to the extent of their hearing loss, the algorithms are necessarily evaluated at relatively high input SNRs, where they simply work better. A small numerical illustration of the variable transformation discussed above is given below.
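This illustration is ours, with synthetic numbers: even when the two SRTs are statistically unrelated, their difference correlates with one of them purely by construction, while the difference and the sum do not:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical SRTs with no real relationship and equal variances:
srt_noisy = rng.normal(0.0, 2.0, 10000)       # unprocessed noisy speech
srt_proc = rng.normal(0.0, 2.0, 10000)        # processed speech, independent
gain = srt_noisy - srt_proc
print(np.corrcoef(gain, srt_proc)[0, 1])      # ~ -0.71: structural correlation
print(np.corrcoef(gain, srt_noisy + srt_proc)[0, 1])  # ~ 0: coupling removed
```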

Fig. 9. Scatter plots showing speech recognition gain versus average hearing threshold in four conditions (SCS-SSN, SCS-Babble, CS-WF-SSN, CS-WF-Babble).



4. Discussion

4.1. Comparison between SCS and CS-WF

Overall, there was no significant difference in performance between SCS and CS-WF within either the NH or the HI subjects, indicating that the performance of the two noise reduction algorithms was similar. As shown in Fig. 8, SCS improved speech intelligibility slightly more than CS-WF in HI subjects. However, SCS reduces the noise to a larger extent, and this increases distortion. This may be more acceptable to HI listeners than to NH listeners, because the hearing loss factors make them less sensitive to speech distortion and more sensitive to the noise level.

4.2. The effect of hearing loss level

The two noise reduction algorithms evaluated in our study (CS-WF and SCS) showed greater speech intelligibility benefits for HI listeners than for NH listeners. This is in accordance with previous findings that most single-channel noise reduction algorithms cannot improve speech intelligibility for NH listeners but can improve intelligibility for profoundly HI listeners using CIs, and it suggests that the benefit of noise reduction algorithms varies with the hearing loss level. The linear relationship between speech recognition gain and hearing threshold across the 18 subjects in Fig. 9 also supports this hypothesis. However, when the hearing threshold and the SRT of noisy speech are both included in the stepwise regression model of speech recognition gain, the contribution of the hearing threshold is excluded (Tables 4 and 5). Therefore the SRT of noisy speech appears to be the more critical factor for the performance of noise reduction algorithms. This is discussed in the next section.

4.3. The effect of input SNR

As indicated above, the SRT of noisy speech governs the input SNR at which each individual is evaluated. Noise reduction algorithms appear to perform better at higher SNRs: noise estimation is better at higher input SNR, where it is easier to differentiate speech and noise segments. If NH subjects could be tested at higher input SNRs, for example using more difficult speech material, they might obtain as much speech recognition gain as HI subjects; usually, however, NH subjects reach ceiling performance at high input SNRs and cannot demonstrate any further improvement. The variation of speech recognition gain with the input SNR of noisy speech is more obvious in babble noise than in speech shaped noise, reflecting the difficulty of reducing babble noise at low input SNRs. This also shows that the choice of input SNR during evaluation of noise reduction algorithms affects the results, which is especially important for non-adaptive speech recognition tests at fixed input SNRs (usually 0, 5 or 10 dB). When specifying the performance of such algorithms, it is therefore vital to specify the test conditions in detail, including not only the relative frequency spectra of the speech and noise but also the input SNR. While the dependence of noise reduction algorithms on input SNR may appear obvious, and would appear to be an obvious explanation for differences between evaluations in NH and HI listeners, we are not aware of any previous study that has shown this dependence explicitly. These findings underline the need to evaluate noise reduction algorithms in the intended target population. Clearly, absence of benefit in NH listeners does not imply absence of benefit in HI listeners. Moreover, algorithms that show little benefit in HI listeners with moderate hearing loss may turn out to have useful benefit for listeners with more severe hearing loss. Furthermore, algorithms that show little benefit in HI listeners using standard speech materials may show benefits in the same listeners with more difficult speech materials, because a higher input SNR will be needed in the comparative evaluation.

4.4. The effect of noise type

Speech shaped noise and multi-talker babble noise, both having a long-term average spectrum similar to that of the speech, were adopted as additive noise in our evaluation. However, speech shaped noise is stationary whereas babble noise fluctuates. Both noise reduction algorithms performed better in stationary noise for both NH and HI subjects. For NH listeners, neither noise reduction algorithm affected speech intelligibility in speech shaped noise, but both impaired speech intelligibility in babble noise. For HI listeners, the noise reduction algorithms showed significant intelligibility improvements in speech shaped noise but not in babble noise. A similar effect of noise type has been shown in CI users: Yang and Fu (2005) found that the same spectral subtraction algorithm worked much better in speech shaped noise than in babble noise. Due to the rapidly varying characteristics of babble noise, noise power estimation methods cannot accurately track and estimate the noise power in each temporal frame; babble noise, which fluctuates much like the target speech, is easily misidentified as speech, resulting in weak noise reduction.

4.5. Acclimatization effects

Acclimatization describes the process by which a listener adjusts to a gradual change in the acoustical environment, allowing best performance across a range of environmental conditions. It is important to consider acclimatization effects when investigating noise reduction algorithms, and these effects may differ between participants and between algorithms. Participants in our test had only a few minutes to adjust to each algorithm; a much longer period (of the order of days to weeks) would be needed for full acclimatization. This is impractical for laboratory investigation and would require a wearable take-home HA. In future experiments it will be important to give participants more time to acclimatise to the algorithms.

Fig. 10. Scatter plots showing speech recognition gain versus average SRT of unprocessed noisy speech (baseline performance) in four conditions (SCS-SSN, SCS-Babble, CS-WF-SSN, CS-WF-Babble).



Table 4
Forward stepwise regression of speech recognition gain with the CS-WF algorithm in babble noise.

Variable                      B       Std. error B   β       Sig.
Included: SRT in noisy        0.233   0.053          0.738   0.000
Excluded: Average HL                                 0.004   0.991

B is the unstandardized coefficient; β is the standardized coefficient. A Sig. value of 0.000 indicates significance at p < 0.001.

Table 5
Forward stepwise regression of speech recognition gain with the SCS algorithm in babble noise.

Variable                      B       Std. error B   β       Sig.
Included: SRT in noisy        0.302   0.057          0.796   0.000
Excluded: Average HL                                 0.189   0.544

B is the unstandardized coefficient; β is the standardized coefficient. A Sig. value of 0.000 indicates significance at p < 0.001.

5. Conclusions

This work focused on the effects of a sparse coding shrinkage noise reduction algorithm on speech intelligibility in NH and HI listeners. The SCS algorithm showed performance comparable to that of the competitive Wiener filtering algorithm in both NH and HI listeners. The difference between NH and HI subjects in intelligibility gain depends primarily on the input SNR rather than on the hearing loss level, although the two variables are correlated. Both algorithms performed better at higher input SNRs, where HI subjects can benefit but NH subjects have already reached ceiling performance. Babble noise remains a challenge for the noise reduction algorithms used here, especially at low input SNRs and for listeners with mild-to-moderate hearing losses.

Acknowledgements



We thank Aapo Hyvarinen, Patrik Hoyer and Xin Zou for their advice on sparse coding shrinkage. We thank Timo Gerkmann for providing the CS-WF code. We thank David Simpson, James M. Kates and Kathryn Hoberg Arehart for their advice on NAL-R compensation. We also thank all the subjects. This work was supported by the European Commission within the ITN AUDIS (grant agreement number PITN-GA-2008-214699).

References

Alcantara, J.I., Moore, B.C., Kuhnel, V., Launer, S., 2003. Evaluation of the noise reduction system in a commercial digital hearing aid. Int. J. Audiol. 42, 34–42.
Amos, N.E., Humes, L.E., 2007. Contribution of high frequencies to speech recognition in quiet and noise in listeners with varying degrees of high-frequency sensorineural hearing loss. J. Speech Lang. Hear. Res. 50, 819–834.
Arehart, K.H., Hansen, J.H.L., Gallant, S., Kalstein, L., 2003. Evaluation of an auditory masked threshold noise suppression algorithm in normal-hearing and hearing-impaired listeners. Speech Commun. 40, 575–592.
Baer, T., Moore, B.C., Gatehouse, S., 1993. Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality and response times. J. Rehabil. Res. Dev. 30, 49–72.
Bench, J., Kowal, A., Bamford, J., 1979. The BKB (Bamford–Kowal–Bench) sentence lists for partially-hearing children. Br. J. Audiol. 13, 102–112.
Breithaupt, C., Gerkmann, T., Martin, R., 2008. A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing. In: IEEE ICASSP 2008, pp. 4897–4900.
Dahlquist, M., Lutman, M.E., Wood, S., Leijon, A., 2005. Methodology for quantifying perceptual effects from noise suppression systems. Int. J. Audiol. 44, 721–732.
Deller, J.R., Hansen, J., Proakis, J.G., 2000. Discrete-time Processing of Speech Signals. IEEE Press, New York.
Dillon, H., 2001. Hearing Aids. Thieme, New York.
Dirks, D., Morgan, D., Dubno, J., 1982. A procedure for quantifying the effects of noise on speech recognition. J. Speech Hear. Disord. 47, 114–123.
Elberling, C., Ludvigsen, C., Keidser, G., 1993. The design and testing of a noise reduction algorithm based on spectral subtraction. Scand. Audiol. Suppl., 39–49.
Field, D.J., 1994. What is the goal of sensory coding? Neural Comput. 6, 559–601.
Gerkmann, T., Martin, R., 2009. On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling. IEEE Trans. Signal Process. 57, 4165–4174.
Gerkmann, T., Hendriks, R.C., 2012. Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20, 1383–1393.
Harlander, N., Rosenkranz, T., Hohmann, V., 2012. Evaluation of model-based versus non-parametric monaural noise-reduction approaches for hearing aids. Int. J. Audiol., 1–13.
Hu, H., Sang, J., Bleeck, S., Lutman, M.E., 2013. Non-negative matrix factorization on the envelope matrix in cochlear implant. In: IEEE ICASSP 2013, Vancouver, Canada, pp. 7790–7794.
Hu, H., Li, G., Chen, L., Sang, J., Wang, S., Lutman, M.E., Bleeck, S., 2011. Enhanced sparse speech processing strategy for cochlear implants. In: EUSIPCO 2011, Barcelona, Spain.
Hu, Y., Loizou, P., 2003. A generalized subspace approach for enhancing speech corrupted with colored noise. IEEE Trans. Speech Audio Process. 11, 334–341.
Hu, Y., Loizou, P., 2007. A comparative intelligibility study of single-microphone noise reduction algorithms. J. Acoust. Soc. Am. 122, 1777–1786.
Hyvärinen, A., 1999. Sparse code shrinkage: denoising of non-Gaussian data by maximum likelihood estimation. Neural Comput. 11, 1739–1768.
Hyvärinen, A., Hoyer, P., Oja, E., 1998. Sparse code shrinkage for image denoising. In: Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence, vol. 2, pp. 859–864.
Jamieson, D.G., Brennan, R.L., Cornelisse, L.E., 1995. Evaluation of a speech enhancement strategy with normal-hearing and hearing impaired listeners. Ear Hear. 16, 274–286.
Kates, J.M., Weiss, M.R., 1996. A comparison of hearing-aid array-processing techniques. J. Acoust. Soc. Am. 99, 3138–3148.
Kim, G., Loizou, P., 2010. Improving speech intelligibility in noise using a binary mask that is based on magnitude spectrum constraints. IEEE Signal Process. Lett. 17, 1010–1013.
Levitt, H., 1971. Transformed up-down methods in psychoacoustics. J. Acoust. Soc. Am. 49, 467–477.
Levitt, H., 1993. Digital hearing aids. In: Studebaker, G.A., Hochberg, I. (Eds.), Acoustical Factors Affecting Hearing Aid Performance. Allyn and Bacon, Boston, pp. 317–335.
Levitt, H., 2001. Noise reduction in hearing aids: a review. J. Rehabil. Res. Dev. 38, 111–121.
Levitt, H., Bakke, M., Kates, J., Neuman, A., Schwander, T., Weiss, M., 1993. Signal processing for hearing impairment. Scand. Audiol. Suppl. 38, 7–19.
Li, G., 2008. Speech Perception in a Sparse Domain. Ph.D. thesis, Institute of Sound and Vibration Research, University of Southampton.
Li, G., Lutman, M.E., 2008. Sparse stimuli for cochlear implants. In: EUSIPCO 2008, Lausanne, Switzerland.
Li, G., Lutman, M.E., Wang, S., Bleeck, S., 2012. Relationship between speech recognition in noise and sparseness. Int. J. Audiol. 51, 75–82.
Lockwood, P., Boudy, J., 1992. Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars. Speech Commun. 11, 215–228.
Plomp, R., 1994. Noise, amplification, and compression: considerations of three main issues in hearing aid design. Ear Hear. 15, 2–12.
Potamitis, I., Fakotakis, N., Kokkinakis, G., 2001. Speech enhancement using the sparse code shrinkage technique. In: IEEE ICASSP 2001, pp. 621–624.
Sang, J., Hu, H., Li, G., Lutman, M.E., Bleeck, S., 2011a. Supervised sparse coding strategy in hearing aids. In: IEEE ICCT 2011, China.
Sang, J., Li, G., Hu, H., Lutman, M.E., Bleeck, S., 2011b. Supervised sparse coding strategy in cochlear implants. In: Interspeech 2011, Florence, Italy.
Sang, J., Hu, H., Zheng, C., Li, G., Lutman, M.E., Bleeck, S., 2012. Evaluation of a sparse coding shrinkage noise reduction algorithm in normal hearing and hearing impaired listeners. In: EUSIPCO 2012, Bucharest, Romania.
Schum, D.J., 2003. Noise-reduction circuitry in hearing aids, II: goals and current strategies. Hear. J. 56, 32–41.
Verschuur, C., Lutman, M.E., Wahat, N.H.A., 2006. Evaluation of a non-linear spectral subtraction noise suppression scheme in cochlear implant users. Cochlear Implants Int. 7, 188–193.
Weiss, M., Neuman, A.C., 1993. Noise reduction in hearing aids. In: Studebaker, G.A., Hochberg, I. (Eds.), Acoustical Factors Affecting Hearing Aid Performance. Allyn and Bacon, Boston, pp. 337–352.
Widrow, B., Luo, F., 2003. Microphone arrays for hearing aids: an overview. Speech Commun. 39, 139–146.
Yang, L., Fu, Q., 2005. Spectral subtraction-based speech enhancement for cochlear implant patients in background noise. J. Acoust. Soc. Am. 117, 1001–1004.
Zou, X., Jancovic, P., Ju, L., Kokuer, M., 2008. Speech signal enhancement based on MAP algorithm in the ICA space. IEEE Trans. Signal Process. 56, 1812–1820.
