Modeling the effects of a single reflection on binaural speech intelligibility
Jan Rennies, Anna Warzybok, Thomas Brand, and Birger Kollmeier

Citation: J. Acoust. Soc. Am. 135, 1556 (2014); doi: 10.1121/1.4863197 View online: http://dx.doi.org/10.1121/1.4863197 View Table of Contents: http://asa.scitation.org/toc/jas/135/3 Published by the Acoustical Society of America

Modeling the effects of a single reflection on binaural speech intelligibility

Jan Rennies a)
Project Group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, D-26129 Oldenburg, Germany

Anna Warzybok, Thomas Brand, and Birger Kollmeier b)
Cluster of Excellence Hearing4all, Medizinische Physik, Universität Oldenburg, 26111 Oldenburg, Germany

(Received 12 November 2012; revised 28 December 2013; accepted 7 January 2014)

Recently the influence of delay and azimuth of a single speech reflection on speech reception thresholds (SRTs) was systematically investigated using frontal, diffuse, and lateral noise [Warzybok et al. (2013). J. Acoust. Soc. Am. 133, 269–282]. The experiments showed that the benefit of an early reflection was independent of its azimuth and mostly independent of noise type, but that the detrimental effect of a late reflection depended on its direction relative to the noise. This study tests if different extensions of a binaural speech intelligibility model can predict these data. The extensions differ in the order in which binaural processing and temporal integration of early reflections take place. Models employing a correction for the detrimental effects of reverberation on speech intelligibility after performing the binaural processing predict SRTs in symmetric masking conditions (frontal, diffuse), but cannot predict the measured interaction of temporal and spatial integration. In contrast, a model extension accounting for the distinction between useful and detrimental reflections before the binaural processing stage predicts the data with an overall R2 of 0.95. This indicates that any model framework predicting speech intelligibility in rooms should incorporate an interaction between binaural and temporal integration of reflections at a comparatively early stage. © 2014 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4863197]

PACS number(s): 43.71.Gv, 43.71.An, 43.66.Pn, 43.55.Hy [TD]

I. INTRODUCTION

Speech intelligibility in real rooms is often corrupted by ambient noise and reverberation, although human listeners show remarkable capabilities to understand speech even in so-called cocktail-party scenarios (Cherry, 1953; Bronkhorst, 2000). Understanding the mechanisms underlying the processing under such adverse conditions can help to improve existing speech intelligibility models and thereby provide tools that can be used in practical applications, such as room acoustical design or the optimization of binaural signal enhancement algorithms. In a recent study, Warzybok et al. (2013) therefore systematically investigated the influence of amplitude, direction, and delay of a single reflection on binaural speech intelligibility in the presence of different noise types. The goal of the present study was to use these data to further investigate the interaction of temporal and spatial processing in the binaural auditory system by testing different interaction mechanisms within the framework of an existing speech intelligibility model.

a) Author to whom correspondence should be addressed. Also at: Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, D-26111 Oldenburg, Germany. Electronic mail: [email protected]
b) Also at: Project Group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, D-26129 Oldenburg, Germany.

J. Acoust. Soc. Am. 135 (3), March 2014

Pages: 1556–1567

Several studies have addressed the influence of reflections of the speech signal on speech intelligibility (e.g., Lochner and Burger, 1964; Soulodre et al., 1989; Bradley et al., 2003; Arweiler and Buchholz, 2011; Arweiler et al., 2013; Warzybok et al., 2013). It was generally agreed that speech reflections arriving within a certain time window after the direct sound can be (at least partly) integrated with the direct sound and thus facilitate speech perception. The length of the time window differed between studies, but was typically estimated to be about 50 to 100 ms (e.g., Lochner and Burger, 1964; Nabelek and Robinette, 1978; Warzybok et al., 2013). Reflections arriving at longer delays cannot be integrated and even have additional masking effects.

The interaction between the temporal integration process and binaural processing was systematically investigated by Warzybok et al. (2013). They found that binaural unmasking of speech consisting of the direct sound and a single strong reflection was independent of reflection delay when both components were presented frontally. In other words, the dependence of speech reception thresholds [SRTs, i.e., signal-to-noise ratios (SNRs) at 50% intelligibility] on reflection delay was the same for frontal, lateral, and diffuse noise. This result indicated that binaural processing and temporal integration of speech information were independent processes, and was referred to as the independence hypothesis by Warzybok et al. (2013). In conditions with a frontal direct sound and a lateral reflection, however, they found differences between the SRTs measured with the different noise

0001-4966/2014/135(3)/1556/12/$30.00

© 2014 Acoustical Society of America

types. While the benefit of early reflections remained unchanged (apart from detrimental effects that could be explained by changes in spectra), the detrimental effects of a late, lateral reflection were reduced compared to a late, frontal reflection. In a lateral noise, this release from masking depended on reflection direction and was larger when the late reflection arrived from the same hemisphere as the masking noise. Warzybok et al. (2013) concluded that the independence hypothesis was fulfilled for the temporal integration of a beneficial (early) reflection, which was equally efficient for all reflection directions, but that the detrimental effect could be reduced when reflection and direct sound are spatially separated.

Warzybok et al. (2013) did not evaluate binaural speech intelligibility models to test the independence hypothesis. The goal of the present study was to investigate how different mechanisms of binaural and temporal interaction realized in different binaural speech intelligibility models could account for the data of Warzybok et al. (2013).

Several models have been proposed to predict speech intelligibility in spatial listening conditions. van Wijngaarden and Drullman (2008) presented a binaural extension of the speech transmission index (STI; Steeneken and Houtgast, 1980; IEC, 2003). The binaural STI is based on the same modulation transfer function (MTF) as the monaural STI. The MTF, however, is additionally calculated for different interaural delays and the final MTF is obtained by selecting the interaural delay that maximizes the transmitted speech modulations. van Wijngaarden and Drullman (2008) showed that their model could well predict speech intelligibility in a number of spatial listening conditions. The binaural STI, however, does not exploit interaural level differences to effectively reduce noise components.
In contrast, vom Hövel (1984) proposed a model with two distinct stages accounting for binaural processing and the effects of temporal distortions. Binaural processing is realized as an equalization-cancellation (EC) mechanism (Durlach, 1963, 1972) to optimize the SNR in different frequency bands based on interaural time and level differences of speech and noise signals. Temporal distortions of the speech signal such as echoes and reverberation are accounted for after the binaural processing by a correction factor derived from the "definition," i.e., the ratio of the early reflection energy and the entire energy contained in the impulse response (IR) (ISO, 2009). This measure is also sometimes referred to as "Deutlichkeit" and is related to the clarity measure, which describes the ratio of the early reflection energy to the late reflection energy. The model of vom Hövel (1984) was further extended to a binaural speech intelligibility model (BSIM) by Beutelmann and co-workers (Beutelmann and Brand, 2006; Beutelmann et al., 2010). BSIM provides accurate predictions of SRTs for speech that is hardly affected by reverberation. For reverberant speech, predictions become increasingly inaccurate the more reverberation is introduced (Rennies et al., 2011a). This is due to the fact that BSIM considers the entire speech (including the late reflections) as useful. To overcome this limitation, Rennies et al. (2011a) developed three different extensions of BSIM that are based

on room acoustical measures. Two of the extensions follow the suggestion of vom Hövel (1984) and consist of post hoc corrections, i.e., the binaurally optimized SNRs after the EC mechanism are corrected depending on the degree of reverberation. In the first extension, the correction factor is based on the MTF, which is also used in the calculation of the STI. In the second extension, the correction is based on the definition as suggested by vom Hövel (1984). Both MTF and definition are calculated from the IR of the speech transmission path. The third extension differs from the other two in that the separation of presumably useful, early reflections and presumably detrimental, late reflections is made before the binaural processing in the EC stage. This is achieved by calculating the input speech signal as the convolution of the clean speech and the early part of the IR, while the clean speech convolved with the late part of the IR is added to the masking noise. This extension differs fundamentally from the other two since the EC mechanism is based only on the early reflection components, while it is still based on the entire reverberant speech signal in the models with the post hoc correction (as in the original BSIM).

A further model for predicting binaural speech intelligibility was developed by Lavandier and co-workers (Lavandier and Culling, 2010; Lavandier et al., 2010, 2012). Their model is also based on the EC mechanism, but binaural interaction and better-ear listening are modeled independently. In addition, the revised model of Lavandier et al. (2010, 2012) uses binaural IRs as input signals instead of speech and noise convolved with IRs as in BSIM. In the same way as in the third extension of BSIM described above, the model proposed by Lavandier et al. (2010) also applies a separation of the input speech signals into useful and detrimental parts prior to the binaural processing.
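Since all of the model versions discussed here share an EC front end, a toy sketch may help to make the mechanism concrete. The code below is not the implementation of vom Hövel (1984) or BSIM; it is a minimal illustration (the function name, parameter values, and the brute-force search are assumptions of this sketch) of how equalizing one ear in delay and gain and subtracting it can cancel a lateralized noise within one frequency band:

```python
import numpy as np

def ec_snr_gain(s_l, s_r, n_l, n_r, fs, max_delay_ms=0.7):
    """Brute-force EC sketch: find the interaural delay and gain that
    minimize the residual noise power after subtracting the equalized
    right-ear signal from the left-ear signal, then report the SNR
    before and after cancellation. Illustrates the Durlach (1963) EC
    idea; NOT the BSIM implementation."""
    best = None
    max_lag = int(fs * max_delay_ms / 1000)
    for lag in range(-max_lag, max_lag + 1):
        n_r_shift = np.roll(n_r, lag)
        # least-squares gain aligning the two noise signals
        g = np.dot(n_l, n_r_shift) / (np.dot(n_r_shift, n_r_shift) + 1e-12)
        resid = n_l - g * n_r_shift
        p = np.mean(resid ** 2)
        if best is None or p < best[0]:
            best = (p, lag, g)
    p_noise, lag, g = best
    # apply the same equalization to the speech path
    s_after = s_l - g * np.roll(s_r, lag)
    snr_before = 10 * np.log10(np.mean(s_l ** 2) / np.mean(n_l ** 2))
    snr_after = 10 * np.log10(np.mean(s_after ** 2) / (p_noise + 1e-12))
    return snr_before, snr_after

# Example: frontal speech (identical at both ears), noise lateralized by an
# interaural delay -> EC cancels the noise and raises the band SNR.
rng = np.random.default_rng(0)
fs = 16000
speech = rng.standard_normal(fs)   # stand-in for a band-filtered speech signal
noise = rng.standard_normal(fs)
itd = 5                            # interaural delay of the noise in samples
before, after = ec_snr_gain(speech, speech, noise, np.roll(noise, itd), fs)
```

BSIM additionally applies gammatone filtering, limits the accuracy of the equalization parameters to mimic human performance, and feeds the resulting band SNRs into the SII; all of this is omitted in the sketch above.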
In the present study, the model-based investigation of the data of Warzybok et al. (2013) was conducted within the model framework of BSIM, since the framework contains different model versions that readily allow a direct comparison of orders in which binaural processing and temporal integration of early reflections take place. Even though the modeling efforts reported here are restricted to a specific model framework that has been successfully employed in the past, the tested design principles of the model stages accounting for the interaction between spatial and temporal integration should be usable in other model frameworks as well.

A direct comparison of the different versions of BSIM is particularly interesting because, despite the fundamental differences with respect to interaction of temporal and spatial processing as described above, Rennies et al. (2011a) did not observe significant differences in the predictions for the conditions tested in their study: All extensions provided about equally good predictions, although the extension based on the MTF correction slightly overestimated the influence of reverberation on speech intelligibility. Rennies et al. (2011a) argued that the absence of differences between the model versions could be due to properties of their tested stimuli. One reason could have been that the influence of early reflections was too small in their highly reverberant room. In addition, their speech stimuli were rather symmetrical with respect to


the reflection pattern arriving at the two ears. This may not have introduced enough binaural disparities between direct sound and reflections to reveal differences in the model predictions. The conditions investigated by Warzybok et al. (2013) were designed to increase the influence of reflections by reducing the reverberation pattern to a single reflection which had the same amplitude as the direct sound. In addition, the spatial configuration of direct sound and reflection was systematically varied. These conditions therefore provide a set of data to systematically test the different model versions of Rennies et al. (2011a). If the binaural auditory system does indeed adjust its spatial unmasking mechanism based on the spatial interaction of noise and detrimental reflection, then the model version allowing the EC mechanism to distinguish between early and late parts of the speech signal should agree better with the data than the model versions based on post hoc corrections that do not take the interaction of temporal and spatial processing into account. This was explicitly tested in this study.

II. METHODS

A. Experimental setup

A detailed description of the experimental method and setup that was used to measure the data considered in this study was provided by Warzybok et al. (2013). Briefly, a series of speech intelligibility experiments in different spatial conditions was conducted (see Fig. 1 and Table I in Warzybok et al., 2013). The direct sound of the speech was always presented frontally (S0). A stationary noise with a speech-shaped spectrum was presented either frontally (N0), laterally at 135° of azimuth (N135), or diffusely (ND). The azimuthal direction of the single reflection was varied in the different experiments. The data considered in the present study were measured with a reflection of the same amplitude as the direct sound. The delay of the reflection relative to the direct sound was varied between 0 and 200 ms. For each condition, SRTs were measured with normal-hearing listeners using the Oldenburg sentence test (Wagener et al., 1999a,b,c). The level of the noise was always 65 dB sound pressure level (SPL) at the right ear, and the speech level was varied adaptively to converge to the SRT. Warzybok et al. (2013) presented their SRTs as better-ear SNRs at 50% intelligibility by correcting for broadband interaural SNR differences in conditions with the higher SNR at the left ear to facilitate the comparison across conditions. The speech level was always calculated from the root-mean-square (rms) value of the entire speech signal containing both direct sound and reflection. The same way of calculating SRTs was used in this study.

B. Original and extended BSIM

All four versions of BSIM were tested in the present study, i.e., the original version of Beutelmann et al. (2010) and the three extensions of Rennies et al. (2011a). A detailed description of the models is provided in the respective publications. Briefly, the original BSIM processes binaural input


signals, i.e., interfering noise and a stationary speech-shaped noise convolved with the IRs of the speech source to simulate the input speech. The input signals are processed in different auditory filters. In each filter, the SNR is optimized by an independent EC mechanism. The optimized SNRs of all auditory filters are then used as input to the speech intelligibility index (SII; ANSI, 1997). In two of the extended model versions, the SNR at the output of the EC stage is multiplied by a correction factor, namely the MTF (this extension is referred to as BSIM-MTF in the following) or the definition (referred to as BSIM-D100, since the separation of early and late reflections was made at 100 ms). In the third extension, the input speech signal is separated into useful and detrimental components prior to the EC stage and the detrimental components are added to the noise (BSIM-UD100). A schematic of the different model structures is provided in Fig. 2 of Rennies et al. (2011a). In all model versions, the SRT for a given acoustic condition is derived from the calculated SII by first selecting a fixed reference SII value and then varying the SNR until the SII equaled this reference value. In the present study, the reference SII was set to 0.24. This ensured that SRTs predicted for the reference condition (S0R0N0) for a delay of 0 ms (effectively equivalent to no reflection because direct sound and reflection were added coherently) were approximately the same as the mean SRT observed in the experiment (deviations < 0.1 dB). This condition was chosen as the reference condition since the predictions of the different model versions are exactly the same for conditions without any reverberation or reflection. This is due to the fact that the correction factors become 1 (for the post hoc corrections, see below) and the splitting of the input speech into useful and detrimental parts does not change the input signals (see Rennies et al., 2011a).
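The SRT prediction loop common to all model versions (vary the SNR until the SII matches a fixed reference value, here 0.24) can be sketched as a simple bisection. The `toy_sii` function below is a crude stand-in used only to keep the example self-contained: it mimics the structure of the ANSI S3.5 SII (band SNRs clipped to ±15 dB, mapped to [0, 1], and summed with band-importance weights) but is not the standard; the flat weights and the assumed 4-dB EC benefit are illustrative.

```python
import numpy as np

def toy_sii(snr_db_bands, weights):
    """Crude SII stand-in: clip each band SNR to [-15, 15] dB, map it
    linearly to [0, 1], and sum with band-importance weights.
    Mimics the structure of ANSI S3.5 but is NOT the standard."""
    a = (np.clip(snr_db_bands, -15.0, 15.0) + 15.0) / 30.0
    return float(np.sum(weights * a))

def predict_srt(band_snr_offsets, weights, ref_sii=0.24,
                lo=-30.0, hi=20.0, tol=1e-4):
    """Bisection on the broadband SNR until the (toy) SII equals ref_sii.
    band_snr_offsets: per-band SNR advantage (e.g., after EC processing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if toy_sii(mid + band_snr_offsets, weights) < ref_sii:
            lo = mid   # need a higher SNR to reach the reference SII
        else:
            hi = mid
    return 0.5 * (lo + hi)

weights = np.ones(20) / 20.0                 # flat band-importance function
srt_diotic = predict_srt(np.zeros(20), weights)
srt_binaural = predict_srt(np.full(20, 4.0), weights)  # assumed 4-dB EC benefit
# With this toy SII, a uniform 4-dB band-SNR advantage lowers the
# predicted SRT by exactly 4 dB.
```

In the actual models the band SNRs come from the EC stage and the HRTF-filtered signals, so the binaural advantage varies across bands and conditions; the fixed offset here only illustrates the search procedure.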
The reference SII was slightly higher in this study than the value of 0.23 used by Rennies et al. (2011a), reflecting the fact that their set of subjects had slightly lower average SRTs than the subjects of Warzybok et al. (2013). The models can account for individual hearing thresholds by introducing uncorrelated noise at the two ears which cannot be canceled by the EC mechanism. The use of individual audiograms was shown to be crucial for predictions of speech intelligibility in hearing-impaired listeners (Beutelmann et al., 2010). For normal-hearing listeners, the use of individual audiograms does not significantly affect predictions of speech intelligibility in noise at intermediate noise levels, but is crucial to obtain accurate predictions for speech in quiet (Rennies et al., 2011a). Since the focus of this study was on normal-hearing listeners and conditions of speech in noise at intermediate levels, an average audiogram of 0 dB hearing level was assumed at all frequencies. The original BSIM as well as BSIM-MTF were used exactly as described by Beutelmann et al. (2010) and Rennies et al. (2011a), respectively. For BSIM-D100 and BSIM-UD100, a single modification was introduced in the present study compared to the version proposed by Rennies et al. (2011a). As mentioned above, these models are based on a separation of early and late reflections contained in the IR. In BSIM-D100, the definition Dte (i.e., the ratio of the

early reflection energy up to time te and the entire energy contained in the IR) is used as the correction factor. It is calculated from the IR h(t) by

\[
D_{t_e} = \frac{\int_0^\infty \left[ h(t)\, w_\mathrm{early}(t) \right]^2 \mathrm{d}t}{\int_0^\infty h^2(t)\, \mathrm{d}t},
\tag{1}
\]

where the extraction of the early reflections is realized by applying the step-like weighting function

\[
w_\mathrm{early,step}(t) =
\begin{cases}
1 & \text{for } t \le t_e \\
0 & \text{for } t > t_e.
\end{cases}
\tag{2}
\]

The value of te was set to 100 ms by Rennies et al. (2011a) to achieve good predictions in their highly reverberant conditions, and it was not changed in the present study. Warzybok et al. (2013) argued that such a step-like splitting might be unsuitable for the speech stimuli consisting only of direct sound and a single reflection. Depending on the delay of the reflection, either the entire speech energy or half of the speech energy would be contained in the early part of the IR, i.e., the definition would have a value of either 1 or 0.5. This would lead to a rather binary dependence of predicted SRTs on reflection delay, which was not observed in the data. Therefore, another weighting function with a smoother transition was also tested in the present study. As a simple realization of such a transition, a linearly decreasing ramp was used to fade out the early part, i.e.,

\[
w_\mathrm{early,ramp}(t) =
\begin{cases}
1 & \text{for } 0 \le t < t_e - d/2 \\
1 - (t_e + d/2)^{-1}\, t & \text{for } t_e - d/2 \le t \le t_e + d/2 \\
0 & \text{for } t > t_e + d/2,
\end{cases}
\tag{3}
\]

where d is the duration of the ramp and was set to 200 ms. This weighting ensures that half of the linear decrease was reached at time te and that, in the limiting case of very short ramp durations, w_early,ramp(t) approaches w_early,step(t). Using d = 200 ms and te = 100 ms results in a ramp that has no steady state in the beginning, but starts decreasing immediately after the direct sound. This led to good predictions of BSIM-D100 in the reference condition (S0R0N0, see Sec. III) and was kept fixed for all conditions considered in this study. The same ramp was also used for all predictions of BSIM-UD100. This model additionally requires the extraction of the late reflections from the speech signal. This was achieved by using the weighting function w_late,ramp(t) = 1 − w_early,ramp(t) to fade in the late part over the same period of time as used to fade out the early part. The two ramps were symmetrical, i.e., their sum was equal to 1 at any time, and they crossed each other at the separation time of 100 ms. Apart from the modified weighting function, BSIM-D100 and BSIM-UD100 were left unchanged and used as described by Rennies et al. (2011a). The influence of varying the duration and shape of the weighting functions is discussed in Sec. IV C.
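For the stimuli of this study the IR reduces to two pulses (a unit direct sound plus an equal-amplitude reflection at a variable delay), so the step and ramp weightings can be contrasted directly. The sketch below (function names and the discrete two-pulse IR are assumptions of this illustration, not the model code) reproduces the binary step behavior and the smooth ramp behavior discussed above:

```python
import numpy as np

def w_early_ramp(t, te=0.1, d=0.2):
    """Linearly decreasing weighting, Eq. (3): 1 until te - d/2, then a
    linear decay reaching 0.5 at te and 0 at te + d/2 (times in seconds)."""
    w = 1.0 - t / (te + d / 2.0)
    w[t < te - d / 2.0] = 1.0
    w[t > te + d / 2.0] = 0.0
    return w

def definition(delay, te=0.1, d=0.2, step=False):
    """D_te, Eq. (1), for a two-pulse IR: a unit direct sound at t = 0 and
    an equal-amplitude reflection at t = delay."""
    t = np.array([0.0, delay])
    h = np.array([1.0, 1.0])
    if step:
        w = (t <= te).astype(float)   # step weighting, Eq. (2)
    else:
        w = w_early_ramp(t, te, d)
    return np.sum((h * w) ** 2) / np.sum(h ** 2)

# The step weighting is binary in the reflection delay ...
# definition(0.05, step=True) -> 1.0 ; definition(0.2, step=True) -> 0.5
# ... while the ramp decreases smoothly between those extremes:
# definition(0.05) -> 0.78125 ; definition(0.1) -> 0.625 ; definition(0.2) -> 0.5
```

This makes the argument of Warzybok et al. (2013) concrete: with the step weighting the definition jumps from 1 to 0.5 as the delay crosses te, whereas the ramp yields a gradual decrease over the 0–200 ms delay range.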

III. RESULTS

A. Influence of a frontal speech reflection on speech intelligibility under different noise conditions

The top left panel of Fig. 1 shows the SRT data of Warzybok et al. (2013) as a function of reflection delay for the S0R0 condition with a frontal (N0, black squares), diffuse (ND, dark gray circles), and lateral (N135, light gray triangles) interferer. Error bars indicate standard deviations.

FIG. 1. Top left panel: SRTs measured by Warzybok et al. (2013) for speech consisting of frontal direct sound (S0) and a frontal reflection (R0) as a function of reflection delay for a frontal (N0, black squares), diffuse (ND, dark gray circles), and lateral noise (N135, light gray triangles). Top right panel: Corresponding predictions of BSIM (dashed lines) and BSIM-MTF (solid lines). The bottom panels show the amount of binaural unmasking calculated as the SRT difference between frontal noise and diffuse or lateral noise.

Remember that the speech level was always defined as the rms level of the entire speech signal including direct sound and reflection. This means that the energy of the direct sound was smaller in signals consisting of both direct sound and reflection compared to a signal consisting of direct sound only (reflection delay 0 ms) at the same level. If, despite this reduction in direct sound level, the measured SRTs are the same for a given condition, this indicates that the reflection energy is perfectly integrated with the direct sound. The S0R0 data shown in Fig. 1 indicate that this was the case for delays up to between 25 and 50 ms for all noise types. Likewise, the SRT increase at larger delays was very similar for all noise types. In consequence, the amount of binaural unmasking calculated as the SRT improvement compared to the N0 condition was independent of reflection delay for both non-frontal interferers [binaural intelligibility level difference (BILD), shown in the bottom left panel of Fig. 1]. The total amount of binaural unmasking was about 8 dB (N135) and 4 dB (ND). Warzybok et al. (2013) calculated a speech-weighted SNR to estimate the contribution of spectral changes due to head-related transfer functions to the overall unmasking effect. They found that about 4 and 3 dB of the observed binaural unmasking for lateral and diffuse noise, respectively, could be attributed to spectral effects; the remainder was due to binaural processing (for details, see Warzybok et al., 2013).

The corresponding predictions of the original BSIM and the extension based on the MTF (BSIM-MTF) are shown in the right panels of Fig. 1 as dashed and solid lines, respectively. For each model, the prediction accuracy is indicated in terms of the coefficient of determination (R2) and the bias. R2 was calculated as the square of the linear correlation coefficient according to Pearson. The bias represents the overall offset of the predictions and was calculated as the y-intercept b of the (unity slope) linear function SRTexp = SRTpred + b, which was fitted to the scatter plot of measured SRTs against predicted SRTs by minimizing the mean-squared error (see also Fig. 6).
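The accuracy measures used here can be written compactly: for a unity-slope fit SRTexp = SRTpred + b, minimizing the mean-squared error gives simply b = mean(SRTexp − SRTpred). A minimal sketch (the function name and the example values are illustrative, not the study's SRTs):

```python
import numpy as np

def prediction_metrics(srt_exp, srt_pred):
    """R^2 as the squared Pearson correlation, the bias as the intercept b
    of the unity-slope fit SRT_exp = SRT_pred + b (for which least squares
    yields b = mean(SRT_exp - SRT_pred)), and the rms prediction error."""
    srt_exp = np.asarray(srt_exp, dtype=float)
    srt_pred = np.asarray(srt_pred, dtype=float)
    r2 = np.corrcoef(srt_exp, srt_pred)[0, 1] ** 2
    bias = float(np.mean(srt_exp - srt_pred))
    rms_err = float(np.sqrt(np.mean((srt_exp - srt_pred) ** 2)))
    return r2, bias, rms_err

# Illustrative values only (not data from the study):
exp_ = [-12.0, -10.5, -8.0, -6.5]
pred = [-13.0, -11.0, -9.5, -7.0]
r2, bias, err = prediction_metrics(exp_, pred)   # bias = +0.875 dB here
```

Note that R2 measures only how well the predictions follow the trend of the data, while the bias captures a constant offset; the rms error combines both.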
The original BSIM quantitatively predicts the binaural unmasking for both types of non-frontal interferers for speech signals consisting of direct sound only. However, the increase in SRTs with increasing reflection delay is not predicted. In contrast, BSIM-MTF predicts both measured effects. For frontal and diffuse noise, predictions agree reasonably well with the data of Warzybok et al. (2013), although the increase of SRTs is slightly steeper at intermediate reflection delays. For the lateral interferer, this increase in SRT with reflection delay is considerably overestimated, resulting in an SRT at a delay of 200 ms that is about 4 dB higher than the measured data. In consequence, the predicted BILD is not independent of reflection delay. Altogether, BSIM-MTF does not yield improved prediction accuracy with respect to the indicated error measures compared to BSIM.

The corresponding predictions of the other two model extensions are shown in Fig. 2. The left and mid panels illustrate the effect of the different ways to computationally separate early and late parts of the IR on predictions of BSIM-D100. SRTs shown in the top left panel were predicted using the traditional, step-like weighting function according to Eq. (2) (see schematic in the panel). The predictions show the expected behavior reflecting the binary values of the definition for IRs consisting of only two components: Predicted SRTs are characterized by two steady states, one at a lower SNR for delays up to 75 ms, and one at a higher SNR at delays of 100 and 200 ms. The predictions of BSIM-D100 with the adapted ramp according to Eq. (3) are shown in the mid panels of Fig. 2. In general, the same trends as for BSIM-MTF occur, with a slightly better agreement with the experimental data for N0 and ND due to the adjusted ramp. In particular, the overestimated SRT increase at the largest reflection delay in the N135 condition is the same in both models using a post hoc SNR correction. In contrast, BSIM-UD100 predicts both the increase in SRTs with increasing reflection delay (top right panel of Fig. 2) and delay-independent BILDs for both

FIG. 2. Same data representation as in Fig. 1, but for predictions of BSIM-D100 (left and mid panels) and BSIM-UD100 (right panel). The schematics in each upper panel illustrate the weighting function used to separate early (black dashed lines) and late parts (solid gray line) of the IRs of the speech transmission paths (see Sec. II B).




non-frontal interferers (bottom right panel), such that predictions quantitatively agree with measured SRTs for almost all conditions. The only discrepancy between data and predictions of BSIM-UD100 is observed at the longest delay for all masking conditions. While the data show an increase in SRT of between 4.1 dB (N0) and 5.6 dB (N135) from 0 to 200 ms delay, BSIM-UD100 predicts only a 3-dB increase for all masking conditions. Thus, the detrimental effect of a late reflection on speech intelligibility is underestimated (see Sec. IV B for a discussion of this effect).

B. Spatial separation of direct sound and speech reflection in lateral and diffuse noise

In a further set of experiments, Warzybok et al. (2013) measured SRTs with the same frontal direct sound (S0) and the same diffuse (ND) and lateral noise (N135), but with a non-frontal reflection at 45° azimuth (R45). These data are shown in the top left panel of Fig. 3 (black lines). For comparison, SRTs measured with a frontal reflection (R0) are shown in gray (re-plotted from Fig. 1). Measured SRTs clearly differed between a frontal and a lateral reflection. At a delay of 0 ms, considerably higher SRTs occurred for a lateral reflection than for a frontal reflection. Warzybok et al. (2013) argued that this was the result of spectral coloration due to the addition of two speech signals arriving simultaneously from different directions (0° and 45°) and with different head-related spectro-temporal properties. At delays between 10 and 50 ms, SRTs were very similar for both reflection directions. At delays between 75 and 200 ms, however, the increase in SRT with reflection delay was no longer observed with a lateral reflection. The

FIG. 3. Black lines represent SRTs measured by Warzybok et al. (2013) (top left panel) and predicted by the extended models (other panels) for stimuli consisting of frontal direct sound (S0), a lateral speech reflection (R45), and a diffuse (ND) or lateral (N135) noise as a function of reflection delay. For comparison, SRTs for a frontal reflection are shown in gray (re-plotted from Fig. 1).

corresponding predictions of the extended model versions are shown in the top right and bottom panels. As before, predictions are very similar for both models employing a post hoc correction (BSIM-MTF and BSIM-D100). Both models predict the SRT decrease from 0 to 10 ms delay. In fact, this effect is also predicted by the original BSIM (not shown), which otherwise does not show any dependence of SRTs on reflection delay (as described above). The extended models also quantitatively predict the reduced SRT increase at longer delays for diffuse noise. For a lateral interferer, the detrimental effects of a late reflection are again considerably overestimated. In contrast, BSIM-UD100 can also predict this interaction of reflection delay and azimuth for a lateral interferer, i.e., SRTs measured with a lateral reflection are generally predicted by the model.

C. Effects of speech reflection azimuth in lateral and diffuse noise

In a subsequent experiment, Warzybok et al. (2013) systematically varied the azimuthal direction of the reflection while keeping the frontal direct sound and the diffuse noise the same. The measured SRTs are shown in the top left panel of Fig. 4. At short delays (10 and 50 ms), measured SRTs were very similar for all reflection azimuths. At the longest delay (200 ms), SRTs were lower for a reflection spatially separated from the direct sound than for a frontal reflection, indicating that the spatial separation reduced the detrimental effect of a late reflection. The corresponding model results (other panels) indicate reasonably accurate predictions for all model extensions. As expected, SRTs are symmetrical with respect to reflections arriving from the left and right hemisphere. All models predict this symmetry.

FIG. 4. Same data representation as in Fig. 3, but for SRTs measured with frontal direct sound (S0), a diffuse noise (ND), and different speech reflection azimuths. Different line styles indicate the direction of arrival of the speech reflection (solid black: Frontal; dotted dark gray: Right hemisphere; dashed light gray: Left hemisphere).


Warzybok et al. (2013) also investigated the effect of reflection azimuth in the presence of a lateral interferer (N135), again using frontal direct sound. The corresponding measured SRTs are shown in the top left panel of Fig. 5. For a reflection from behind (R180) and for reflections arriving from the hemisphere contralateral to the noise source (R225, R270, and R315), the dependence of SRTs on reflection delay was similar to the pattern observed for a frontal reflection: SRTs were similar for delays of 10 and 50 ms, and significantly increased for a delay of 200 ms. SRTs for a contralateral reflection tended to be about 2 dB lower than for a frontal reflection. For an ipsilateral reflection (R45, R90, and R135), however, there was no difference in SRT between the different reflection delays, i.e., the detrimental effect of the late reflection was absent, indicating an asymmetrical dependence of SRTs on reflection azimuth. The corresponding predictions of the models are shown in the other panels of Fig. 5. As before, SRTs predicted by BSIM-MTF and BSIM-D100 are very similar. Both models predict a monotonic increase in SRT with delay for all reflection azimuths, i.e., they fail to predict the reduced detrimental effect at a delay of 200 ms observed for an ipsilateral reflection. A further discrepancy between data and predictions is that the predicted SRT increase is smaller for a contralateral reflection than for symmetric reflections (R0 and R180). In contrast, BSIM-UD100 predicts the independence of SRTs on reflection delay for an ipsilateral reflection as well as the same SRT increase for a contralateral and a symmetric reflection. As observed above, this increase in SRT at the longest reflection delay is underestimated by about 2 dB. All models predict the same SRTs for a frontal and a contralateral reflection at a delay of 10 ms, i.e., the significant 2-dB advantage

FIG. 5. Same data representation as in Fig. 4, but for SRTs measured with frontal direct sound (S0), a lateral noise at the right side of the listener (N135), and different speech reflection azimuths. Different line styles indicate the direction of arrival of the speech reflection (solid black: Front/back; dotted dark gray: Right hemisphere; dashed light gray: Left hemisphere).

J. Acoust. Soc. Am., Vol. 135, No. 3, March 2014

observed for a contralateral reflection at short delays is not predicted.

D. Overall correlation analysis

Figure 6 shows the correlation analyses of measured and predicted SRTs for the original model (top left panel) and its extended versions (other panels). In each panel measured SRTs are plotted as a function of the corresponding predictions for all 62 conditions described above, i.e., for all combinations of reflection azimuth, delay, and type of interferer. To facilitate the interpretation, the different noise types are indicated by different symbols (N135: Light gray squares; ND: Dark gray crosses; N0: Black circles). The solid bisecting line in each panel represents the perfect match of data and predictions. The dashed line has a slope of unity and was fitted to the scatter plot by minimizing the mean-squared error. The vertical or horizontal distance between this line and the bisecting line is equivalent to the prediction bias as described previously. In addition to the bias, the coefficient of determination and the rms prediction error ε are indicated in each panel. Table I summarizes the separate correlation analyses for each noise type (first three rows) and across all noise types (last row). The top left panel of Fig. 6 confirms the main findings for the predictions of the original BSIM: While the model predicts the general differences between the three noise types, the correlations within each of the noise types are very poor (see also Table I), since the model predicts almost the same SRTs for all reflection delays (cf. Fig. 1).

FIG. 6. Scatter plots of measured SRTs (ordinates) against predicted SRTs (abscissae) for the original BSIM (top left panel) and its extended versions (other panels) for all 62 conditions investigated in this study. Circles, crosses, and squares represent SRTs measured in frontal, diffuse, and lateral noise, respectively. The solid line is the bisecting line. The dashed line has a slope of unity and was fitted to the data by minimizing the mean-square error. The distance between the two lines represents the bias of the predictions. ε indicates the rms prediction error.

TABLE I. Coefficients of determination (R²), bias (in dB), and rms prediction error ε (in dB) between model predictions and the data of Warzybok et al. (2013) for each of the noise types and across all noise types (last row). The second column indicates the number of data points across which the values were calculated.

                        BSIM                 BSIM-MTF             BSIM-D100            BSIM-UD100
Noise     # Cond    R²    Bias   ε       R²    Bias   ε       R²    Bias   ε       R²     Bias   ε
N0           7      0.32  1.6    2.1     0.86  0.3    0.6     0.98  0.4    0.5     >0.99  0.3    0.3
ND          23      0.01  1.1    1.7     0.75  0.4    0.9     0.78  0.1    0.6     0.77   0.1    0.6
N135        32      0.02  0.5    1.8     0.61  2.9    3.3     0.66  2.6    2.9     0.70   0.6    1.1
Overall     62      0.77  0.9    1.8     0.71  1.7    2.4     0.74  1.2    2.1     0.95   0.3    0.9

In contrast, predictions of BSIM-MTF (top right panel) and BSIM-D100 (bottom left panel) are correlated with the data for all noise types. Across all conditions, however, predictions are not better than for the original BSIM as expressed in terms of R² and bias. The light gray squares indicate that this is mainly due to the overestimated SRTs in the conditions with lateral noise. The bottom right panel shows that BSIM-UD100 predicts the individual conditions with reasonably high accuracy, and an overall very high correlation (R² = 0.95), a negligible bias, and an rms error below 1 dB, although some data predictions could still be improved (see below). Note that the lower R² values for N135 compared to N0 (see Table I) do not necessarily reflect worse prediction accuracy, but are rather a result of the different number of data points (7 data points for N0, 32 data points for N135) and the fact that a larger number of data points with low delays (10 and 50 ms) were measured for N135 (all tested reflection azimuths) than for N0 (a single reflection azimuth). For lower delays, SRTs were similar both in the predictions and in the experimental data, resulting in a small range of values. As illustrated in Fig. 5, predictions of BSIM-UD100 reasonably meet the measured SRT range for these short delays. However, they do not perfectly match the experimental data. This, in combination with the small range of existing SRT values, results in a lower linear correlation. In addition, SRTs were measured with a higher density of delays for N0 (0, 10, 25, 50, 75, 100, and 200 ms) than for N135 (10, 50, 200 ms). The N0 data indicate a rather monotonic increase of SRTs with delay (see Fig. 1), and most of the measured data points are well predicted by BSIM-UD100 including the linear ramp as a weighting function (Fig. 2).
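The agreement metrics used throughout Fig. 6 and Table I (the coefficient of determination R², the bias of a unity-slope fit, and the rms prediction error ε) are straightforward to compute. The following Python sketch is a hypothetical illustration, not the authors' code, and the toy SRT values are invented for demonstration. Note that for a line constrained to unit slope, minimizing the mean-squared error makes the bias simply the mean signed difference between measured and predicted SRTs.

```python
import numpy as np

def prediction_metrics(measured, predicted):
    """Return (R^2, bias, rms error) between measured and predicted SRTs.

    The bias is the offset of a unity-slope line fitted to the scatter
    plot by minimizing the mean-squared error, as in Fig. 6.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Coefficient of determination as the squared Pearson correlation.
    r = np.corrcoef(measured, predicted)[0, 1]
    r2 = r ** 2
    # Unity-slope fit measured = predicted + b: least squares gives
    # b = mean(measured - predicted), i.e., the mean signed error.
    bias = np.mean(measured - predicted)
    # rms prediction error epsilon.
    eps = np.sqrt(np.mean((measured - predicted) ** 2))
    return r2, bias, eps

# Toy example with invented SRT values (dB SNR):
meas = [-12.0, -11.5, -9.0, -7.5]
pred = [-12.3, -11.0, -9.4, -7.0]
r2, bias, eps = prediction_metrics(meas, pred)
```

Applied to all 62 conditions, such a routine would yield the per-noise and overall rows of Table I.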

IV. DISCUSSION

A. Integration of an early speech reflection

Listening conditions that were symmetric with respect to the azimuthal directivity of the signals arriving at the listeners’ ears were investigated by Warzybok et al. (2013) using frontal direct sound, a frontal reflection, and a frontal or diffuse noise. The data showed that early reflection energy was perfectly integrated with the direct sound for short reflection delays: While the data tended to show a small increase in SRT also at short reflection delays, this effect was not significant up to a delay of about 50 ms. In the investigated

symmetric listening conditions, all model versions predict this effect at very short delays. These are the only conditions in which predictions of the original BSIM do not substantially deviate from the data, since SRTs predicted by BSIM are entirely independent of reflection delay (see Fig. 1). It is interesting to note that the benefit due to an early reflection measured by Warzybok et al. (2013) was independent of its direction in the presence of a diffuse noise. This was predicted reasonably well by all model versions, but was not observed by Arweiler and Buchholz (2011). They measured speech intelligibility in diffuse noise and varied the SNR by either increasing the direct speech energy or the energy of the early reflections using a realistic IR containing 20 reflections within 55 ms after the frontal direct sound. In addition to a realistic sound presentation preserving the original azimuth and elevation angles of the reflections, Arweiler and Buchholz (2011) also measured intelligibility in a condition in which all reflections were presented from a frontal loudspeaker. They found (1) that an increase in direct-speech energy was more beneficial than an equivalent increase in energy of the early reflections, and (2) that this difference was smaller when all reflections were presented from the frontal speaker. The first effect indicated a non-optimal integration of the reflections with the direct sound and might be attributed to spectral differences between direct sound and reflections (cf. Arweiler and Buchholz, 2011; Warzybok et al., 2013). From the second observation Arweiler and Buchholz (2011) concluded that temporal integration of early reflections may be easier when they arrive from the same direction as the direct sound. This is at odds with the findings of Warzybok et al. (2013) that an early reflection is equally beneficial in diffuse noise independent of its azimuth. 
The reason for this discrepancy is not clear, but it is possible that the integration of a single reflection spatially separated from the direct sound is easier than the integration of several reflections with different directions of arrival. Another reason may be the length of the time window for integrating early reflections. Warzybok et al. (2013) observed optimal integration of an early reflection up to a delay of 25 ms in the diffuse noise condition. In the study of Arweiler and Buchholz (2011) the reflections arrived within 55 ms after the direct sound so that some of the reflections may not have been fully integrated with the direct sound and therefore resulted in lower speech intelligibility scores than a condition with the direct sound only. To what extent such an effect can be predicted by the


model versions discussed in the present study remains to be investigated.

B. Effects of a late speech reflection

At the longest delay investigated by Warzybok et al. (2013) SRTs were significantly increased by more than 4 dB compared to the condition with direct sound only. Warzybok et al. (2013) argued that an SRT increase of 3 dB would be expected if the loss of direct speech energy was the only reason for increased SRTs. Such an SRT increase was measured for a frontal reflection delayed by 100 ms. The fact that SRTs increased further at the largest delay of 200 ms indicates that the reflection was not only useless for the listener (in terms of a loss of speech energy), but had an additional detrimental impact on speech intelligibility. One possible reason is that the late reflection acted as an additional masker, thus increasing the overall masker energy. In this study, the noise level was fixed at 65 dB SPL. This means that the energetic contribution of the late reflection to the overall masker level decreased with decreasing speech level (i.e., with decreasing SNR). Thus, if additional masker energy was the only reason for the detrimental effect at large delays, one should expect that this effect is larger at higher speech levels. The data of Warzybok et al. (2013) indicate that this is not the case. In fact, the detrimental effect was larger in the N135 condition (5.3 dB) than in the N0 condition (4.1 dB), although SRTs in the N135 condition were generally much lower. Thus, the increase in masker level by the late reflection is probably not the only underlying mechanism. BSIM-UD100 slightly underestimates this detrimental effect, especially for the N135 condition (increase about 2 dB too small), indicating that an additional deleterious effect is not fully captured by the model. As described above, BSIM-UD100 is based on the SII, i.e., on the concept of weighting long-term average spectra of speech and noise.
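The 3 dB expectation cited above follows from a one-line energy argument. Assuming the reflection carries the same energy as the direct sound (an assumption consistent with the single-reflection paradigm, not stated explicitly here), a reflection that turns from fully useful to fully useless halves the effective speech energy:

```python
import math

# A reflection with the same energy as the direct sound becomes useless:
# the useful speech energy is halved, so the SNR at threshold must rise by
srt_increase_db = 10 * math.log10(2)  # about 3 dB
```

Any SRT increase beyond this value, as observed at the 200 ms delay, therefore points to an additional detrimental mechanism.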
Recent studies have argued that this spectrum-weighting concept is oversimplified and that more accurate speech intelligibility predictions can be achieved by accounting for temporal modulations in the background noise (Dubbelboer and Houtgast, 2007; Jørgensen and Dau, 2011; Stone et al., 2011, 2012). Although the STI is also based on the relation between temporal modulations and speech intelligibility, the concept discussed in these studies differs from the concept of the STI in that the STI is based on the assumption that the reduction of temporal modulations of the clean speech signal alone determines intelligibility. In contrast, the recent studies argue that the modulations contained in the noise signal may also play an important role even when apparently stationary maskers are used as in the present study (Stone et al., 2011, 2012). Jørgensen and Dau (2011) transferred this concept into the speech-based envelope power spectrum model (sEPSM), which is based on the SNR in the envelope domain (SNRenv) and which could account for decreases in intelligibility for speech subjected to reverberation and spectral subtraction. Although the sEPSM has not yet been shown to predict the influence of a single reflection on speech intelligibility, it is possible that the late reflection


used in the present study affected speech processing in the modulation domain and therefore contributed to the observed deleterious effect. A possible approach for an enhanced prediction model for binaural speech intelligibility (including the detrimental effect at large delays) could be to combine the binaural processing front-end of BSIM-UD with a model also accounting for the effects of temporal modulations on intelligibility (such as, e.g., the sEPSM). Another interesting aspect for future investigations is whether—in addition to possible effects of temporal modulations—the reflection had an additional deleterious influence due to its similarity to the direct sound (informational masking, see, e.g., Bronkhorst, 2000; Brungart, 2001; Brungart and Simpson, 2002; Rhebergen et al., 2005). Since the reflection was an intelligible, delayed copy of the direct sound with the same spectrum, one could assume that the late reflection was perceived as a second, interfering speaker with rather high potential of evoking informational masking. To what extent the intelligible content of the reflection influenced SRTs could be tested by using the same stimuli as in the present study, but with an unintelligible late reflection (e.g., using a noise carrier with the same envelope). This would not change the predictions of BSIM-UD100, but would eliminate possible informational masking related to the speech content of the reflection.

C. Interaction of spatial and temporal integration of a speech reflection

In contrast to symmetric listening conditions, substantial differences in prediction accuracy are observed in listening conditions with stimuli that are asymmetric with respect to their azimuthal directivity. This is true for asymmetry introduced by a lateral noise, a lateral reflection, or both. In all of these conditions, only BSIM-UD100 predicts the measured trends, while the models based on a post hoc correction (BSIM-D100, BSIM-MTF) considerably overestimate the detrimental effects of the reflection with increasing delay. Recall that only in BSIM-UD100 does the binaural processing stage (i.e., the EC-mechanism) account for the fact that only the early parts of the speech signal are useful while late parts are detrimental, whereas BSIM-D100 and BSIM-MTF correct for this fact after the binaural processing stage. The failure of the latter models to predict SRTs for the lateral noise strongly suggests that the temporal integration properties of the auditory system need to be accounted for in the binaural processing stage of speech intelligibility models. This also supports the intuitive notion that an effective binaural processing stage in the auditory system (as realized in this model using an EC-mechanism) should optimize its parameters based on the speech components that are actually useful for the listener (BSIM-UD100) rather than on the entire physical speech signal including detrimental components (BSIM, BSIM-D100, BSIM-MTF). The model predictions of this study indicate that this fundamental difference in the interaction of temporal and spatial processing between the different model versions is only relevant for listening conditions with binaurally asymmetric stimuli. In particular, the

independence of binaural unmasking regarding reflection delay for a frontal reflection and a lateral interferer can only be predicted by BSIM-UD100 (see Fig. 2). The second effect that is not predicted by the models using a post hoc correction is the interaction of reflection azimuth and noise azimuth for the lateral noise (see Fig. 5). The data of Warzybok et al. (2013) showed that the detrimental effect of a late reflection was considerably reduced when it arrived from a similar direction as the noise. This trend is predicted by BSIM-UD100, while BSIM-D100 and BSIM-MTF predict a similar SRT increase for all lateral reflections. Across all conditions BSIM-UD100 is in much better agreement with the data of Warzybok et al. (2013) than the other model versions. However, there are two measured effects that remain unaccounted for by this model. One effect is the masking component of a late reflection as discussed above. The other effect is that (in combination with a frontal direct speech component) an early reflection can be even more beneficial than a frontal reflection when it arrives from a direction contralateral to the N135 noise (compare black and light gray symbols at delays of 10 and 50 ms in Fig. 5). The underlying mechanism for the latter effect is not clear. Warzybok et al. (2013) argued that the effect was probably not due to spectral advantages since they did not find differences in speech-weighted SNR between the conditions with an ipsilateral or a frontal reflection. Another possible reason for the effect is a binaural advantage resulting from an increased effective spatial separation of speech and noise: If the auditory system considers the direct sound and the reflection to be equally beneficial, then the optimum speech direction for a binaural unmasking mechanism could be somewhere between the direct sound and the contralateral reflection. 
This could lead to a slightly increased spatial separation between noise and effective speech direction (and consequently a slightly larger binaural gain) compared to the condition when direct sound and reflection are presented frontally. However, such an effect should be modeled by an effective binaural processing mechanism such as the EC stage. In the implemented EC mechanism, the originally lower SRTs are entirely equalized by the broadband better-ear normalization employed in this study (see Sec. II A), indicating that no additional advantage due to interaural time differences (ITDs) is predicted by BSIM. The fact that the model does not predict the observed effect could mean that the EC stage is not optimally parameterized for the present data, or that another mechanism is responsible for the observed effect. One possible way of further investigating the role of early reflections that are spatially separated from both the direct speech sound and from a lateral noise could be to reduce the effects of better-ear listening. This could be achieved by using stimuli as employed by Lavandier and Culling (2008), who generated their binaural stimuli with just two microphones, but no physical head model, in order to focus on ITD processing. In this light, it will also be interesting to test the model proposed by Lavandier et al. (2010). It also employs a step-like separation of the input speech IRs into early and late components in a way very similar to the original realization in BSIM-UD100 to account for

detrimental effects of speech reverberation. This means that, in order for the model of Lavandier et al. (2010) to account for the data of Warzybok et al. (2013), similar modifications as described in the present study could be required. In particular, this includes a smoother transition between useful and detrimental components to avoid step-like SRT increases.

D. Role of model parameters for temporal integration of speech components

Throughout the predictions presented so far, the parameters of the models were fixed. In particular, this included parameters of BSIM-UD, which turned out to be the only model predicting the main trends of the experimental data in all conditions: The separation time te between early and late speech components (fixed at 100 ms), the duration d of the weighting function to separate early and late components [fixed at 200 ms, see Eq. (3)], and the shape of the weighting functions (linear ramp). These parameters were introduced in BSIM by Rennies et al. (2011a) and in the present study to account for the temporal integration of the direct sound with early reflections. In this study, the selection of the parameter values was motivated by previous studies (te = 100 ms was also used by Rennies et al., 2011a,b) or by an initial fit which had resulted in reasonable prediction accuracy (d = 200 ms and linear ramp shape). To assess the influence of the parameters on the predictions of BSIM-UD, a systematic parameter variation was made. The results of the analyses for a symmetrical condition and a condition involving binaural unmasking are shown in Fig. 7 (left column: S0R0N0, right column: S0R0N135). In

FIG. 7. Influence of ramp duration (top panels, note that curves almost overlap for 50 and 100 ms), separation time te between early and late reflections (mid panels), and ramp shape (bottom panels) on predictions of BSIM-UD for the S0R0N0 condition (left column) and the S0R0N135 condition (right column) of Warzybok et al. (2013). In each panel, measured SRTs are indicated by the light gray line.


each panel, experimental data are represented by the light gray line, while the black and dark gray lines show predictions using different parameter settings. The top panels illustrate the influence of the ramp duration for a fixed te = 100 ms and a linear ramp. Note that 200 ms (= 2 × te) represents the longest possible duration for this type of weighting function under the constraint that the first sample of the IR be given a weight of unity [see Eq. (3) and schematic in Fig. 2]. It can be observed for both experimental conditions that reducing the ramp duration shifts the point at which predicted SRTs start increasing toward longer delays. While predicted SRTs are still within one standard deviation of the data, the reduction of the ramp duration generally reduces the prediction accuracy, since SRTs tend to increase monotonically already at short delays (although this effect is not significant up to 50 ms, see Sec. III A). This indicates that the weighting function for the early speech components should start decaying already at an early point after the direct sound. The effect of varying te is shown in the mid panels of Fig. 7. For each te, the ramp duration was set to its maximum value (2 × te). This was based on a systematic test of various combinations of te and ramp duration which indicated that, in general, predictions were closest to the experimental data when the slope of the weighting functions already started at very short delays (accounting for the observed trend of monotonically increasing SRTs). As before, a linear ramp was used as the weighting function. Note that reflections at delays longer than the ramp duration are always fully assigned to the detrimental part. Consequently, the smaller the te, the shorter the delay at which the maximum and steady state of the predicted SRTs occurs.
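The parameters varied here can be made concrete with a small sketch of the useful/detrimental split. The Python function below is one plausible reading of the linear-ramp weighting, not the paper's exact Eq. (3): a ramp of duration d centered on the separation time te falls linearly from 1 to 0, so the constraint d <= 2·te keeps the first sample of the IR at a weight of unity.

```python
import numpy as np

def split_ir(ir, fs, te=0.1, d=0.2):
    """Split an impulse response into 'useful' and 'detrimental' parts.

    A sketch of the ramp weighting described in the text: a linear ramp
    of duration d, centered on the separation time te, falls from 1 to 0.
    With d <= 2*te the first sample (direct sound) keeps weight 1.
    """
    t = np.arange(len(ir)) / fs
    w = np.clip((te + d / 2 - t) / d, 0.0, 1.0)  # early (useful) weight
    return ir * w, ir * (1.0 - w)

fs = 16000
ir = np.zeros(int(0.3 * fs))
ir[0] = 1.0                    # direct sound
ir[int(0.05 * fs)] = 0.7       # early reflection at 50 ms
ir[int(0.25 * fs)] = 0.7       # late reflection at 250 ms
useful, detrimental = split_ir(ir, fs)
```

With the default te = 100 ms and d = 200 ms, a reflection at 50 ms is counted as 75% useful, while a reflection at 250 ms is assigned entirely to the detrimental part.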
For small te (especially 25 ms), this point appears to be too early, while predictions for intermediate separation times such as 100 ms are in reasonable agreement with the data. Larger values of te result in SRT increases that are too shallow compared to the data. Thus, the choice of te has a significant influence on the dependence of predicted SRTs on reflection delay. The comparison between data and predictions indicates that a value between about 75 and 100 ms leads to good predictions, i.e., te should be larger than 50 ms, which is a value often employed in room acoustical studies to calculate the definition (e.g., Lochner and Burger, 1964; Nabelek and Robinette, 1978; ISO, 2009). Finally, the shape of the weighting function was also varied. The resulting predictions for te = 100 ms and a ramp duration of 200 ms are shown in the bottom panels of Fig. 7. In addition to the linear ramp [Eq. (3)], a squared-cosine ramp was also tested. For both types of ramps, two further ramps were calculated as the square-roots of the linear ramp and the squared-cosine ramp (the latter thus becoming a cosine ramp). Since all ramps had the same duration, predicted SRTs intersect at delays of 0 and 200 ms. For intermediate delays, the curves differ slightly by up to about 1 dB. It can be seen that the SRT increase at short delays with either type of “linear” ramp is steeper than with the corresponding squared-cosine ramp owing to the shallower decrease of the weighting function. It appears that predictions are slightly better for the linear and the squared-cosine ramps without taking the square-root of these weighting functions. In conclusion, the shape of the weighting function (linear, square-root linear, cosine, or squared cosine) plays a minor role


for a given ramp duration and separation time te, while variations of the separation time and the duration of the ramps influence predicted SRTs. The selected values of te = 100 ms and a ramp duration of 200 ms lead to accurate predictions as shown above, while the exact values do not seem to be crucial since a similar (or even slightly higher) accuracy is achieved for te = 75 ms and a ramp duration of 150 ms.

E. Practical implications for the application of the definition

In this study, the definition used as a post hoc correction within BSIM could not predict intelligibility in all experimental conditions. However, the definition is still a very useful and widely used measure in room acoustics. It is therefore interesting to consider the implications of this study for the use of the definition. The comparison of model predictions and data indicates that a step-like weighting function for extracting the early, useful speech components from the IR is not adequate to model speech intelligibility for stimuli consisting of direct sound and a single reflection (see Fig. 2). However, such a step-weighting is the common way to calculate, e.g., the definition or clarity measure (ISO, 2009). On the one hand it is possible that a smooth transition (e.g., linear ramp) as suggested in the present study is necessary only for rather artificial stimuli with a single or a few strong reflections as used by Warzybok et al. (2013). In practice many IRs roughly follow a smooth, exponential decay. For such IRs the exact separation time might not be crucial and a step-like weighting might be well justified. On the other hand, if a smooth separation of early and late components leads to similar values of the definition for realistic IRs, then this may indicate that using a smooth separation is more generally applicable, because it can be used to adequately describe IRs with a realistic reverberation tail as well as IRs consisting of a few, strong reflections. A possible consequence of the current study might therefore be to replace the currently used steplike weighting function in the calculation of the definition with a more appropriate, smooth weighting function. 
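The two weighting schemes compared here can be sketched concretely. The following Python function is an illustration under assumed conventions (the definition as the early-to-total energy ratio of the squared IR), not the exact procedure used in the study; the smooth variant uses a linear ramp with the maximum duration of 2 × te so that the first sample keeps a weight of unity.

```python
import numpy as np

def definition(ir, fs, te=0.05, smooth=False):
    """Room-acoustic definition D_te: early-to-total energy ratio of an IR.

    smooth=False: classical step weighting (e.g., D50 for te = 0.05 s).
    smooth=True:  linear ramp of the maximum duration 2*te replacing the
                  step, as explored in the text (a sketch, not the paper's
                  exact implementation).
    """
    t = np.arange(len(ir)) / fs
    if smooth:
        w = np.clip((2 * te - t) / (2 * te), 0.0, 1.0)
    else:
        w = (t < te).astype(float)
    energy = ir ** 2
    return np.sum(w * energy) / np.sum(energy)

fs = 16000
# Toy IR: direct sound plus one reflection at 80 ms.
ir = np.zeros(int(0.2 * fs))
ir[0] = 1.0
ir[int(0.08 * fs)] = 0.5
d50_step = definition(ir, fs, te=0.05, smooth=False)
d50_ramp = definition(ir, fs, te=0.05, smooth=True)
```

For this toy IR the step weighting gives D50 = 0.80, while the ramp weighting gives 0.84, because the 80 ms reflection retains a small early weight under the ramp.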
To investigate the comparability of both weighting functions for practical purposes in room acoustics, the definition was calculated using both the step-like weighting and the linear ramp weighting to derive the energy of the early part for a set of 114 binaural IRs (left and right ear for each of the 57 listening conditions). The set comprised a wide range of reverberant conditions and consisted of IRs that were used in previous studies to investigate binaural speech intelligibility using BSIM (Beutelmann and Brand, 2006; Beutelmann et al., 2010; Rennies et al., 2011a). Both IRs measured in real rooms and IRs simulated using different room simulation tools were included in the calculations (see original publications for details). Definition values (classical, step-like weighting) were between 0.45 and 1.00 for D100, between 0.30 and 1.00 for D80, and between 0.27 and 1.00 for D50. For this set of IRs the correlation of the definition calculated using the step-like and linear weighting (with maximum ramp duration of 2 × te) was very high with R² > 0.94 and bias 0.06 for each of D50, D80, and D100. An even better agreement was obtained for the squared-cosine ramp (each R² > 0.98 and bias
