Effects of voice style, noise level, and acoustic feedback on objective and subjective voice evaluations.

Bottalico et al.: JASA Express Letters

[http://dx.doi.org/10.1121/1.4936643]

Published Online 1 December 2015

Pasquale Bottalico,a) Simone Graetzer, and Eric J. Hunter
Voice Biomechanics and Acoustics Laboratory, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan 48824, USA

Abstract: Speakers adjust their vocal effort when communicating in different room acoustic and noise conditions and when instructed to speak at different volumes. The present paper reports on the effects of voice style, noise level, and acoustic feedback on vocal effort, evaluated as sound pressure level, and self-reported vocal fatigue, comfort, and control. Speakers increased their level in the presence of babble and when instructed to talk in a loud style, and lowered it when acoustic feedback was increased and when talking in a soft style. Self-reported responses indicated a preference for the normal style without babble noise. © 2015 Acoustical Society of America

[NX] Date Received: August 17, 2015

Date Accepted: October 28, 2015

a) Author to whom correspondence should be addressed.

EL498 J. Acoust. Soc. Am. 138 (6), December 2015

1. Introduction

The interaction between the person, the room, and the activity leads to different sensations of vocal comfort, control, fatigue, and effort. The maximization of vocal comfort and control, and the minimization of vocal fatigue and effort, are particularly important when (1) the person is at high risk of vocal injury, such as in teaching environments¹ when the classroom acoustics are poor,² and (2) the person is speaking with an overused or under-recovered voice.³

Vocal comfort can be defined as a psychological entity whose magnitude is determined by those aspects that reduce the vocal effort.⁴ It appears to decrease with the speaker’s perceived fatigue and the sensation of needing to increase the voice level.⁵ Vocal control can be defined as the capacity to self-regulate vocal behavior, e.g., sound pressure level (SPL) or intensity. The sensation of control relates to the capacity to adjust the voice to maintain a level that is suitable for communication given the environmental conditions. Vocal fatigue is a progressive increase in phonatory effort, from which one can recover with rest.³ Vocal effort can be defined as the exertion of the speaker as quantified by the A-weighted SPL (dB) at a distance of 1 m from the mouth.⁶ In addition, vocal effort can be defined as a physiological entity that accounts for changes in voice production when loading increases.⁷ It is affected by speaker-listener distance, background noise level, and other acoustic characteristics of the room. Moreover, the characteristics of the communication environment are known to affect vocal effort in both adults and children.⁸

In this study, the effects of voice style (corresponding to soft, normal, and loud levels), background noise level, and external auditory feedback on (1) vocal effort (SPL) and (2) self-reported vocal comfort, control, and fatigue were evaluated.

2. Experimental method

The speech of 20 talkers in a semi-reverberant room was recorded in three different styles corresponding to soft, normal, and loud levels, both with and without artificial multi-talker child babble, and with and without polycarbonate panels at 1 m from the subject. These panels increased external auditory feedback, providing an early reflection of a talker’s speech. With protocol approval of the Michigan State University’s Human Research Protection Programs Human Subject’s Review Board, ten males and ten females were recruited to participate. These subjects, aged between 18 and 29 years (mean x̄ = 21 years), were self-reported nonsmokers without a self-reported speech or hearing impairment. The instructions given for the styles were as follows. Soft: “Imagine you are saying something to a friend who is next to you. You want her to hear you but no one else. Do not whisper”; Normal: “Speak in your normal voice”; Loud: “Imagine you are in a classroom and you want to be heard by all of the children.”

2.1 Room acoustic conditions and measurement procedures

The experiment took place in a classroom of dimensions 5.8 m × 6 m × 2.7 m, in which the floor and ceiling were covered by absorbent material (carpet and absorbent tiles). Speech was acquired by an omnidirectional head-mounted microphone (HMM Glottal Enterprises M-80) and recorded by a Roland R-05 digital recorder with a sampling rate of 44.1 kHz. Speech was recorded in two noise conditions: background and babble noise. The average background noise level, mainly generated by the HVAC system, was 40.5 dBA. Children’s babble noise at an averaged A-weighted level of 61 dB (as measured at the talker position) was emitted by a directional loudspeaker (Yamaha studio monitor model HS5). This level represents a common noise level generated by children in a classroom engaged in quiet group work or individual work with some movement.⁹

Room acoustic parameters were measured in an unoccupied state without furniture from the impulse responses (IRs) generated by a balloon pop.¹⁰ The 12 IRs were recorded from combinations of four source positions and three microphone positions. Room acoustic parameters in the octave bands ranging from 125 to 8000 Hz were calculated. The mid-frequency reverberation time (T20, 500–1000 Hz) was 0.53 s [standard deviation (s.d.) 0.04 s], while the mid-frequency clarity (C50, 500–1000 Hz) was 5.7 dB (s.d. 1.2 dB). Regarding T20, the standard deviation of the mean spatial values (s.d. 0.01 s) was lower than the just noticeable difference (JND, 0.03 s), and therefore T20 demonstrated rather uniform spatial behavior. C50 values ranged between 3.52 and 7.47 dB; higher values were found in the positions closer to the window.
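As an illustration of the clarity index reported above, the following is a minimal Python sketch of how C50 could be computed from a single broadband impulse response. It is not the authors' measurement chain: the function name `clarity_c50`, the use of the largest peak as the direct-sound onset, and the synthetic exponentially decaying envelope (with its decay time set to the measured 0.53 s) are assumptions made for illustration, and no octave-band filtering is applied.

```python
import numpy as np


def clarity_c50(ir, fs, early_ms=50.0):
    """Return the early-to-late energy ratio in dB (C50 when early_ms = 50).

    Assumption: the direct sound is taken to be the largest absolute peak
    of the impulse response, and the early window starts there.
    """
    ir = np.asarray(ir, dtype=float)
    onset = int(np.argmax(np.abs(ir)))
    split = onset + int(round(fs * early_ms / 1000.0))
    energy = ir ** 2
    early = energy[onset:split].sum()   # energy in the first early_ms
    late = energy[split:].sum()         # energy arriving afterwards
    return 10.0 * np.log10(early / late)


# Synthetic check (hypothetical data, not a measured IR): an amplitude
# envelope that decays by 60 dB over 0.53 s, sampled at 44.1 kHz for 1 s.
fs = 44100
t = np.arange(fs) / fs
ir_example = 10.0 ** (-3.0 * t / 0.53)
c50_example = clarity_c50(ir_example, fs)
print(f"C50 of the synthetic decay: {c50_example:.2f} dB")
```

For an ideal exponential decay the result depends only on the decay time, which makes the synthetic case a convenient sanity check before the function is applied to band-filtered measured IRs.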
The dimensions of the transparent shield of the polycarbonate panels were 56 cm by 66 cm (22″ × 26″). The increase in the external auditory feedback introduced by the panels was quantified by means of the C50 calculated from oral-binaural room IRs. These IRs were measured using a Head and Torso Simulator (HATS) placed in the talker position in unoccupied conditions. Sine sweeps were used as excitation signals. Figure 1 shows the trend in C50 in the octave bands ranging from 125 to 8000 Hz with and without panels. The increase due to the panels is evident in the higher frequencies, which are the most important for speech.

Fig. 1. C50 (dB) by panel condition per octave band measured using an oral-binaural impulse response.

2.2 Instructions, stimuli, and questionnaires

The subjects were instructed to read a text comprising three standard passages (“Marvin Williams,” 1st paragraph of the “Rainbow” passage, and “Stella”). The text, which took 1 to 2 min to read, was attached to a small stand placed at a distance of 1 m from the subject. Subjects were asked to answer three questions after each reading of the text. These concerned the experience of talking in the various acoustic conditions. Subjects responded to the questions by making a vertical tick on a continuous horizontal line of 100 mm length (a visual analogue scale or VAS). The score was measured as the distance of the tick from the left end of the line and converted to a percentage. The questions were as follows. (1) Fatigue: How fatigued would your voice be if you were to speak continuously in this condition for 20 min? (2) Comfort: How comfortable was it to speak in this condition? (3) Control: How well were you able to control your voice in this condition? The extremes of the lines were “not at all” (left) and “extremely” (right).

2.3 Analysis

MATLAB R2014b was used for speech signal analysis. For each condition, a time history with SPL evaluated at 0.125 s intervals was obtained for the entirety of each reading of the text, for a total of 12 time histories per subject (3 styles × 2 noise conditions × 2 panel conditions). The average of all the SPL values was computed per subject, and this mean was subtracted from each time history value for that subject (the result is termed ΔSPL). This within-subject centering was performed in order to evaluate the variation in the subject’s vocal behavior in the different conditions relative to their typical vocal behavior.

Statistical analysis was conducted using R version 3.1.2. Information-theoretic metrics (including the Akaike information criterion) and the likelihood ratio test were used to compare nested models. Models were built and post hoc comparisons were run using the lme4, lmerTest, and multcomp packages. In particular, linear mixed effect (LME) models were fit by restricted maximum likelihood (i.e., REML estimates of the covariance parameters were calculated). The Satterthwaite method was used to approximate degrees of freedom.

3. Results

3.1 Vocal effort

For the objective analyses of vocal effort, the effects of the variables voice style, noise level, and panel on ΔSPL were considered. The effects are shown in Fig. 2. Summary statistics are reported in Table 1. An LME model was run with the response variable ΔSPL (dB) and the terms style, noise, panel, and time (in 0.125 s intervals), with style × noise and noise × panel interactions, and with a correlated random intercept and random slope for time by subject. The estimates of the standard deviations of the random effects for the intercept and the slope were 0.71 and 0.019 dB, respectively. The residual standard deviation was 9.9 dB. The fixed-effects β coefficients were −11.4 for the intercept and 0.008 for time (p = 0.086).
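The SPL time histories and the within-subject centering described in Sec. 2.3 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' MATLAB code: the function names `frame_spl` and `center_within_subject` are hypothetical, the input signal is assumed to be calibrated in pascals, frames are assumed not to overlap, and the subject mean is assumed to be an arithmetic mean of the dB values.

```python
import numpy as np


def frame_spl(signal, fs, frame_s=0.125, p_ref=20e-6):
    """SPL (dB re 20 uPa) of consecutive, non-overlapping frames.

    Assumption: `signal` is sound pressure in pascals.
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(fs * frame_s))            # samples per 0.125 s frame
    n_frames = signal.size // n             # drop any incomplete last frame
    frames = signal[: n_frames * n].reshape(n_frames, n)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms / p_ref)


def center_within_subject(histories):
    """Subtract one subject's grand mean SPL from every time history.

    `histories` maps a condition label to that subject's SPL time history.
    """
    grand_mean = np.concatenate(list(histories.values())).mean()
    return {cond: np.asarray(v, dtype=float) - grand_mean
            for cond, v in histories.items()}


# Hypothetical check signal: a 1 kHz tone with an RMS pressure of 1 Pa.
fs = 8000
t = np.arange(fs) / fs
tone = np.sqrt(2.0) * np.sin(2 * np.pi * 1000 * t)
spl_tone = frame_spl(tone, fs)              # one value per 0.125 s frame

# Hypothetical SPL histories (dB) for one subject in two conditions.
histories = {"soft": np.array([60.0, 62.0]), "loud": np.array([76.0, 78.0])}
delta_spl = center_within_subject(histories)
print(delta_spl)
```

Centering on each subject's own mean removes between-subject differences in habitual level, so that the remaining variation reflects only the change across conditions.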
The estimate for the normal style was 9.2 dB higher than that of the soft style [standard error (SE) = 0.11, p < 0.0001], while the estimate for the loud style was 16.8 dB higher (SE = 0.11, p < 0.0001). Tukey contrasts indicated a significant difference between all styles at p < 0.0001. The differences between soft and normal and between normal and loud were 7.7 and 6.9 dB, respectively. The estimate for the babble noise was 9 dB higher than that of the background noise (p < 0.0001). The estimate for panels was 0.23 dB lower than that for the room without panels (p < 0.01). The interaction between style and noise was significant [χ²(2) = 863, p < 0.0001], as was the interaction between noise and panel [χ²(1) = 15.7, p < 0.0001]; there was a larger difference in ΔSPL between babble and background noise conditions in the soft style than in other styles, and there was a larger difference between the panel and no panel conditions when babble noise was present than when it was absent. The interaction of noise and panel is shown in Fig. 3.

Fig. 2. Variation in ΔSPL (dB) with style (a), noise (b), and panel (c) conditions. Error bars represent 95% confidence intervals.

Table 1. ΔSPL and self-reported vocal fatigue, comfort, and control by panel, style, and noise conditions (across subjects). When panel = 1, panels are present; panel = 0, absent. Style = S refers to soft style, style = N, normal style, and style = L, loud or raised style. When noise = 1, babble noise is present; noise = 0, background noise. x̄ is the mean and s.d. the standard deviation.

Condition             ΔSPL (dB)       Fatigue (%)     Comfort (%)     Control (%)
Panel  Style  Noise   x̄       s.d.    x̄      s.d.    x̄      s.d.    x̄      s.d.
0      S      0       −11.9    8.7    25.6   26.2    57.8   27.3    48.0   26.0
0      N      0        −2.2   10.8    34.0   25.6    69.2   19.9    72.4   17.8
0      L      0         5.1   12.5    55.2   22.5    46.6   21.9    53.6   19.8
0      S      1        −2.8    6.6    30.3   24.2    44.2   21.7    45.7   21.3
0      N      1         3.3    8.5    30.0   21.5    58.1   24.0    63.0   18.8
0      L      1         9.5   10.4    59.2   24.7    34.5   18.3    40.0   20.6
1      S      0       −11.7    9.0    26.6   23.0    48.4   25.2    47.1   26.3
1      N      0        −2.8   11.0    24.9   23.7    71.3   14.8    68.3   19.8
1      L      0         4.8   13.0    55.2   18.7    52.9   17.4    59.0   18.1
1      S      1        −3.6    6.5    30.1   25.4    40.6   20.8    44.5   20.3
1      N      1         2.6    8.4    30.0   26.8    56.6   22.6    65.3   17.3
1      L      1         9.0   10.8    63.4   18.4    35.7   15.8    44.0   18.2

Fig. 3. Variation in ΔSPL (dB) with noise (x axis), style (facets), and panel (line type) conditions. Error bars represent confidence intervals.

3.2 Self-reported vocal fatigue, comfort, and control

For self-reported fatigue, comfort, and control, the effects of the variables voice style, noise level, and panel were considered. Summary statistics are reported in Table 1. Three LME models were fit by REML in which the response variables were the VAS scores for fatigue, comfort, and control (F, CM, and CN, respectively). The predictors were style, noise, and panel, and there was a random intercept for subject.

In the case of self-reported fatigue, there was an effect of style. Tukey’s multiple comparisons indicated a difference between soft and loud and between normal and loud styles at p < 0.0001. As could be expected, the loud style was associated with higher self-reported fatigue than soft and normal styles. The estimate of the standard deviation of the random effect (subject) was 10.41%. The residual standard deviation was 21%. No effect of panel or noise was found; however, there was a tendency for self-reported fatigue to increase in the absence of panels and also in the presence of babble noise. The absence of a significant effect of noise confirms the unconscious nature of the Lombard effect.

Regarding self-reported comfort, there was an effect of noise; in the background noise condition, the estimate was 12.8% higher than that for the babble condition (SE = 2.5, p < 0.0001). That is, comfort decreased significantly in the presence of babble noise. There was also an effect of style [χ²(2) = 129.1, p < 0.0001]. Tukey contrasts indicated a difference between normal and soft styles and between normal and loud styles at p < 0.0001. As might be expected, the normal style was associated with greater self-reported comfort than soft and loud styles.

The results for self-reported control were similar to those for self-reported comfort. This finding is predictable given the insight into the relationship between control and comfort that “the human being is a comfort-seeking animal who will, given the opportunity, interact with the environment in ways that secure comfort.”¹¹ The presence of babble noise was associated with a decrease of 7.6% in the estimate (p < 0.005). There was also an effect of style [χ²(2) = 55.6, p < 0.0001]. Tukey contrasts for the style variable were very similar to those associated with self-reported comfort (normal-soft and normal-loud contrasts were significant at p < 0.0001).

4. Conclusions

This study describes the effect of voice style, background noise level, and external auditory feedback on vocal effort (SPL) and self-reported vocal comfort, control, and fatigue.
The results indicate a reliable effect of style on SPL. The difference in vocal effort (SPL) between soft and normal styles was 9.33 dB, while the difference between soft and loud was 16.78 dB. Perceived vocal fatigue did not increase from soft to normal styles, but there was an increase of 30% in perceived vocal fatigue from normal/soft to loud styles. Self-reported voice comfort and control were higher (by 20%) in normal style than in soft and loud styles, while soft and loud styles did not differ.

Regarding the effect of the artificial multi-talker babble noise, there was an increase in SPL of 8.96 dB when babble noise was present relative to the background noise condition. Given the variation in noise level of approximately 20 dB (from 40.5 to 61 dBA), the slope of the increase in voice level with noise (Lombard effect) was approximately 0.44 dB/dB (8.96 dB / 20.5 dB). This result is similar to the slope of 0.33 dB/dB found by Kryter¹² in a laboratory setting. Self-reported voice comfort and control were lower (12.8% and 7.7%, respectively) when babble noise was present. No effect of noise on self-reported fatigue was found, confirming that the Lombard effect is unconscious in nature.

The increase in the external auditory feedback due to the presence of reflective panels significantly affected vocal effort. SPL decreased by a statistically significant 0.23 dB when panels were present. Importantly, in babble noise, SPL decreased by 0.5 dB. That is to say, the subjects benefited in an objectively measurable way from the panels, but this benefit was not perceived by the subjects.

In conclusion, the present paper reports the effects of voice style, background noise level, and external auditory feedback on subjective and objective voice measurements. These effects were measured under laboratory conditions. Conversations in real world environments with communication partners typically involve communicative (e.g., information-sharing and social) goals, which can be difficult to replicate within a laboratory environment.
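The Lombard slope discussed above is the change in voice level divided by the change in noise level. As a worked check, the sketch below takes the 8.96 dB babble estimate together with the two noise levels given in Sec. 2.1 (40.5 and 61 dBA). The function name `lombard_slope` is hypothetical, and the calculation assumes a straight line between the two noise conditions, since only two noise levels were tested.

```python
def lombard_slope(delta_voice_db, noise_low_db, noise_high_db):
    """Change in voice level per dB of change in noise level (dB/dB)."""
    return delta_voice_db / (noise_high_db - noise_low_db)


# Values taken from the text: babble estimate and the two noise levels.
slope = lombard_slope(8.96, 40.5, 61.0)
print(f"{slope:.2f} dB/dB")   # 8.96 / 20.5, i.e. about 0.44 dB/dB
```

On the same scale, the slopes of 0.33 dB/dB (Kryter, laboratory) and 0.78 dB/dB (teachers in real classrooms) cited in the text lie on either side of this estimate.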
Nevertheless, such factors as communication goals will be considered in future laboratory work. Previous studies showed a higher slope of the Lombard effect in real settings. For example, a slope of 0.78 dB/dB was reported for teachers in real classrooms.² Hence, future research will consider early reflection effects in real classroom settings, in which the effects are predicted to increase in strength.

Acknowledgments

The authors would like to thank L. Hunter, L. Glowski, and A. Lee and the subjects for their involvement. Research was supported by the NIDCD of the NIH under Award No. R01DC012315. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

References and links

¹E. J. Hunter and I. R. Titze, “Variations in intensity, fundamental frequency, and voicing for teachers in occupational versus non-occupational settings,” J. Speech Lang. Hear. Res. 53(4), 862–875 (2010).
²P. Bottalico and A. Astolfi, “Investigations into vocal doses and parameters pertaining to primary school teachers in classrooms,” J. Acoust. Soc. Am. 131, 2817–2827 (2012).
³E. J. Hunter and I. R. Titze, “Quantifying vocal fatigue recovery: Dynamic vocal recovery trajectories after a vocal loading exercise,” Ann. Otol. Rhinol. Laryngol. 118(6), 449–460 (2009).
⁴I. R. Titze, Principles of Voice Production (National Center for Voice and Speech, Salt Lake City, 2000), pp. 1–409.
⁵D. Pelegrín-García and J. Brunskog, “Speakers’ comfort and voice level variation in classrooms: Laboratory research,” J. Acoust. Soc. Am. 132, 249–260 (2012).
⁶ISO 9921:2002(E), Ergonomics – Assessment of Speech Communication (International Organization for Standardization, Geneva, 2002).
⁷H. Traunmüller and A. Eriksson, “Acoustic effects of variation in vocal effort by men, women and children,” J. Acoust. Soc. Am. 107, 3438–3451 (2000).
⁸E. J. Hunter, A. E. Halpern, and J. L. Spielman, “Impact of four nonclinical speaking environments on the child’s fundamental frequency and voice level: A preliminary case study,” Lang. Speech Hear. Serv. Schools 43, 252–263 (2012).
⁹B. Shield and J. E. Dockrell, “External and internal noise surveys of London primary schools,” J. Acoust. Soc. Am. 115(2), 730–738 (2004).
¹⁰ISO 3382-2:2008(E), Acoustics – Measurement of Room Acoustic Parameters, Part 2: Reverberation Time in Ordinary Rooms (International Organization for Standardization, Geneva, 2008).
¹¹M. A. Humphreys and J. F. Nicol, “Understanding the adaptive approach to thermal comfort,” ASHRAE Tech. Data Bull. 14(1), 1–14 (1998).
¹²K. D. Kryter, “Effects of ear protective devices on the intelligibility of speech in noise,” J. Acoust. Soc. Am. 18, 413–417 (1946).


