On the use of comfortable listening levels in speech experiments

Claude Simon a)
Department of Phonetics and Linguistics, University College London, Gower Street, London WC1, Great Britain
(Received 24 March 1977)
In order to investigate the effect of level of presentation on subjects' labeling of speechlike sound patterns, synthetic stimuli were constructed, varying systematically F0 contours, VOT, and F2 transitions. These stimuli were presented in random sequences at levels between 15 and 105 dB SPL to subjects with normal hearing. No significant response variation was observed in the range 40-100 dB SPL. Subjects' labeling behavior suddenly breaks down below levels of around 35 dB SPL. The secondary findings of the study are also discussed in terms of different specific processing strategies for different specific speech features.

PACS numbers: 43.70.Dn, 43.70.Ve, 43.66.Cb

INTRODUCTION
Normally hearing listeners are able to adjust their most comfortable listening level (MCL) very consistently over repeated trials. In speech situations, most individuals do not vary more than 10 dB and the overall group variation is less than 1 dB (Ventry, Rubin, and Hill, 1971; Martin et al., 1976; Gabrielsson and Sjogren, 1976). Interlistener variability, however, is relatively high: Standard deviations of MCL in a group of listeners vary according to reports between 7 dB (Gabrielsson et al., 1974) and 12 dB (Ventry, Rubin, and Hill, 1971; Martin, personal communication). Although some workers believe that there may be some difference in MCL settings between sexes (Martin, personal communication; Ventry, Rubin, and Hill, 1971, p. 1811), it has not been found to be statistically significant (Ventry, Rubin, and Hill, 1971).

In the majority of speech perception experiments, stimuli are presented at a comfortable loudness level for the subjects. The mere variability range of about 20 dB (standard deviation is about 12 dB) in individual MCL settings casts a shadow of doubt upon the reliability of such a method. In particular, it may well be of some consequence on subject's labeling of speechlike synthetic stimuli where the signal has been deprived of most redundant features. Moreover, with the development of speech and speech pattern audiometric tests (see for example, Fourcin, 1974), it would be desirable to compare results of such tests with standardized scores obtained from normally hearing subjects when the level of presentation is varied over a relatively wide range. The present experiments will hopefully contribute some normal response pattern references which could serve as a basis for evaluating results obtained with hearing impaired listeners.

I. EXPERIMENTAL SETUP

In order to investigate the effect of loudness level of presentation on listeners' labeling performances, speechlike synthetic stimuli were constructed with a computer-controlled parallel formant speech synthesizer.¹ Four basic speech sound patterns were manipulated: fundamental frequency (F0) contour, voice onset time (VOT), F1, and F2 transitions. The spectral characteristics of these stimuli are illustrated in Figs. 1-3.

F0 patterns (Fig. 1) were carried by a single-syllable utterance "Oh." They changed from extreme rise ("Oh?," stimulus no. 1) to extreme fall ("Oh!," stimulus no. 9) in equal logarithmic steps with respect to the level tone of stimulus no. 5. The F0 patterns of stimuli no. 1 and 9 were exact copies of natural utterances by a British female speaker (see also Fourcin, 1974). VOT and F1 transition patterns are illustrated in Fig. 2. The stimuli ranged from "coat" (VOT = 70 ms) to "goat" (VOT = 0 ms) in VOT steps of 10 ms. Figure 3 shows the spectral characteristics of the F2 pattern stimuli. They ranged from "boo" to "do," or phonetically [bu] to [du], the onset of F2 transition going from 750 to 1890 Hz in eight logarithmic steps. All other acoustic parameters remained constant throughout all stimuli. The periodic excitation of formants always started at the same relative instant, i.e., the first excitation pulse always occurred half a period after the onset of F1.

These stimuli were arranged into five different quasi random lists, each of which contained three occurrences of each stimulus. The lists were recorded on magnetic tapes and were played binaurally through headphones to ten normally hearing British adults² at five different levels in varying random order. These levels were 15, 35, 55, 75, and 95 dB SPL rms for the "coat-goat" and the "Oh" stimuli, 10 dB higher for all levels for the "boo-do" stimuli. Listeners were asked to make a binary choice and label the three types of stimuli in terms of "rise" and "fall" for the F0 patterns, "coat" or "goat" for the VOT and F1 patterns, and "boo" or "do" for the F2 patterns.

FIG. 1. Fundamental frequency contour of the "Oh" stimuli. Only stimuli no. 1, 5, and 9 are shown.

FIG. 2. Spectral characteristics of the "coat-goat" stimuli. F0 contour represents the frequency of the pulse excitation. Resonances (formants) are shown in black during periodic excitation and in white with black outline during noise excitation. Formant widths correspond to relative amplitudes. Bandwidths are 80, 120, and 180 Hz for F1, F2, and F3, respectively.

a) Present address: Communication Sciences Laboratory, Doctoral Program in Speech and Hearing Science, CUNY Graduate School and University Center, 33 West 42nd St., New York, NY 10036 and Dept. of Communication Sciences and Disorders, Montclair State College, Upper Montclair, NJ 07043.

J. Acoust. Soc. Am. 64(3), Sept. 1978. 0001-4966/78/6403-0744$00.80 © 1978 Acoustical Society of America
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 132.174.255.116 On: Mon, 22 Dec 2014 10:44:28

II. RESULTS

The responses of all subjects were pooled for each level of presentation. They are shown in Figs. 4-6. In
Fig. 4, R represents rise labels, F represents fall labels, as a function of stimulus number along the horizontal axis. The horizontal line half way up the vertical axis corresponds to equal numbers of rise and fall labels. The general shapes of the response curves in Fig. 4 agree with previous results obtained with the same stimuli (Fourcin, 1974; Simon and Fourcin, 1976). Response curves corresponding to levels of 95, 75, and 55 dB SPL follow one another very closely and cross the level of random labeling, as expected, between stimulus 5 and stimulus 6. When the level of presentation is lowered, however, to 35 dB SPL the response starts changing noticeably (shift of label boundary) and becomes quite atypical for a level of 15 dB SPL.

In Fig. 5, the response to the "coat-goat" stimuli is shown as a function of VOT along the x axis. Levels 95-55 dB SPL give rise to very similar response patterns, with the changeover from "coat" labels ("K") to "goat" labels ("G") occurring between 20 and 30 ms VOT. For a level of 35 dB SPL the phoneme boundary shifts to a much larger VOT value (nearly 40 ms) and for level 15 dB, not only has the boundary shifted but subjects have become much less confident in their judgments, which is after all not surprising. In short, there is a marked difference between response patterns for levels of 15 and 35 dB on the one hand, and response patterns for levels above, say, 50 dB on the other.

Figure 6 shows the labels given by subjects to the "boo-do" stimuli (variable F2 transitions). These stimuli were presented at slightly higher levels than the previous ones, i.e., 25, 45, 65, and 85 dB, and are very similar to one another, showing good categorical labeling and phoneme boundary occurring at F2 locus frequency between 1.3 and 1.4 kHz. The response to stimuli presented at a level of 105 dB is less categorical, shows more /b/ ("B") labels than for the previous three levels, and the phoneme boundary now occurs at 1.5 kHz.

FIG. 4. Responses of subjects to the "Oh" stimuli, in terms of "rise" labels (R) and "fall" labels (F) as a function of stimulus number. The top and bottom of the vertical axis correspond, respectively, to 100% "rise" and 100% "fall" labels. The horizontal line half way up the vertical axis represents the level of random labeling (i.e., equal numbers of voiced and voiceless labels). Stimulus no. 5 has a level F0 contour. Curves are shown for 15, 35, 55, 75, and 95 dB SPL.
FIG. 3. Spectral characteristics of the "boo-do" stimuli. Only stimulus no. 1 is shown here (F2 locus = 750 Hz). Formant widths correspond to relative amplitudes. Bandwidths are 80 and 120 Hz for F1 and F2, respectively. F0 contour represents the frequency of the pulse excitation. (Figure labels: F1, F2, Fx; axes in ms and kHz; annotation "rising F1 onset".)
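
The F2 locus values of the "boo-do" series can be approximated from the description in Sec. I (750 to 1890 Hz in eight logarithmic steps). The following is a minimal sketch and not the original synthesis procedure. It assumes that "eight logarithmic steps" means nine stimuli separated by a constant frequency ratio, and the function name log_steps is introduced here purely for illustration.

```python
from typing import List


def log_steps(start_hz: float, end_hz: float, n_steps: int) -> List[float]:
    """Return n_steps + 1 frequencies separated by a constant ratio."""
    # Assumption: equal logarithmic steps = constant ratio between neighbors.
    ratio = (end_hz / start_hz) ** (1.0 / n_steps)
    return [start_hz * ratio ** k for k in range(n_steps + 1)]


# Endpoints and step count are taken from the text; the rest is derived.
f2_loci = log_steps(750.0, 1890.0, 8)

for number, hz in enumerate(f2_loci, start=1):
    print(f"stimulus {number}: F2 locus ~ {hz:.0f} Hz")
```

Under this assumption, stimuli 5, 6, and 7 fall near 1190, 1340, and 1500 Hz, which is close to the values legible on the F2 locus axis of Fig. 6.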
FIG. 5. Responses of subjects to the "coat-goat" stimuli in terms of "coat" labels (K) and "goat" labels (G) as a function of the VOT value of each stimulus along the x axis. The top and bottom of the vertical axis correspond, respectively, to 100% voiced labels and 100% voiceless labels. Curves are shown for 15, 35, 55, 75, and 95 dB SPL.
FIG. 6. Responses of subjects to the "boo-do" stimuli in terms of "boo" labels (B) and "do" labels (D) as a function of F2 locus frequency (Hz) for each stimulus along the x axis. The top and bottom of the vertical axis correspond, respectively, to 100% "boo" labels and 100% "do" labels. Curves are shown for 25, 45, 65, 85, and 105 dB SPL.

TABLE II. Example of penalty scoring (for "boo-do" stimuli) on subjects' responses. The number of /b/ labels is shown for each subject under each stimulus number and summed on the bottom line which is used to work out the average phoneme boundary. In this case, it occurs between stimuli 6 and 7. Individual responses, ideally, should score 3 on the left of stimulus 7 and 0 on the right of stimulus 6. The absolute differences between ideal and observed scores are summed in the rightmost column, thus giving a penalty score for each subject in each stimulus set.

SUBJECT    STIMULUS                              PENALTY
            1   2   3   4   5   6   7   8   9    SCORES
   1        3   3   3   3   3   1   1   0   0       3
   2        2   3   2   2   3   2   3   2   3      12
   3        3   3   3   3   3   3   3   1   0       4
   4        3   3   3   3   3   3   1   0   0       1
   5        3   3   3   3   2   0   0   0   0       4
   6        3   3   3   3   3   3   1   0   0       1
   7        3   2   3   3   3   3   3   1   0       5
   8        3   3   3   3   3   3   3   1   0       4
   9        3   3   3   2   3   2   0   2   1       5
  10        3   3   3   3   3   2   0   0   0       1
  Sum      29  29  29  28  29  22  15   7   4
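
The penalty scoring described in the caption of Table II can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's own program: the boundary position (last "boo" stimulus = 6) is taken from the caption rather than derived, and the names B_COUNTS, column_totals, and penalty are hypothetical.

```python
# Number of /b/ labels (0-3) per subject for stimuli 1-9, copied from Table II.
B_COUNTS = [
    [3, 3, 3, 3, 3, 1, 1, 0, 0],
    [2, 3, 2, 2, 3, 2, 3, 2, 3],
    [3, 3, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 3, 3, 3, 1, 0, 0],
    [3, 3, 3, 3, 2, 0, 0, 0, 0],
    [3, 3, 3, 3, 3, 3, 1, 0, 0],
    [3, 2, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 2, 3, 2, 0, 2, 1],
    [3, 3, 3, 3, 3, 2, 0, 0, 0],
]

REPEATS = 3  # each stimulus occurs three times in a list (Sec. I)


def column_totals(rows):
    """Sum the /b/ counts over subjects for each stimulus (bottom line)."""
    return [sum(column) for column in zip(*rows)]


def penalty(row, last_b_stimulus, repeats=REPEATS):
    """Sum of |ideal - observed| over the stimuli of one subject.

    The ideal score is `repeats` for stimuli up to and including
    `last_b_stimulus` (1-based) and 0 for the stimuli after it.
    """
    total = 0
    for number, observed in enumerate(row, start=1):
        ideal = repeats if number <= last_b_stimulus else 0
        total += abs(ideal - observed)
    return total


totals = column_totals(B_COUNTS)
# Assumption: boundary between stimuli 6 and 7, as stated in the caption.
penalties = [penalty(row, last_b_stimulus=6) for row in B_COUNTS]

print(totals)     # [29, 29, 29, 28, 29, 22, 15, 7, 4]
print(penalties)  # [3, 12, 4, 1, 4, 1, 5, 4, 5, 1]
```

Both printed lists reproduce the bottom line and the rightmost column of Table II, which suggests that this reading of the caption matches the scoring actually used.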
TABLE I. Results of a t-test performed on the labeling scores obtained by subjects with the lowest and comfortable levels of presentation for the three types of stimuli. Subjects' labeling is consistently poorer for the lowest presentation levels.

STIMULUS     VARIABLE (level)    df    T value    p (1 tail)
Oh           15 vs 75 dB          8     3.23       0.006
coat-goat    15 vs 95 dB          6     3.46       0.007
boo-do       25 vs 85 dB          8     5.87       0.0005

In the case of level of 25 dB SPL, the overall response is poor, showing only a slight apparent trend of discrimination between the two types of labeling.

Several important points must be clarified about these results. One could argue that at the lowest levels of presentation, subjects may not have been able to hear words at all, which would have confused the results. Also the variation of response pattern as a function of level of presentation needs to be evaluated more objectively. The Pearson correlation coefficient r between stimulus number and type of label given was calculated for all listeners and all levels. It was found that subjects' response changes significantly as a function of stimulus variation even for the lowest presentation levels: at 15 dB SPL, for the "Oh" and the "coat-goat" stimuli, r = -0.828 and r = -0.913, respectively, both coefficients being significant at the 0.01 level (one-tailed tests), and at 25 dB SPL, for the "boo-do" stimuli, r = -0.621 (p = 0.05, one-tailed test). When, however, an analysis of variance, using the arcsine transformation (see Brownless, Hodges, and Rosenblatt, 1953), was performed on the listeners' scores, the F ratios obtained were not found to be of statistical significance (p larger than 0.05 for all stimuli). On the other hand, a t-test performed on the very same scores obtained for low level and comfortable level of presentation (lowest level versus nearest level to 80 dB SPL) showed that subjects gave significantly more confident responses (higher scores) when listening to stimuli at a comfortable loudness level (p < 0.01 for all three types of stimuli) as can be seen in Table I.

The apparent contradiction between the last two sets of results prompted us to find other ways of quantifying the subjects' responses. Two procedures were used. The first one consisted in assigning a penalty score to each subject in each test in the manner described in Table II.³ The second one, which was shown by Simon and Fourcin (1978) to be a powerful tool for the treatment of labeling responses, involved the classification of subjects' response patterns into three types: categorical, progressive, and scattered labeling. This method is described in Fig. 7. The results of these two procedures are given in Table III (for penalty scores) and in Table IV (for response types distribution). An analysis of variance was performed on these two sets of data and its
outcome is shown in Table V. Penalty scores increase as the level of presentation decreases, significantly so for voicing judgments ("coat-goat" test; p

FIG. 7. Decision flow chart used for classifying response types. Response type 1 is called "categorical" (clear categorization of stimuli and sharp switchover from one label to the other), response type 2 is "progressive" (progressive transition from one label to the other) and response type 3 is "scattered" (no clear labeling strategy exhibited by the subject). (Flow chart elements: "First n stimuli get 75% label" and "Last n stimuli get 75% label"; "Yes"; "Inversions?"; Category 1 (categorical), 2 (progressive), 3 (scattered).)

III. DISCUSSION AND CONCLUDING REMARKS

Subjects give predominantly categorical responses at comfortable levels of presentation and mostly scattered and progressive labeling at low levels (see also Table IV). Neither penalty scores nor response types, however, change significantly between presentation levels above the lowest level, say between 30 and 100 dB SPL (see Table VI). One might have expected a progressive deterioration of labeling curves and response patterns as the level was progressively decreased. Instead one witnesses a sudden breakdown of subjects' labeling behavior at or around 30 dB SPL. At rms levels of 15 and 25 dB SPL, the crucial cues may not have been audible, since they must have been well below 15 dB--although this was not specifically measured. The relevant feature in the VOT-pattern stimuli (onset of voicing) will be more prominent (information contained in all three formants) than for the F2-pattern stimuli in which the relevant information
seems.
Place categories in speech sounds tend to be more universally constant (across languages) than categories along the voicing continuum. This could well be due to the presence of "special" neural processes in the