On the use of comfortable listening levels in speech experiments

Claude Simon a)

Department of Phonetics and Linguistics, University College London, Gower Street, London WC1, Great Britain

(Received 24 March 1977)

In order to investigate the effect of level of presentation on subjects' labeling of speechlike sound patterns, synthetic stimuli were constructed, varying systematically F0 contours, VOT, and F2 transitions. These stimuli were presented in random sequences at levels between 15 and 105 dB SPL to subjects with normal hearing. No significant response variation was observed in the range 40-100 dB SPL. Subjects' labeling behavior suddenly breaks down below levels of around 35 dB SPL. The secondary findings of the study are also discussed in terms of different specific processing strategies for different specific speech features.

PACS numbers: 43.70.Dn, 43.70.Ve, 43.66.Cb

INTRODUCTION

Normally hearing listeners are able to adjust their most comfortable listening level (MCL) very consistently over repeated trials. In speech situations, most individuals do not vary more than 10 dB and the overall group variation is less than 1 dB (Ventry, Rubin, and Hill, 1971; Martin et al., 1976; Gabrielsson and Sjogren, 1976). Interlistener variability, however, is relatively high: Standard deviations of MCL in a group of listeners vary according to reports between 7 dB (Gabrielsson et al., 1974) and 12 dB (Ventry, Rubin, and Hill, 1971; Martin, personal communication). Although some workers believe that there may be some difference in MCL settings between sexes (Martin, personal communication; Ventry, Rubin, and Hill, 1971, p. 1811), it has not been found to be statistically significant (Ventry, Rubin, and Hill, 1971).

In the majority of speech perception experiments, stimuli are presented at a comfortable loudness level for the subjects. The mere variability range of about 20 dB (standard deviation is about 12 dB) in individual MCL settings casts a shadow of doubt upon the reliability of such a method. In particular, it may well be of some consequence on subjects' labeling of speechlike synthetic stimuli where the signal has been deprived of most redundant features. Moreover, with the development of speech and speech pattern audiometric tests (see for example, Fourcin, 1974), it would be desirable to compare results of such tests with standardized scores obtained from normally hearing subjects when the level of presentation is varied over a relatively wide range. The present experiments will hopefully contribute some normal response pattern references which could serve as a basis for evaluating results obtained with hearing impaired listeners.

I. EXPERIMENTAL SETUP

In order to investigate the effect of loudness level of presentation on listeners' labeling performances, speechlike synthetic stimuli were constructed with a computer-controlled parallel formant speech synthesizer.¹ Four basic speech sound patterns were manipulated: fundamental frequency (F0) contour, voice onset time (VOT), F1, and F2 transitions. The spectral characteristics of these stimuli are illustrated in Figs. 1-3.

F0 patterns (Fig. 1) were carried by a single-syllable utterance "Oh." They changed from extreme rise ("Oh?," stimulus no. 1) to extreme fall ("Oh!," stimulus no. 9) in equal logarithmic steps with respect to the level tone of stimulus no. 5. The F0 patterns of stimuli no. 1 and 9 were exact copies of natural utterances by a British female speaker (see also Fourcin, 1974). VOT and F1 transition patterns are illustrated in Fig. 2. The stimuli ranged from "coat" (VOT = 70 ms) to "goat" (VOT = 0 ms) in VOT steps of 10 ms. Figure 3 shows the spectral characteristics of the F2 pattern stimuli. They ranged from "boo" to "do," or phonetically [bu] to [du], the onset of F2 transition going from 750 to 1890 Hz in eight logarithmic steps. All other acoustic parameters remained constant throughout all stimuli. The periodic excitation of formants always started at the same relative instant, i.e., the first excitation pulse always occurred half a period after the onset of F1.

These stimuli were arranged into five different quasi random lists, each of which contained three occurrences of each stimulus. The lists were recorded on magnetic tapes and were played binaurally through headphones to ten normally hearing British adults² at five different levels in varying random order. These levels were 15, 35, 55, 75, and 95 dB SPL rms for the "coat-goat" and the "Oh" stimuli, 10 dB higher for all levels for the "boo-do" stimuli. Listeners were asked to make a binary choice and label the three types of stimuli in terms of "rise" and "fall" for the F0 patterns, "coat" or "goat" for the VOT and F1 patterns, and "boo" or "do" for the F2 patterns.

II. RESULTS

The responses of all subjects were pooled for each level of presentation. They are shown in Figs. 4-6.

a) Present address: Communication Sciences Laboratory, Doctoral Program in Speech and Hearing Science, CUNY Graduate School and University Center, 33 West 42nd St., New York, NY 10036 and Dept. of Communication Sciences and Disorders, Montclair State College, Upper Montclair, NJ 07043.

FIG. 1. Fundamental frequency contour of the "Oh" stimuli. Only stimuli no. 1, 5, and 9 are shown. Frequency scale marks: 100 Hz and 500 Hz.

J. Acoust. Soc. Am. 64(3), Sept. 1978

0001-4966/78/6403-0744$00.80 © 1978 Acoustical Society of America


Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 132.174.255.116 On: Mon, 22 Dec 2014 10:44:28


FIG. 2. Spectral characteristics of the "coat-goat" stimuli. F0 contour represents the frequency of the pulse excitation. Resonances (formants) are shown in black during periodic excitation and in white with black outline during noise excitation. Formant widths correspond to relative amplitudes. Bandwidths are 80, 120, and 180 Hz for F1, F2, and F3, respectively.

In Fig. 4, R represents rise labels, F represents fall labels, as a function of stimulus number along the horizontal axis. The horizontal line half way up the vertical axis corresponds to equal numbers of rise and fall labels. The general shapes of the response curves in Fig. 4 agree with previous results obtained with the same stimuli (Fourcin, 1974; Simon and Fourcin, 1976). Response curves corresponding to levels of 95, 75, and 55 dB SPL follow one another very closely and cross the level of random labeling, as expected, between stimulus 5 and stimulus 6. When the level of presentation is lowered, however, to 35 dB SPL the response starts changing noticeably (shift of label boundary) and becomes quite atypical for a level of 15 dB SPL.

In Fig. 5, the response to the "coat-goat" stimuli is shown as a function of VOT along the x axis. Levels 95-55 dB SPL give rise to very similar response patterns, with the changeover from "coat" labels ("K") to "goat" labels ("G") occurring between 20 and 30 ms VOT. For a level of 35 dB SPL the phoneme boundary shifts to a much larger VOT value (nearly 40 ms) and for level 15 dB, not only has the boundary shifted but subjects have become much less confident in their judgements, which is after all not surprising. In short, there is a marked difference between response patterns for levels of 15 and 35 dB on the one hand, and response patterns for levels above, say, 50 dB on the other.

Figure 6 shows the labels given by subjects to the "boo-do" stimuli (variable F2 transitions). These stimuli were presented at slightly higher levels than the previous ones, i.e., 25, 45, 65, and 85 dB, and are very similar to one another, showing good categorical labeling and phoneme boundary occurring at F2 locus frequency between 1.3 and 1.4 kHz. The response to stimuli presented at a level of 105 dB is less categorical, shows more /b/ ("B") labels than for the previous three levels, and the phoneme boundary now occurs at 1.5 kHz.

FIG. 4. Responses of subjects to the "Oh" stimuli, in terms of "rise" labels (R) and "fall" labels (F) as a function of stimulus number. The top and bottom of the vertical axis correspond, respectively, to 100% "rise" and 100% "fall" labels. The horizontal line half way up the vertical axis represents the level of random labeling (i.e., equal numbers of voiced and voiceless labels). Stimulus no. 5 has a level F0 contour. Curves: 15, 35, 55, 75, and 95 dB SPL.

FIG. 3. Spectral characteristics of the "boo-do" stimuli. Only stimulus no. 1 is shown here (F2 locus = 750 Hz). Formant widths correspond to relative amplitudes. Bandwidths are 80 and 120 Hz for F1 and F2, respectively. F0 contour represents the frequency of the pulse excitation.
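
The logarithmic spacing of the F2 onset values for the "boo-do" continuum (750 to 1890 Hz in eight steps) can be illustrated with a short sketch. It assumes that "equal logarithmic steps" means a constant frequency ratio between neighbouring stimuli; the function name `log_spaced` and the rounding in the printout are illustrative choices, not taken from the paper.

```python
def log_spaced(start_hz, stop_hz, n_steps):
    """Return n_steps + 1 frequencies from start_hz to stop_hz.

    Assumption: neighbouring stimuli differ by a constant frequency ratio.
    """
    ratio = (stop_hz / start_hz) ** (1.0 / n_steps)
    return [start_hz * ratio ** k for k in range(n_steps + 1)]


# Nine "boo-do" stimuli: F2 onset from 750 Hz to 1890 Hz in eight steps.
f2_loci = log_spaced(750.0, 1890.0, 8)

for number, hz in enumerate(f2_loci, start=1):
    print(f"stimulus {number}: F2 onset about {hz:.0f} Hz")
```

Under this assumption the ratio between neighbouring stimuli is about 1.12, and the middle stimulus (no. 5) has an F2 onset near 1190 Hz.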

FIG. 5. Responses of subjects to the "coat-goat" stimuli in terms of "coat" labels (K) and "goat" labels (G) as a function of the VOT value of each stimulus along the x axis. The top and bottom of the vertical axis correspond, respectively, to 100% voiced labels and 100% voiceless labels. Curves: 15, 35, 55, 75, and 95 dB SPL.


FIG. 6. Responses of subjects to the "boo-do" stimuli in terms of "boo" labels (B) and "do" labels (D) as a function of F2 locus frequency for each stimulus along the x axis. The top and bottom of the vertical axis correspond, respectively, to 100% "boo" labels and 100% "do" labels. Horizontal axis: F2 locus (Hz), with ticks at 750, 1190, 1340, 1500, and 1890. Curves: 25, 45, 65, 85, and 105 dB SPL.

TABLE II. Example of penalty scoring (for "boo-do" stimuli) on subjects' responses. The number of /b/ labels is shown for each subject under each stimulus number and summed on the bottom line which is used to work out the average phoneme boundary. In this case, it occurs between stimuli 6 and 7. Individual responses, ideally, should score 3 on the left of stimulus 7 and 0 on the right of stimulus 6. The absolute differences between ideal and observed scores are summed in the rightmost column, thus giving a penalty score for each subject in each stimulus set.

                          STIMULUS
SUBJECT     1    2    3    4    5    6    7    8    9    PENALTY SCORE
   1        3    3    3    3    3    1    1    0    0          3
   2        2    3    2    2    3    2    3    2    3         12
   3        3    3    3    3    3    3    3    1    0          4
   4        3    3    3    3    3    3    1    0    0          1
   5        3    3    3    3    2    0    0    0    0          4
   6        3    3    3    3    3    3    1    0    0          1
   7        3    2    3    3    3    3    3    1    0          5
   8        3    3    3    3    3    3    3    1    0          4
   9        3    3    3    2    3    2    0    2    1          5
  10        3    3    3    3    3    2    0    0    0          1
Total      29   29   29   28   29   22   15    7    4
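
The penalty scoring described in the caption of Table II can be checked with a small sketch. The counts are copied from Table II; the function name `penalty_score` and its parameters are hypothetical, and the boundary position (between stimuli 6 and 7) is taken from the caption rather than derived from the data.

```python
# Number of /b/ ("boo") labels out of 3 presentations, per subject (rows)
# and per stimulus 1..9 (columns), copied from Table II.
TABLE_II = [
    [3, 3, 3, 3, 3, 1, 1, 0, 0],
    [2, 3, 2, 2, 3, 2, 3, 2, 3],
    [3, 3, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 3, 3, 3, 1, 0, 0],
    [3, 3, 3, 3, 2, 0, 0, 0, 0],
    [3, 3, 3, 3, 3, 3, 1, 0, 0],
    [3, 2, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 3, 3, 3, 3, 1, 0],
    [3, 3, 3, 2, 3, 2, 0, 2, 1],
    [3, 3, 3, 3, 3, 2, 0, 0, 0],
]


def penalty_score(counts, last_left_stimulus=6, max_count=3):
    """Sum of |ideal - observed| over all stimuli for one subject.

    Ideal pattern (from the caption): max_count labels for every stimulus
    up to last_left_stimulus, and 0 labels for every stimulus after it.
    """
    ideal = [max_count if index < last_left_stimulus else 0
             for index in range(len(counts))]
    return sum(abs(want - got) for want, got in zip(ideal, counts))


penalties = [penalty_score(row) for row in TABLE_II]
column_totals = [sum(column) for column in zip(*TABLE_II)]

print("penalty scores:", penalties)
print("column totals: ", column_totals)
```

With the boundary fixed between stimuli 6 and 7, the computed penalties and column totals agree with the rightmost column and the bottom line of Table II.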

In the case of level of 25 dB SPL, the overall response is poor, showing only a slight apparent trend of discrimination between the two types of labeling.

Several important points must be clarified about these results. One could argue that at the lowest levels of presentation, subjects may not have been able to hear words at all, which would have confused the results. Also the variation of response pattern as a function of level of presentation needs to be evaluated more objectively. The Pearson correlation coefficient r between stimulus number and type of label given was calculated for all listeners and all levels. It was found that subjects' response changes significantly as a function of stimulus variation even for the lowest presentation levels: at 15 dB SPL, for the "Oh" and the "coat-goat" stimuli, r = -0.828 and r = -0.913, respectively, both coefficients being significant at the 0.01 level (one-tailed tests), and at 25 dB SPL, for the "boo-do" stimuli, r = -0.621 (p = 0.05, one-tailed test). When, however, an analysis of variance, using the arcsine transformation (see Brownless, Hodges, and Rosenblatt, 1953), was performed on the listeners' scores, the F ratios obtained were not found to be of statistical significance (p larger than 0.05 for all stimuli). On the other hand, a t-test performed on the very same scores obtained for low level and comfortable level of presentation (lowest level versus nearest level to 80 dB SPL) showed that subjects gave significantly more confident responses (higher scores) when listening to stimuli at a comfortable loudness level (p < 0.01 for all three types of stimuli) as can be seen in Table I.

The apparent contradiction between the last two sets of results prompted us to find other ways of quantifying the subjects' responses. Two procedures were used. The first one consisted in assigning a penalty score to each subject in each test in the manner described in Table II.³ The second one, which was shown by Simon and Fourcin (1978) to be a powerful tool for the treatment of labeling responses, involved the classification of subjects' response patterns into three types: categorical, progressive, and scattered labeling. This method is described in Fig. 7. The results of these two procedures are given in Table III (for penalty scores) and in Table IV (for response types distribution). An analysis of variance was performed on these two sets of data and its

TABLE I. Results of a t-test performed on the labeling scores obtained by subjects with the lowest and comfortable levels of presentation for the three types of stimuli. Subjects' labeling is consistently poorer for the lowest presentation levels.

STIMULUS     VARIABLE                df    T value    p (1 tail)
Oh           level (15 vs 75 dB)      8     3.23       0.006
coat-goat    level (15 vs 95 dB)      6     3.46       0.007
boo-do       level (25 vs 85 dB)      8     5.87       0.0005
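
The correlation check described in the Results (Pearson r between stimulus number and labels given) can be sketched as follows. The per-listener responses at the lowest levels are not reproduced in the text, so this example uses the pooled /b/ counts from the bottom line of Table II purely as illustrative input; the exact coding of "type of label" used in the study is an assumption, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Illustrative input: stimulus numbers 1..9 and the pooled number of /b/
# labels per stimulus (bottom line of Table II, 30 responses per stimulus).
stimulus_numbers = list(range(1, 10))
boo_totals = [29, 29, 29, 28, 29, 22, 15, 7, 4]

r = pearson_r(stimulus_numbers, boo_totals)
print(f"r = {r:.3f}")
```

A strongly negative r here simply reflects that /b/ labels become rarer as the stimulus number increases. It is not one of the coefficients reported in the text, which refer to other presentation levels.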

outcome is shown in Table V. Penalty scores increase as the level of presentation decreases, significantly so for voicing judgments ("coat-goat" test; p

III. DISCUSSION AND CONCLUDING REMARKS

Subjects give predominantly categorical responses at comfortable levels of presentation and mostly scattered and progressive labeling at low levels (see also Table IV). Neither penalty scores nor response types, however, change significantly between presentation levels above the lowest level, say between 30 and 100 dB SPL (see Table VI). One might have expected a progressive deterioration of labeling curves and response patterns as the level was progressively decreased. Instead one witnesses a sudden breakdown of subjects' labeling behavior at or around 30 dB SPL. At rms levels of 15 and 25 dB SPL, the crucial cues may not have been audible, since they must have been well below 15 dB, although this was not specifically measured. The relevant feature in the VOT-pattern stimuli (onset of voicing) will be more prominent (information contained in all three formants) than for the F2-pattern stimuli in which the relevant information

seems.

Place categories in speech sounds tend to be more universally constant (across languages) than categories along the voicing continuum. This could well be due to the presence of "special" neural processes in the

FIG. 7. Decision flow chart used for classifying response types. Flow chart boxes: "First n and last n stimuli get 75% label" (yes), "Inversions?", leading to category 1 (categorical), 2 (progressive), or 3 (scattered). Response type 1 is called "categorical" (clear categorization of stimuli and sharp switchover from one label to the other), response type 2 is "progressive" (progressive transition from one label to the other) and response type 3 is "scattered" (no clear labeling strategy exhibited by the subject).
