INTELLIGIBILITY CHARACTERISTICS OF SUPERIOR ESOPHAGEAL SPEECH PRESENTED UNDER VARIOUS LEVELS OF MASKING NOISE

YOSHIYUKI HORII and BERND WEINBERG
Purdue University, West Lafayette, Indiana

Speech was masked with broad-band noise to assess the effects of such masking upon the recognition of consonants and vowels produced by esophageal speakers. Procedures were developed to compare the articulation functions of superior esophageal speech with those of normal speech under comparable levels of masking noise. Within the range of speech-to-noise ratios studied, articulation functions for vowels were essentially the same for esophageal and normal talkers (4% per dB). With respect to consonants, the intelligibility scores for esophageal speech were 12 to 14% lower than for normal speech under adverse noise conditions. Gains in the consonant articulation functions were 2.5%/dB and 4%/dB for normal and esophageal talkers, respectively. For adverse noise conditions, the lowered consonant scores for esophageal speakers were the result of poorer than normal intelligibility for liquid-glides and nasals and, secondarily, for stop consonants. Additional differences between the intelligibility characteristics of esophageal and normal speech were found in word-position and voicing features.

Broad-band masking noise is known to have a differential effect upon the recognition of speech sounds (Horii, House, and Hughes, 1971; Miller and Nicely, 1955). In this context, esophageal speakers frequently assert that the intelligibility of their speech is adversely affected by noise. The manner in which esophageal speakers raise this assertion makes it apparent that they assume (1) that their esophageal speech is different from normal speech, and (2) that, as a consequence of these physical differences, esophageal speech is more vulnerable to deterioration by noise than normal speech produced with a laryngeal sound source.

The objective of the present work was to measure the intelligibility of superior esophageal speech as a function of adverse listening conditions, that is, under various levels of masking noise.
It was assumed that the broad-band masking of speech would provide an incisive first approach to examining how esophageal speech differs from normal speech. In addition, it was assumed that the analysis of listeners' responses to recorded esophageal speech materials masked by continuous broad-band noise would provide an inferential assessment of the validity of the noise interference problem reported by laryngectomized patients. Thus, procedures were developed to compare

Downloaded From: https://jslhr.pubs.asha.org/ by a University of Texas, Austin User on 03/27/2018 Terms of Use: https://pubs.asha.org/ss/rights_and_permissions.aspx

the articulation functions of superior esophageal speech with those of normal speech under comparable levels of masking noise.

METHODS

Subjects

Two superior esophageal speakers, one man and one woman, provided the speech materials used in the listening tests. The talkers were highly experienced, highly proficient esophageal speakers who had used esophageal speech as a primary method of communication for over five years. Their speech was automatic, highly intelligible, free of extraneous noises, and generally pleasant to listen to. In short, their speech was among the best alaryngeal speech the authors had ever heard.

Speech Materials and Recordings

The two talkers recorded six lists of a consonant rhyme test (House et al., 1965) and six lists of a vowel rhyme test (Horii, 1969). Each consonant list consisted of 50 monosyllabic words, half with the test consonant in the word-initial position and half with the test consonant in the word-final position. In this rhyme test, consonants appeared with frequencies approximately equal to those observed in actual English texts. Each vowel list, by contrast, consisted of 24 monosyllabic words in which 12 different vowels each appeared twice.

Listeners and Listening Procedures

The listeners were 16 young adults (college students) who were paid for their services. Each listener passed a discrete frequency audiometric screening test at a hearing level of 15 dB (ANSI, 1969). Listeners were unfamiliar with alaryngeal speech and were not experienced or trained in psychoacoustic listening procedures. The recorded word lists were delivered binaurally using a high-quality tape system and matched earphones (Grason-Stadler, Model TDH-39 with Zwislocki-type cushions). The entire experiment was conducted over a period of fifteen 50-minute daily sessions, and all listening was done in a quiet room furnished with individual stations. In each session, 12 word lists (six consonant lists and six vowel lists) were presented at four different signal-to-noise ratios. The stimulus materials were counterbalanced with respect to list order, speaker order, and signal-to-noise ratio in order to preclude listeners from learning the order of test items and to enhance listeners' attention to the listening task.

The signal levels of each test list were determined by playing the test recordings into a Bruel and Kjaer graphic level recorder (Model 2305) and

Journal of Speech and Hearing Research, 18, 413-419 (1975)


measuring the level of the vocalic maxima of each word relative to a 1000-Hz reference signal. The speech level of each word list was defined operationally as the mean level of the vocalic nuclei of the words in that list. The test tapes were prepared so that the average speech level of the word lists, so defined, was 65 dB SPL under the earphones.

For the vowel tests, the signal-to-noise ratios employed were -9 dB, -5 dB, and -1 dB. The S/N ratios for the consonant tests were -9 dB, -5 dB, -1 dB, and +3 dB. All consonant and vowel lists recorded by each of the two speakers were presented once under each S/N condition. In addition, all lists were presented in a clear condition, that is, without any masking noise.

In the case of the tests administered with a masking noise, the noise was not recorded for test administration, but was produced by a standard white-noise generator (Grason-Stadler, Model 455C) and mixed with the speech at the time of testing. Thus, the esophageal speech word lists were masked by a continuous white noise. Both the noise and the speech were essentially low-passed with a cutoff frequency near 8000 Hz, the upper frequency response limit of the earphones. In these tests, an S/N ratio of 0 dB was established by setting the level of the reference tone preceding each word list to a level equal to the average level of the speech. Thus, equal levels of the reference tone and the corresponding noise defined an S/N ratio of 0 dB. Other values of S/N ratio were obtained by varying the noise levels with appropriately placed attenuators.

A closed-set response strategy was used in the listening experiment (House et al., 1965). Specifically, listeners were provided answer sheets containing 50 six-word ensembles for each consonant list and 24 six-word ensembles for each vowel list. Each listener's task was to identify each stimulus word from the six-word response set appearing on the answer sheet.
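The level-setting procedure described above can be illustrated with a short sketch. It assumes that the "mean level" of a list is the arithmetic mean of the per-word vocalic peak readings in dB, and that S/N in dB is simply speech level minus noise level; the function and variable names are hypothetical and are not taken from the original study.

```python
# Illustrative sketch only: names are hypothetical, not from the original study.
SPEECH_LEVEL_DB_SPL = 65.0          # average level of vocalic nuclei (from the text)
CONSONANT_SNRS_DB = [-9, -5, -1, 3]  # S/N conditions for consonant tests
VOWEL_SNRS_DB = [-9, -5, -1]         # S/N conditions for vowel tests


def mean_list_level(vocalic_peak_levels_db):
    """Operational speech level of a list.

    Assumption: the arithmetic mean of the per-word vocalic peak levels (dB).
    """
    return sum(vocalic_peak_levels_db) / len(vocalic_peak_levels_db)


def noise_level_for_snr(speech_level_db, snr_db):
    """S/N (dB) = speech level - noise level, so noise = speech - S/N."""
    return speech_level_db - snr_db


consonant_noise_levels = {
    snr: noise_level_for_snr(SPEECH_LEVEL_DB_SPL, snr) for snr in CONSONANT_SNRS_DB
}
vowel_noise_levels = {
    snr: noise_level_for_snr(SPEECH_LEVEL_DB_SPL, snr) for snr in VOWEL_SNRS_DB
}

for snr, level in consonant_noise_levels.items():
    print(f"S/N {snr:+d} dB -> noise at {level:.0f} dB SPL")
```

Under these assumptions the most adverse condition (-9 dB) corresponds to noise 9 dB above the 65 dB SPL speech level, and the +3 dB condition to noise 3 dB below it.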
The specific type of speech materials and response format was selected because it permitted the use of untrained listeners, provided stable scores after repeated exposure, reduced the effect of familiarity of test words, and substantially minimized the problems associated with an indeterminate response set (House et al., 1965). All of the S/N ratios were identical with those used by Horii et al. (1971), who recently described intelligibility functions for normal speech masked by an identical continuous noise. Because a fundamental objective of this project was to compare intelligibility functions of superior esophageal speech with those of normal speech, the test materials, response format, transmission equipment, listening conditions, levels of masking, masking signal, and method of specifying speech levels were identical with those of Horii et al. (1971).

RESULTS AND DISCUSSION

Talker Equivalence

An initial analysis of the equivalence between the two talkers showed that there were no significant differences between the average intelligibility scores of the two talkers over the range of S/N ratios employed. By way of example, Figure 1 provides a comparison of the consonant articulation scores for each of the two talkers as a function of masking conditions and attests to the equivalence of their speech in terms of average intelligibility characteristics. Similar results were obtained when the talkers' average values for vowel test materials were compared. Thus, all information to be reported reflects the pooling of talker data.

FIGURE 1. Consonant articulation functions for Talker 1 (○) and Talker 2 (●), plotted against S/N ratio (dB). CL refers to the clear (no noise) condition.

FIGURE 2. Vowel and consonant articulation functions for esophageal talkers, plotted against S/N ratio (dB).

FIGURE 3. Articulation functions for normal (dashed lines) and esophageal talkers (solid lines), plotted against S/N ratio (dB). Vowel and consonant scores are designated with different symbols.

Articulation Functions for Esophageal Speech

The general results of the listening tests are summarized in Figure 2, where the percentage of correct responses to the consonant and vowel test lists is plotted as a function of S/N ratio. Each data point represents an average over 9600 responses for the consonants (300 words × 2 talkers × 16 listeners) and 4600 responses for the vowels. These results show that, within the range of S/N ratios employed, the consonant functions for esophageal speech have a slope of about 5%/dB, while the vowel curves have a slope of about 4%/dB. The intelligibility of rhyme-test words heard in quiet was high (about 98% for vowels and 94% for consonants), affirming on at least one dimension the superior proficiency of the two esophageal talkers.

For each S/N condition, the intelligibility of vowels was considerably higher than that of consonants. This advantage for vowels was expected because the S/N ratios are more realistically viewed as vowel-to-noise ratios. Since the intensity levels of consonants produced by esophageal speakers are, on the average, presumed to be considerably lower than those for vowels, consonants received more masking at a nominal S/N ratio.
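The response counts behind each data point follow directly from the design described in the Methods (six lists per test, 50 words per consonant list, 24 words per vowel list, two talkers, 16 listeners). The sketch below simply multiplies these factors out; the function name is hypothetical. Note that the product for the vowel test is 4608, which is close to the figure of 4600 given in the text.

```python
# Illustrative arithmetic only; the helper name is hypothetical.
LISTS_PER_TEST = 6
WORDS_PER_CONSONANT_LIST = 50
WORDS_PER_VOWEL_LIST = 24
TALKERS = 2
LISTENERS = 16


def responses_per_data_point(lists, words_per_list, talkers, listeners):
    """Number of listener responses pooled into one point of an articulation function."""
    return lists * words_per_list * talkers * listeners


consonant_responses = responses_per_data_point(
    LISTS_PER_TEST, WORDS_PER_CONSONANT_LIST, TALKERS, LISTENERS
)
vowel_responses = responses_per_data_point(
    LISTS_PER_TEST, WORDS_PER_VOWEL_LIST, TALKERS, LISTENERS
)

print(consonant_responses)  # 9600
print(vowel_responses)      # 4608
```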


Comparisons between Normal and Esophageal Speech

The intelligibility functions for esophageal speech are compared with those for normal speech in Figure 3. The data for normal talkers come from previous work obtained under identical experimental conditions (see Figure 4, Horii et al., 1971). In this figure each data point for normal speech is based on 13,200 responses for consonants (22 listeners, two talkers, and six lists of 50 words each) and 6336 responses for the vowels. Overall intelligibility functions for vowels are essentially the same for esophageal and normal speech. The vowel functions for both types of speech have a slope of 4%/dB. With respect to consonants, the intelligibility scores for esophageal speech are lower than for normal speech under the more adverse S/N conditions. The slope of the consonant function for esophageal speakers is about 5%/dB, while for normal speakers the slope is 2.5%/dB. Thus, the consonant data lend support to the clinical hypothesis that the consonant discriminability and, by inference, the intelligibility of speech produced by highly proficient esophageal speakers is reduced in adverse noise conditions. The differences in average consonant intelligibility between esophageal and normal speech were about 12 to 14% under the two highest noise conditions (-9 and -5 dB S/N).

Additional comparisons were made to identify some of the factors that led to the reduction in esophageal consonant intelligibility under adverse noise conditions. For example, Figures 4a-d provide individual comparisons of the intelligibility scores for normal and esophageal speakers as a function of four classes of consonants: namely, liquid-glides, nasals, stops, and fricatives. Under adverse noise conditions, the lower consonant scores for esophageal speakers were largely the result of reduced intelligibility of liquid-glides and nasals and, secondarily, stops.
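To make the slope comparison concrete, the following sketch converts the reported consonant slopes into the change in percent-correct score expected across one 4-dB step between adjacent S/N conditions. It assumes the articulation function is locally linear within the range studied, which the text states only approximately; the names are hypothetical.

```python
# Illustrative arithmetic only, assuming a locally linear articulation function.
CONSONANT_SLOPE_PCT_PER_DB = {
    "esophageal": 5.0,  # about 5%/dB (from the text)
    "normal": 2.5,      # 2.5%/dB (from the text)
}
STEP_DB = 4  # spacing between adjacent S/N conditions (-9, -5, -1, +3 dB)


def gain_over_step(slope_pct_per_db, step_db):
    """Change in percent-correct score across a step of `step_db` decibels."""
    return slope_pct_per_db * step_db


gain_per_step = {
    group: gain_over_step(slope, STEP_DB)
    for group, slope in CONSONANT_SLOPE_PCT_PER_DB.items()
}

for group, gain in gain_per_step.items():
    print(f"{group}: about {gain:.0f} percentage points per {STEP_DB}-dB step")
```

On these figures, a 4-dB increase in noise would cost esophageal consonants roughly twice as many percentage points as normal consonants, which is one way of reading the steeper esophageal function.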
The differences between normal and esophageal speech were particularly large for liquid-glides and nasals. Of interest was the observation that the intelligibility scores for fricatives produced by esophageal and normal talkers were nearly identical under adverse S/N conditions.

Earlier studies of the intelligibility of normal speech masked by a continuous noise have demonstrated that the intelligibility of various classes of consonants is not equal (Miller and Nicely, 1955; Horii et al., 1971). Figure 5a illustrates the articulation functions for liquid-glide, nasal, plosive, and fricative consonants of normal speech (from Horii et al., 1971). In general, these data emphasize the inequality of intelligibility among consonants masked by a continuous noise. The rank ordering of consonants, from high to low, was liquid-glides, nasals, stops, and fricatives. By contrast, Figure 5b illustrates the articulation functions for these same four classes of consonants produced by esophageal speakers. In this case, large differences in intelligibility as a function of consonant class are not present. Rather, highly similar articulation functions were obtained for liquid-glide, nasal, plosive, and fricative consonants produced by esophageal speakers.

Additional comparisons uncovered other important differences between intelligibility characteristics of normal and esophageal speech. For example, the intelligibility functions for word-initial and word-final consonants of esophageal and normal speech are compared in Figure 6. For normal speech, consonants in the word-initial position were consistently more intelligible than word-final consonants. For esophageal speech, the intelligibility of word-initial and word-final consonants was essentially equivalent. Secondly, voiced consonants of normal speakers were, on the average, much more intelligible than voiceless consonants (overall averages were 69 vs 47%). For esophageal speakers, the voiced consonant advantage was minimal (54 vs 51%).
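The size of the voicing effect can be read off the overall averages quoted above. The sketch below is simple arithmetic on those four reported values; the dictionary layout and names are hypothetical.

```python
# Illustrative arithmetic on the overall averages reported in the text.
OVERALL_SCORES_PCT = {
    "normal": {"voiced": 69, "voiceless": 47},
    "esophageal": {"voiced": 54, "voiceless": 51},
}

# Voiced-consonant advantage, in percentage points, for each talker group.
voiced_advantage = {
    group: scores["voiced"] - scores["voiceless"]
    for group, scores in OVERALL_SCORES_PCT.items()
}

for group, advantage in voiced_advantage.items():
    print(f"{group}: voiced advantage of {advantage} percentage points")
```

The voiced advantage is 22 percentage points for normal talkers against 3 for esophageal talkers, which is the contrast the paragraph describes as "minimal."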

FIGURE 4a-d. Articulation functions for (a) liquid-glide, (b) nasal, (c) stop, and (d) fricative consonants, plotted against S/N ratio (dB); open circles (○) represent average intelligibility scores of esophageal talkers, while closed circles (●) represent normal talkers' scores.

FIGURE 5a, b. Articulation functions for liquid-glide (×), nasal (○), stop, and fricative consonants produced by normal (Figure 5a) and esophageal (Figure 5b) talkers, plotted against S/N ratio (dB).

FIGURE 6. Articulation functions for word-initial (open and filled circles) and word-final (open and filled triangles) consonants, plotted against S/N ratio (dB). Normal talkers (dashed lines) and esophageal talkers (solid lines).

In general, previous research and theory suggest that the intelligibility of consonants masked by continuous noise varies as a function of the inherent power of the consonant sounds (see Miller and Nicely, 1955; Horii et al., 1971; and Figure 5a). If, on the other hand, a continuous noise masker did not differentially affect consonant intelligibility, one would expect the average intelligibility scores of differing classes of consonants to be homogeneous. For consonants produced by esophageal speakers, the latter appears to be the case (see Figure 5b, for example). Unfortunately, little is known about the inherent power or intensity characteristics of esophageal speech produced by laryngectomized talkers, although several authors have commented on the general restriction in intensity variation and overall reduction in average speech level associated with esophageal speech production (Drummond, 1965; Hyman, 1955; Diedrich, 1968).

Finally, it is important to note that in tests using a continuous noise masker, the intelligibility contributions of such diverse cues as consonantal spectra and vocalic transitions cannot easily be assessed as S/N ratios are manipulated (Horii et al., 1971). Unfortunately, here also little is known. Specifically, there is no information about such cues as esophageal consonant burst features, formant transitions, and so forth. Until such information becomes available, definitive statements cannot be made about the specific factors underlying the increased vulnerability of consonant intelligibility of esophageal speech under adverse conditions. To this end, efforts are currently underway in our laboratory to obtain information about these important physical properties of esophageal speech.

ACKNOWLEDGMENT

This investigation was supported in part by research grants from the United Health Foundation, Elkhart, Indiana; Little Red Door, Inc., Marion County (Indiana) Cancer Society; and Purdue Research Foundation. We thank James C. Shanks, Indiana University Medical Center, and William Cooper and Raymond Daniloff, Purdue University, for their valuable contributions. Reprint requests should be sent to the authors, Department of Audiology and Speech Sciences, Purdue University, West Lafayette, Indiana 47907.

REFERENCES

DIEDRICH, W. M., The mechanism of esophageal speech. Sound Production in Man. New York: Annals of the New York Academy of Sciences, 155, 303-317 (1968).
DRUMMOND, S., The effects of environmental noise on pseudovoice after laryngectomy. J. Laryng., 79, 193-202 (1965).
HORII, Y., Specifying the speech-to-noise ratio: Development and evaluation of a noise with speech-envelope characteristics. Doctoral dissertation, Purdue Univ. (1969).
HORII, Y., HOUSE, A. S., and HUGHES, G. W., A masking noise with speech envelope characteristics for studying intelligibility. J. acoust. Soc. Amer., 49, 1849-1856 (1971).
HOUSE, A. S., WILLIAMS, C. E., HECKER, M. H. L., and KRYTER, K., Articulation testing methods: Consonantal differentiation with a closed-response set. J. acoust. Soc. Amer., 37, 158-166 (1965).
HYMAN, M., An experimental study of artificial larynx and esophageal speech. J. Speech Hear. Dis., 20, 291-299 (1955).
MILLER, G. A., and NICELY, P. E., An analysis of perceptual confusions among English consonants. J. acoust. Soc. Amer., 27, 338-352 (1955).

Received March 15, 1974. Accepted January 20, 1975.
