Phonetica 32: 254-263 (1975)

Variation in Lingual Coarticulation at Certain Juncture Boundaries 1

Lo-Soun Su, Raymond Daniloff and Robert Hammarberg 4

IBM System Products Division, Hopewell Junction, N.Y. and Department of Audiology and Speech Sciences, Purdue University, West Lafayette, Ind.

Abstract. The acoustic spectra of [m] produced by speakers of American English in various [mV] contexts were studied for coarticulatory variations caused by the [vowel] under conditions of varying juncture boundaries occurring between [m] and [vowel]. Dispersion analysis was used to estimate the coarticulatory variation in [m] spectra. Low level junctures marked by short pauses do not disrupt nasal-vowel coarticulation. However, higher level junctures, such as those between extraposed clauses or phrases and the body of an utterance, are more often than not marked by long pauses and a concomitant reduction of nasal-vowel coarticulation.

Introduction

1 Research supported in part by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant No. AFOSR 69-1776. The United States Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation hereon.

4 The authors wish to express their appreciation to Professor K. S. Fu and Professor K. P. Li for their valuable suggestions and helpful discussions.

Downloaded by: King's College London 137.73.144.138 - 3/7/2018 7:50:14 AM

Research indicates that coarticulation occurs across certain juncture boundaries. If junctures are marked by relatively long pauses, coarticulation across the juncture may well be reduced or absent. Juncture boundaries are syntactic elements which productively can be marked by long pauses. Whether a juncture is marked by a pause is a performance phenomenon, dependent on the speaker's articulation behavior. This study explored the relationship among coarticulation, some juncture types, and pause intervals. Our technique made use of an acoustic analysis of nasal spectra developed by Su et al. [1974].

It is a well-known phenomenon [Fujimura, 1962; Fujimura and Lindquist, 1971; Heffner, 1969; Jones, 1960] that during the production of the labial-nasal consonant [m] in English, the tongue is usually in the position for a following vowel even while [m] is being articulated. Thus, the emitted nasal sound demonstrates coarticulatory variations in its spectrum. In this study, sentences containing [m] preceding either a front or a back vowel were devised so that junctures (constituent boundaries) of several types fell between [m] and the following vowel. The nasal consonant was then analyzed for spectral shifts indicative of change in coarticulatory pattern.

Materials and Methods

Procedures

Subjects. Subjects were 10 males aged 24-38. Each had normal hearing in at least one ear, was free of oro-facial defects, and spoke a misarticulation-free dialect of Eastern or Midwest American English. Each was seated in a sound-treated chamber and was instructed to read first a list of single words and then a list of sentences from printed cards. Subjects were instructed to read at a natural and comfortable loudness, rate, and pitch level.

Equipment and speech materials. Recordings were made with a Bruel and Kjaer 0.5-in condenser microphone and complement on full-track tape using an Ampex AG-350 recorder at a measured S/N (rms) of better than 40 dB. The disyllabic tokens, [hə'mid] and [hə'mod], were spoken four times by each speaker. Coarticulation of vowel and [m] was expected to be at a maximum within such a word since no juncture boundary was present between [m] and the following vowel. Previous work by Su et al. [1974] demonstrated large differences in [m] spectra attributable to coarticulation with the close, spread, front vowel [i] and the mid, rounded, back vowel [o], respectively.

The sentences listed in table I were spoken three times by each speaker. They contain the [m] + [i] and [m] + [o] sequences embedded in sentences so that various junctures occur at the transword boundary between [m] and [vowel]. In order of ascending level of juncture, sentence 1 presents a juncture between modifier and head. The juncture of sentence 2, as spoken, was similar to that of sentence 1. Sentence 3 represents a noun phrase-verb phrase juncture. Sentences 4-5 present a juncture between extraposed prepositional phrases, and sentences 6-7 present junctures between extraposed clauses. The junctures in sentences 4-5 and 6-7 are major constituent boundaries, and it was expected that if speakers were to pause, these junctures would show the longest pause times and the greatest alteration of coarticulation.

Table I. Sentences spoken by subjects

1  Graham Ealy and Clem Ackley murdered William Oakley
2  Some alligators, some eels, and some ocean perch were caught today
3  Some eat to live, some act in spite, and some obey blindly
4  To the Moslem, evil is smoking and drink
5  Before autumn, open the fireplace grate
6  If you want some, open the box and get it
7  If you have a spasm, ease up on the medication

Nasal-vowel sequences in italics.

Acoustic Analysis Rationale. The transfer function of an [m] segment usually consists of poles and zeroes. The poles are formed mainly by the cascaded pharyngeal and nasal cavities. The zeroes are formed by the closed oral cavity, which functions as a side-branch resonator [Flanagan, 1965]. Studying the pole-zero patterns of nasal consonants, Fujimura [1962] found the frequencies of the four formants for [n] in the main frequency range (below 3 kHz) to be typically 350, 1,050, 1,960, and 2,950 Hz. The spectrum of [m] is obtained by replacing the second formant of [n] by a cluster of two formants and an antiformant which is located between 750 and 1,250 Hz. The coarticulation under study is attributable to tongue-position-related modification of the oral cavity, which affects the formant-antiformant cluster of the [m] spectrum most. Therefore, the frequency range from approximately 250 to 3,681 Hz should be enough to characterize the [m] spectra.

Acoustic analysis. The speech signals were fed into a filter bank (range 250 Hz-10 kHz) and sampled every 10 msec. The sampled data (time samples) were then digitized into a 35-dimensional vector using a 10-bit A/D converter [Su, 1973]. The speech signals passed through a spectral shaping circuit before filtering. The shaping had a flat frequency response above 2,200 Hz and a +6 dB/oct asymptotic slope below that frequency. The outputs of filters 1-25 (range 250-3,681 Hz) were changed to decibel values and were used since they cover the main frequency range for [m] spectra. However, the entire spectrum of the utterances (250 Hz-10 kHz) was printed out by computer, and the spectral samples of [m] were manually selected for analysis. Through experimentation, we found that a minimum of six time-samples of [m] spectra from each repetition of [m] provided a stable mean spectrum (250-3,681 Hz). As a result of this criterion, four of our ten speakers were discarded from data analysis because several of their sentences displayed [m] segments of such short duration that the necessary six time-samples of [m] spectra were unavailable. For the purposes of this study, [i] was considered to be an adequate representative of the class of front vowels, and [o] of back vowels; samples of [m] extracted from [mi] contexts were designated as [mf], and samples of [m] from [mo] contexts were grouped as [mb].

Our examination of how coarticulation of [m] with vowels affects the spectra of [m] involved the use of dispersion analysis on the spectral data [Wilks, 1962; Li et al., 1969]. Dispersion analysis is a feature selection technique based upon the clustering property of samples. Eigenvectors are calculated to represent a new transformational space such that the dispersion of samples in the new space is ordered corresponding to the order of the eigenvectors. The dispersion is greatest along the first eigenvector (EV1), second greatest along EV2, etc. Projections of samples on the first few eigenvectors normally reveal clustering properties of samples. In order to visualize the clustering properties, Su et al. [1974] projected spectral samples of [m] on the first two eigenvectors calculated using nasal spectral samples extracted from isolated /hə'mVd/ utterances. They found that spectral samples of [mf] and of [mb] formed two distinct clusters on the EV1-EV2 plane and clearly revealed the strong coarticulation characteristics between [m] and the following vowels.

The digital spectrograms were examined to determine sentence and word durations. The presence of an acoustic pause, if any, was measured from the digital spectrogram, which contained an associated power level read-out. A pause was defined as an interval between [m] and [vowel] in which there were spectral shifts and a decline in power level of approximately 5-10 dB from the average level of [m] until it rose to within 5-10 dB of the average power level of the following vowel.
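The pause criterion described above can be illustrated with a minimal sketch. It assumes the power read-out is available as one dB value per 10-msec frame, starting inside the [m] segment and ending inside the vowel, and it uses a single margin of 7.5 dB as a stand-in for the "approximately 5-10 dB" in the definition. The function names, the margin value, and the toy power track are assumptions made for illustration; the spectral-shift part of the definition is not modeled.

```python
from typing import List, Optional, Tuple

FRAME_MS = 10  # one time sample every 10 msec, as in the acoustic analysis


def find_pause(levels_db: List[float], m_avg_db: float, vowel_avg_db: float,
               margin_db: float = 7.5) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of a pause, or None if none is found.

    The pause starts at the first frame whose level drops more than
    margin_db below the [m] average, and ends at the first later frame
    whose level is back within margin_db of the vowel average.
    """
    start = None
    for i, level in enumerate(levels_db):
        if start is None:
            if level < m_avg_db - margin_db:
                start = i
        elif level >= vowel_avg_db - margin_db:
            return start, i
    return None  # no drop, or the level never recovered within the track


def pause_duration_ms(levels_db: List[float], m_avg_db: float,
                      vowel_avg_db: float, margin_db: float = 7.5) -> int:
    """Pause length in msec; 0 when no pause satisfies the criterion."""
    span = find_pause(levels_db, m_avg_db, vowel_avg_db, margin_db)
    if span is None:
        return 0
    return (span[1] - span[0]) * FRAME_MS


# Toy power track (dB per frame), invented for illustration only.
track = [60, 60, 59, 50, 45, 44, 46, 58, 66, 66]
print(pause_duration_ms(track, m_avg_db=60.0, vowel_avg_db=66.0))
```

With the toy track, the level falls below the [m] threshold at frame 3 and recovers at frame 8, so the sketch reports a 50-msec pause.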

Coarticulation Measure

In order to perform the quantitative analysis, we define:

$$\bar{Y}_{mf} = \text{25-dimensional mean spectrum of } [m_f] \text{ time samples} = \left(\bar{y}_{mf}(1), \bar{y}_{mf}(2), \ldots, \bar{y}_{mf}(25)\right) \tag{1}$$

$$\bar{Y}_{mb} = \text{25-dimensional mean spectrum of } [m_b] \text{ time samples} = \left(\bar{y}_{mb}(1), \bar{y}_{mb}(2), \ldots, \bar{y}_{mb}(25)\right) \tag{2}$$

where

$$\bar{y}_{mf}(k) = \text{the } k\text{th component of the vector } \bar{Y}_{mf} \tag{3}$$

$$\bar{y}_{mb}(k) = \text{the } k\text{th component of the vector } \bar{Y}_{mb} \tag{4}$$

Then the Euclidean distance between $\bar{Y}_{mf}$ and $\bar{Y}_{mb}$ can be expressed by

$$\text{distance}(\bar{Y}_{mf}, \bar{Y}_{mb}) = \sqrt{\sum_{k=1}^{25} \left[\bar{y}_{mf}(k) - \bar{y}_{mb}(k)\right]^{2}} \tag{5}$$

By calculating distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ under each condition for each speaker, we obtained an estimate of differences in the [m] spectra attributable mainly to coarticulation with a front vowel and with a back vowel.
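As a minimal sketch of the measure just defined, the following averages the time samples of each [m] class into a mean spectrum and takes the Euclidean distance between the two means. The function names and the three-channel toy spectra are assumptions for illustration; in the study each spectrum has 25 channels (in dB) and at least six time samples per repetition.

```python
import math
from typing import List, Sequence


def mean_spectrum(samples: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a set of spectral time samples (dB values)."""
    n = len(samples)
    dims = len(samples[0])
    return [sum(sample[k] for sample in samples) / n for k in range(dims)]


def distance(y_mf: Sequence[float], y_mb: Sequence[float]) -> float:
    """Euclidean distance between two mean spectra of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_mf, y_mb)))


# Toy 3-channel spectra, invented for illustration only.
mf_samples = [[10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]  # [m] before a front vowel
mb_samples = [[8.0, 17.0, 31.0], [8.0, 17.0, 31.0]]    # [m] before a back vowel

y_mf = mean_spectrum(mf_samples)  # [11.0, 21.0, 31.0]
y_mb = mean_spectrum(mb_samples)  # [8.0, 17.0, 31.0]
print(distance(y_mf, y_mb))
```

In the toy data the channel differences are 3, 4 and 0 dB, so the distance is 5.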

Results

Previous work [Su et al., 1974] demonstrated that the spectra of [m] followed by front vowels and by back vowels, [mf] and [mb], formed two distinct clusters. For a given speaker, the Euclidean distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ between the two cluster centers represents both differences in vocal tract shape unique to the speaker and differences in coarticulatory behavior. Since nasal tract characteristics are similar in (m+Vf) and (m+Vb) utterances, the distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ represents, primarily, the coarticulatory difference in [m] spectra attributable to front versus back vowels. The mean distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ values for all six subjects are as follows: 35.62, 32.30, 50.61, 43.98, 35.46, 44.80 (mean 40.46; SD 6.46). We must also note at this point that the distance measure will reflect differences in [m] spectra attributable to vocal frequency and intensity. However, careful examination of our data reveals that these parameters produce changes in the distance measure which are much smaller than those associated with changes in vocal tract shape caused by coarticulatory movements of the tongue.

Table II. Change in coarticulation in Euclidean distance between nasal spectrum vectors associated with juncture boundaries

Sentences   Subj. 1   Subj. 2   Subj. 3   Subj. 4   Subj. 5   Subj. 6   Mean     SD
1             7.70      6.25     -2.94    -19.0       1.75    -11.94    -3.03    9.64
2             1.71      1.51     -3.35    -24.4      -1.68     -9.54    -5.96    9.06
3            -5.97      2.07     -2.05      0.68      4.57    -10.19    -1.65    4.99
4-5         -13.20      5.76    -24.48    -25.18     -9.86    -26.35   -15.55   11.41
6-7         -24.74     -8.48    -38.64    -18.94      2.16    -17.73   -17.73   12.70

Table III. Pause intervals (msec) between [m] and [vowel] in the sentences spoken

Sentences   Context   Subj. 1   Subj. 2   Subj. 3   Subj. 4   Subj. 5   Subj. 6   Mean     SD
1           [m + i]    13        30         0        16         0         0
            [m + o]    36        40        53         0         0        26
            mean       24.5      35        26.5       8         0        13       17.83     5.95
2           [m + i]    23        36        16         0         0         0
            [m + o]    20        20        56        56         0         0
            mean       23        28        36        28         0         0       19.2      7.03
3           [m + i]    36        10        13         0         0        10
            [m + o]     0         0         0         0         0         0
            mean       18         5         6.5       0         0         5        5.75     3.01
4-5         [m + i]   330       546       380       183        30        23
            [m + o]   295       693       540        93       333        63
            mean      312.5     619.5     460       138       181.5      43      292.4     98.7
6-7         [m + i]   230       643       500       226        73        80
            [m + o]   265       620       560       113       116        96
            mean      247.5     631.5     530       169.5      94.5      88      293.5    105.96

In order to measure gross changes in coarticulation caused by the junctures present in the sentences, distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ intraword for four repetitions of [hə'mid] and [hə'mod] was calculated. Distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ intraword represented coarticulation in the intraword context, where juncture effects were assumed to be at a minimum. Distance $(\bar{Y}_{mf}, \bar{Y}_{mb})$ transword, calculated over three repetitions of each sentence, represented coarticulation occurring within the sentences where junctures occurred at the transword boundary between [m] + [vowel]. The change in coarticulation caused by juncture within the sentences was estimated as:

$$\text{change in coarticulation} = \text{distance}(\bar{Y}_{mf}, \bar{Y}_{mb})_{\text{transword}} - \text{distance}(\bar{Y}_{mf}, \bar{Y}_{mb})_{\text{intraword}}$$
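The change score is a simple difference, and a short sketch may make its sign convention concrete. The function names and the two example distances are assumptions for illustration and are not values from the study.

```python
def change_in_coarticulation(d_transword: float, d_intraword: float) -> float:
    """Transword distance minus intraword distance for one speaker."""
    return d_transword - d_intraword


def interpret(change: float) -> str:
    """Read the sign of a change score: negative means reduced coarticulation."""
    if change < 0:
        return "less coarticulation in the sentence than within the word"
    if change > 0:
        return "more coarticulation in the sentence than within the word"
    return "no change"


# Hypothetical distances for one speaker and one sentence type.
d_intra = 40.0  # [m] spectra from the isolated disyllables
d_trans = 25.0  # [m] spectra from the sentence context
score = change_in_coarticulation(d_trans, d_intra)
print(score, "->", interpret(score))
```

A smaller front-back distance in the sentence than in the isolated word gives a negative score, which is read as a reduction of coarticulation across the juncture.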


These data are presented in table II for the six subjects. Table III lists the pause-interval durations which occurred between the [m] and the following vowel. Figures 1 and 2 present the means and standard deviations of pause-interval durations and change scores for the six subjects. A positive change score indicates that the Euclidean distance between $\bar{Y}_{mf}$ and $\bar{Y}_{mb}$ is greater for the [m] + [vowel] sequence in sentences than for the sequence in an intraword [hə'mVd] context. A negative change score indicates greater distance between $\bar{Y}_{mf}$ and $\bar{Y}_{mb}$ in an intraword context than in a sentential, transword context. A positive change score indicates more coarticulation in the sentence or less coarticulation in the intraword [m] + [V] sequence, and a negative change score, vice versa.

It was our expectation that the rather low level junctures (constituent boundaries) in sentences 1, 2, and 3 would be marked with little pause, and hence coarticulation in these sentences should approximate that within a word. With the exception of subject 4, the change scores for these sentences, whether positive or negative, were generally smaller than the change scores for sentences 4-5 and 6-7. All but two of the 14 change scores for sentences 4-5 and 6-7 were negative, indicating that the junctures in these sentences were effective in reducing coarticulation. Observe that for all subjects, either sentences 4-5 or 6-7 showed the greatest reduction in coarticulation (largest negative change score) of any of the five sentence types. The mean data reveal a negative change score (reduction in coarticulation) for all sentence types, with a sharp increase in magnitude registered for sentences 4-5 and 6-7.

The pause duration data reveal that the pause intervals for sentences 4-5 and 6-7 were greater than the pauses in sentences 1, 2, and 3. The mean data reveal that pause durations for sentences 4-5 and 6-7 were roughly twenty times as great as for sentences 1, 2, and 3. It would appear that pause duration and nasal-vowel coarticulation within the sentences are inversely related.

The data for subject 4, especially, indicate that short pauses do not preclude a reduction in coarticulation. His data revealed an amount of interword coarticulation which was comparable to that of the other subjects, so that his rather large negative change scores for sentences 1 and 2 can probably be interpreted as a reduction in coarticulation at junctures marked by short pauses. The fact that 16 of 21 change scores for sentences 1, 2, and 3 were either positive or rather small indicates that short pauses at these junctural boundaries need not inhibit coarticulation to an appreciable degree.
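The two group-level statements above (pauses roughly twenty times longer at the higher junctures, and an inverse relation between pause duration and coarticulation) can be checked against the group means printed in tables II and III. The sketch below is an illustration only: the Pearson coefficient over five group means is not an analysis reported in the study, and the function names are assumptions.

```python
import math
from typing import Sequence

# Group means per sentence type (1, 2, 3, 4-5, 6-7) as printed in tables II and III.
MEAN_CHANGE = [-3.03, -5.96, -1.65, -15.55, -17.73]  # change scores
MEAN_PAUSE_MS = [17.83, 19.2, 5.75, 292.4, 293.5]    # pause intervals (msec)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pause_ratio(pauses: Sequence[float], low=(0, 1, 2), high=(3, 4)) -> float:
    """Mean pause at the higher junctures divided by mean pause at the lower ones."""
    low_mean = sum(pauses[i] for i in low) / len(low)
    high_mean = sum(pauses[i] for i in high) / len(high)
    return high_mean / low_mean


print(f"high/low pause ratio: {pause_ratio(MEAN_PAUSE_MS):.1f}")
print(f"Pearson r, pause vs. change score: {pearson_r(MEAN_PAUSE_MS, MEAN_CHANGE):.2f}")
```

On these means the ratio comes out slightly above 20 and the correlation is strongly negative, in line with the description in the text. With only five points, the coefficient should be read as a summary of the tables rather than as a statistical test.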

Fig. 1. Mean and standard deviations of pause interval between [m] and [vowel] within sentences conditioned by presence or absence of a juncture between [m] and [vowel]. Data based on results for six speakers. Abscissa: sentence number.

Fig. 2. Change scores and standard deviations in Euclidean distance between [m] spectra coarticulated with front and back vowels conditioned by presence or absence of a juncture between [m] and [vowel]. Data based on results for six speakers. Abscissa: sentence number.

It should be remarked at this point that the amount of coarticulation of a given type present in any speech sample is highly speaker dependent. Examples of strong speaker dependence are seen in the data of Kent et al. [1974] and Daniloff and Moll [1968] for velar and labial coarticulation, respectively. Such differences account not only for the variation in intraword coarticulation seen in the distance measures, but can be presumed to be operative in the reductions in coarticulation for sentences 1 and 2 visible in the change scores, table II, for subjects 4 and 6. Nevertheless, there is enough consistency in the data to suggest that [mV] lingual coarticulation is a stable phenomenon, useful enough for identifying speakers individually [Su et al., 1974] and for marking junctural boundaries.

Discussion

Su et al. [1974] concluded that 'the acoustic clue caused by the coarticulation between [m] and the following vowel is strongly speaker dependent' and that a measure of the coarticulatory relationship between [m] and a following vowel could thus be an efficient acoustic clue in speaker identification. The results of the present study indicate that such measurements ought to be taken at points where only a lower level juncture boundary, or no boundary at all, intervenes between the [m] and the vowel.

We are here operating under the assumption that the function of coarticulation is to transform a sequence of discrete entities into a continuum by smoothing out the transitions between the phonetic segments generated by the phonological component of the grammar [Daniloff and Hammarberg, 1973]. If this assumption is correct, it should be the case that coarticulation could be dispensed with across pauses, i.e. at breaks in the speech continuum, since no smoothing out should be necessary under such circumstances.

Ideally, pauses in speech should be correlated with major grammatical constituent boundaries, with the size of the pause further being correlated with the rank of the boundary. This derives from the commonsensical assumption that between segments within a word there would be no pauses, whereas a pause is to be expected in the juncture between two sentences. If we add to this our assumptions about the correlation between coarticulation and pauses, we are led to expect massive coarticulation where the boundary between the [m] and the following vowel is of low rank or absent, and, conversely, less coarticulation at higher-ranking boundaries.

The results of our study, as well as the results of previous studies, support this hypothesis. It has been found that ordinary word boundaries such as the ones found in sentences 1, 2, and 3 do not hinder coarticulation of lip rounding [Daniloff and Moll, 1968], nor coarticulation of velar movements [McClean, 1973; Moll and Daniloff, 1970]. At higher ranking boundaries, however, the coarticulation of velar movements has been found to be greatly reduced [Moll and Daniloff, 1970]. Our results extend these findings to the case of lingual coarticulation between an [m] and a succeeding vowel.

Our results show that at the higher level boundaries, as in sentences 4-5 and 6-7, the pauses were the longest and the degree of coarticulation the least. In sentences 1, 2, and 3 we observed both more coarticulation and shorter pauses. Voice print spectrograms and careful listening indicated that the junctures in sentences 1, 2, and 3 rarely were marked by glottal stops, but more often by vocal fry phonation, irregular pitch periods, and small silent intervals.

The relationship between coarticulation and grammatical boundaries would appear to be an indirect one. The grammar dictates the placement of boundaries, and hence where pauses in speech may occur. Whether these pauses are then performed (i.e. realized) or not depends on other factors, none of which we have any knowledge of at the present time. Similarly, the performance of a pause indicates a situation where coarticulation may (but need not) be dispensed with. But whether coarticulation occurs or not is again determined by factors unknown.

The realization of grammatical junctures may take various forms. There may be no distinct manifestation of the juncture at all, such as in the case of the lower level boundaries. In connection with the higher level boundaries one may find a pause (silent interval), an alteration of the mode of phonation (fry phonation, irregular period of phonation, double periods, etc.), a change of articulatory targets (allophonic variation), and, finally, an interruption of coarticulation. These phenomena are not mutually exclusive, but may occur in certain combinations at certain levels of juncture [Lehiste, 1960; Garding, 1971; Christie, 1974]. It may be that the mode of manifesting boundaries is highly idiosyncratic, i.e. idiolectal, and hence beyond generalizations about details. The reduction of coarticulation could thus be one of the options available to speakers in manifesting higher level grammatical boundaries in speech.

Zusammenfassung

Variabilität lingualer Koartikulation an bestimmten Junkturgrenzen

Die von Sprechern des amerikanischen Englisch erhaltenen akustischen Spektren von [m] in verschiedenen [mV]-Kontexten wurden hinsichtlich koartikulatorischer Variabilität untersucht, wie sie durch das [V] bei unterschiedlichen Junkturgrenzen zwischen [m] und [V] auftritt. Die Dispersionsanalyse wurde zur Bewertung der koartikulatorischen Variabilität in [m]-Spektren herangezogen. Junkturen niedriger Stufe, markiert durch kurze Pausen, unterbrechen die Nasal-Vokal-Koartikulation nicht. Junkturen hoher Ordnung, wie zwischen Parenthesen und dem Hauptteil einer Äußerung, sind häufiger durch lange Pausen markiert und durch damit einhergehende Verringerung der Nasal-Vokal-Koartikulation.

Résumé

Variations de la coarticulation linguale à certaines limites de jointure

On a étudié les spectres acoustiques de /m/ en anglais américain dans différents contextes du type /mV/ où variaient les limites de jointure entre /m/ et /V/. L'analyse de la dispersion fut utilisée pour estimer la variation de la coarticulation dans les spectres. Les jointures peu accusées, caractérisées par une pause courte, ne brisent pas la coarticulation nasale-voyelle. Par contre, les jointures plus marquées, telles qu'elles apparaissent entre des parenthèses et le corps d'une phrase, sont souvent caractérisées par une longue pause et une réduction concomitante de la coarticulation nasale-voyelle.

References

Christie, W. M.: Some cues for syllable juncture perception in English. J. acoust. Soc. Am. 55: 819-821 (1974).
Daniloff, R. and Hammarberg, R.: On defining coarticulation. J. Phonet. 1: 239-248 (1973).
Daniloff, R. and Moll, K.: Coarticulation of lip rounding. J. Speech Hear. Res. 11: 707-721 (1968).
Flanagan, J. L.: Speech analysis, synthesis and perception, pp. 19-20 (Academic Press, New York 1965).
Fujimura, O.: Analysis of nasal consonants. J. acoust. Soc. Am. 34: 1865-1875 (1962).
Fujimura, O. and Lindquist, J.: Sweep-tone measurements of vocal tract characteristics. J. acoust. Soc. Am. 40: 541-558 (1971).
Garding, E.: Laryngeal boundary signals; working paper No. 4, Lund (1971).
Heffner, R. M. S.: General phonetics; 5th ed., p. 142 (The University of Wisconsin Press, Madison 1969).
Jones, D.: An outline of English phonetics; 9th ed., pp. 168-172 (Heffer & Sons, Cambridge 1960).
Kent, R. D.; Carney, P. J., and Severeid, L. R.: Velar movement and timing: evaluation of a model for binary control. J. Speech Hear. Res. 17: 470-488 (1974).
Lehiste, I.: An acoustic-phonetic study of internal open juncture. Phonetica, suppl. 5 (1960).
Li, K. P.; Hughes, G. W., and House, A. S.: Correlation characteristics and dimensionality of speech spectra. J. acoust. Soc. Am. 46: 1019-1025 (1969).
McClean, M.: Forward coarticulation of velar movements at marked junctural boundaries. J. Speech Hear. Res. 16: 288-296 (1973).
Moll, K. and Daniloff, R.: Investigation of the timing of velar movements during speech. J. acoust. Soc. Am. 50: 678-684 (1970).
Su, L. S.: Automatic speaker identification using nasal spectra and nasal coarticulation as acoustic clues; Ph.D. diss. Purdue (1973).
Su, L. S.; Li, K. P., and Fu, K. S.: Identification of speakers by use of nasal coarticulation. J. acoust. Soc. Am. 56: 1876-1882 (1974).
Wilks, S. S.: Mathematical statistics (Wiley & Sons, New York 1962).

Dr. Lo-Soun Su, IBM Corporation, Systems Products Division, Building 300, Hopewell Junction, NY 12533 (USA)
