ISSN 0012-4966, Doklady Biological Sciences, 2014, Vol. 457, pp. 219–221. © Pleiades Publishing, Ltd., 2014. Original Russian Text © N.G. Andreeva, G.A. Kulikov, 2014, published in Doklady Akademii Nauk, 2014, Vol. 457, No. 1, pp. 114–116.

PHYSIOLOGY

Acoustic Parameters of Two-Formant Vowels in Different Speech Types

N. G. Andreeva† and G. A. Kulikov†

Presented by Academician A.D. Nozdrachev December 26, 2013

Received January 15, 2014

DOI: 10.1134/S0012496614040012

St. Petersburg State University, St. Petersburg, Russia
e-mail: [email protected]

† Deceased.

At present, the first formant of a vowel is established as the main characteristic of its phonetic type in speech [1, 2]. However, the “formant key” that plays an important role in the speech of adults cannot be applied to speech signals with high fundamental frequencies, such as vowels in the speech of children or sung vowels. According to the theory of formant ratios [3], by contrast, vowels are distinguished on the basis of the ratios of formant values rather than the formant values themselves. Formant ratios are supposed to reduce or eliminate the influence of age and gender differences on the acoustic characterization of vowels [4, 5]. Previously, we found [6, 7] that neither the absolute values of formants nor the ratios of their frequencies could be used as distinctive parameters to identify the vowels [a], [o], and [u] at the fundamental frequencies typical of young children. The frequency-dependent relationship of the amplitudes of spectral components is crucial in this case. Importantly, this feature is similar in the speech of children and adults [8].

The aim of this study was to identify the acoustic parameters of the two-formant Russian vowels [i], [y], and [e] and to find phonetic characteristics of a sound that are common to regular (voiced) and whispered speech, i.e., independent of the manner of phonation. Our results showed that pitch-dependent relationships of the frequencies of the first two spectral maxima are important for the perception of these vowels.

In this study, we used recorded speech sounds of 45 children aged 3–5 years and 240 women aged 18–20 years, as well as whispered sounds of 140 men and women aged 18–20 years. Recording, spectral analysis, and evaluation of the acoustic parameters were performed as described earlier [8, 9]. The frequencies and amplitudes of the first and second formants and of the spectral maxima, including the maximum at the fundamental frequency (for voiced sounds), were evaluated for each vowel. In some cases, to determine the spectral maxima sufficient to preserve the phonetic quality of a vowel, additional analysis was performed by suppressing the amplitudes of its individual spectral components. Only vowels that were assigned to specific phonetic categories with high significance were included in the study.

The analysis of the women’s sounds (n = 2142, including 891 [i], 511 [y], and 740 [e]) showed that the absolute values of formant frequencies did not always correspond to the phonetic type of a vowel: the conventional representation of vowels on the two-formant plane showed overlaps between [i] and [y] and between [y] and [e]. At the same time, the frequency ratios of the first two spectral maxima of the vowels [i], [y], and [e] changed differently depending on the fundamental frequency, so these ratios could be used to distinguish these sounds. The correlation between the fundamental frequency and the ratio of the frequencies of the first two spectral maxima was statistically significant for each type of vowel (p < 0.01).

The same results were obtained for the vowels spoken by children. The fundamental frequency range in children’s speech was 220–480 Hz. As in adults, a strong correlation between these parameters was found for each phonetic type of vowel (n = 427, including 192 [i], 78 [y], and 157 [e]).

In whispered vowels (n = 376, including 175 [i], 65 [y], and 136 [e]), the distribution of spectral maxima on the two-formant plane was shifted to higher frequencies compared with the corresponding sounds of children and adults, which agrees with other studies [5, 10, 11].
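The central quantity in this analysis, the ratio of the frequencies of the first two spectral maxima, can be illustrated with a short sketch. The actual peak-picking procedure is the one described in [8, 9] and is not reproduced here; the code below merely assumes that a spectrum is available as sampled frequency and amplitude lists and that a maximum is any local peak above a relative amplitude threshold. The function names, the rel_threshold parameter, and the toy spectrum are hypothetical.

```python
def local_maxima(freqs, amps, rel_threshold=0.1):
    """Return (frequency, amplitude) pairs of local maxima in a sampled spectrum.

    rel_threshold is a hypothetical cutoff: peaks weaker than this fraction
    of the strongest amplitude are ignored.
    """
    floor = rel_threshold * max(amps)
    peaks = []
    for k in range(1, len(amps) - 1):
        is_peak = amps[k] > amps[k - 1] and amps[k] >= amps[k + 1]
        if is_peak and amps[k] >= floor:
            peaks.append((freqs[k], amps[k]))
    return peaks


def max_ratio(freqs, amps, rel_threshold=0.1):
    """Ratio max2/max1 of the two lowest-frequency spectral maxima."""
    peaks = local_maxima(freqs, amps, rel_threshold)
    if len(peaks) < 2:
        raise ValueError("need at least two spectral maxima")
    (f1, _), (f2, _) = peaks[0], peaks[1]
    return f2 / f1


# Toy spectrum with maxima at 300 Hz and 2400 Hz (made-up values, not data
# from the study).
freqs = [100, 200, 300, 400, 500, 2300, 2400, 2500]
amps = [0.1, 0.5, 1.0, 0.4, 0.1, 0.3, 0.8, 0.2]
ratio = max_ratio(freqs, amps)  # 2400 / 300
```

For a whispered vowel there is no fundamental tone, so the same ratio can still be computed from the spectrum alone, which is what makes it usable for comparing voiced and whispered sounds.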
To compare the parameters of the studied voiced sounds with those of whispered vowels, which cannot be characterized by a fundamental tone [5], the ratio of the frequencies of the first two spectral maxima (Fmax2/Fmax1) was evaluated as a function of the max1 value. All the studied types of pronunciation of the studied phonemes (voiced speech of children and adults, and whisper) showed the same tendency: this parameter decreased as the frequency of the first maximum increased, and the decrease was more prominent for [i] and [y] than for [e] (Figs. 1a–1c).

Fig. 1. The relationship of frequency ratios of the first two spectral maxima in (a) adult speech, (b) children’s speech, and (c) whispered sounds. The frequency of the first maximum (max1, Hz) is on the X axis; the ratio of the second to the first maximum frequency (max2/max1) is on the Y axis. The symbols +, □, and ◇ indicate the regions representing the vowels [i], [y], and [e], respectively; lines within each region were calculated using the distance-weighted least squares method; the ellipses in (b, c) include 95% of sounds.
[Plots omitted. X axis: max1 (Hz), 0–1000; Y axis: max2/max1, 0–18.]

To match the parameters of the recorded children’s speech sounds, only the vowels of women’s speech with a fundamental frequency of 220–480 Hz (n = 1633, including 666 [i], 397 [y], and 570 [e]) were included in the analysis. In this fundamental frequency range, the frequency ratios of spectral maxima of vowels of the same phonetic type were similar in children and adults (Figs. 1a, 1b). The maximum and minimum values were 16–6 and 17–6 for [i], 9–4 and 9–5 for [y], and 4.2–2 and 4.3–2.4 for [e] in adults and children, respectively. Interestingly, this parameter was similar in whispered and voiced vowels: the frequency ratios were 12.7–5.7, 6.1–3, and 4.1–2 in the whispered vowels [i], [y], and [e], respectively (Fig. 1c).

Thus, the vowels [i], [y], and [e] had significantly different frequency ratios of their spectral maxima (formants). However, these results do not fully agree with the theory of formant ratios, since this parameter varied within a range typical of each sound. Frequency ratios can be used as acoustic characteristics of particular phonetic groups of vowels provided that the pitch of the sound is taken into account. Thus, whispered sounds, despite the different frequencies of their spectral maxima compared with voiced vowels, are similar to the corresponding vowels in the speech of adults with the same pitch (Fig. 2).
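A small sketch may help show why the ratio alone does not settle the phonetic type. It uses only the minimum and maximum ratios reported above for women’s vowels (fundamental frequency 220–480 Hz); the dictionary and function names are hypothetical, and a simple range check is an illustration, not the classification procedure used in the study.

```python
# Reported min-max ranges of max2/max1 for women's vowels (F0 220-480 Hz).
ADULT_RATIO_RANGES = {
    "i": (6.0, 16.0),
    "y": (4.0, 9.0),
    "e": (2.0, 4.2),
}


def candidate_vowels(ratio, ranges=ADULT_RATIO_RANGES):
    """Return every vowel whose reported ratio range contains the given ratio."""
    return [v for v, (lo, hi) in ranges.items() if lo <= ratio <= hi]


only_i = candidate_vowels(12.0)   # falls in the [i] range only
i_or_y = candidate_vowels(7.0)    # the [i] and [y] ranges overlap here
y_or_e = candidate_vowels(4.1)    # the [y] and [e] ranges overlap here
```

Because neighboring ranges overlap, a ratio such as 7.0 is compatible with both [i] and [y]; resolving such cases requires the pitch-dependent information (the position along the max1 axis) discussed in the text.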


Fig. 2. Frequency ratios of spectral maxima of the vowels [i], [y], and [e] in voiced and whispered speech. Symbols ◇, □, and △ represent mean values of the ratio of the second to the first maximum frequency in different ranges in children’s speech (dashed line), adult speech (solid line), and whispered sounds (dotted line), respectively. Large symbols represent [i]; medium-sized symbols represent [y]; small symbols represent [e]. Vertical bars show the standard deviations. Other designations are the same as in Fig. 1.
[Plot omitted. X axis: max1 (Hz), 100–1000; Y axis: max2/max1, 0–16.]
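The mean values and standard deviations plotted in Fig. 2 could be obtained along the following lines. This is a sketch under stated assumptions: the ranges are taken to be equal-width bins of max1, the 100 Hz bin width is a guess that the source does not specify, and the function name and sample points are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def binned_stats(points, bin_width=100.0):
    """Group (max1, ratio) pairs into max1 bins; return mean and SD per bin.

    bin_width is an assumed value; the source does not state the ranges used.
    """
    bins = defaultdict(list)
    for max1, ratio in points:
        bins[int(max1 // bin_width)].append(ratio)
    result = {}
    for idx in sorted(bins):
        values = bins[idx]
        # The sample SD needs at least two values; report 0.0 for a single one.
        sd = stdev(values) if len(values) > 1 else 0.0
        result[(idx * bin_width, (idx + 1) * bin_width)] = (mean(values), sd)
    return result


# Made-up (max1 in Hz, max2/max1) pairs for one vowel type.
points = [(250.0, 12.0), (280.0, 10.0), (320.0, 9.0), (390.0, 7.0)]
stats = binned_stats(points)
```

Running the same function separately on the children’s, adults’, and whispered data for each vowel would give one curve per speech type, which is the kind of comparison Fig. 2 presents.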

In conclusion, our results show that pitch-dependent ratios of maximum spectral frequencies can be common parameters characterizing the phonetic type of a sound for the vowels [i], [y], and [e] in different types of speech.

REFERENCES

1. Peterson, L.C. and Barney, H.L., J. Acoust. Soc. Am., 1952, vol. 24, no. 1, pp. 175–184.
2. Hillenbrand, J., Getty, J.A., Clark, M.J., and Wheeler, K., J. Acoust. Soc. Am., 1993, vol. 97, no. 5, pp. 3099–3111.
3. Lloyd, R.J., Phonet. Studien, 1890, vol. III, pp. 251–278.
4. Potter, R.K. and Steinberg, J., J. Acoust. Soc. Am., 1950, vol. 22, no. 2, pp. 807–820.
5. Peterson, L.C., J. Speech Hear. Res., 1961, vol. 4, no. 1, pp. 10–29.
6. Andreeva, N.G., Kulikov, G.A., and Samokishchuk, A.P., Akust. Zh., 2002, vol. 48, no. 5, pp. 711–713.
7. Andreeva, N.G., Kulikov, G.A., and Samokishchuk, A.P., Sens. Sist., 2002, vol. 16, no. 3, pp. 230–237.
8. Andreeva, N.G. and Kulikov, G.A., Dokl. Biol. Sci., 2012, vol. 445, no. 1, pp. 207–209.
9. Andreeva, N.G. and Kulikov, G.A., Dokl. Biol. Sci., 2009, vol. 429, no. 1, pp. 487–489.
10. Tartter, V.C., Percept. Psychophys., 1991, vol. 49, no. 4, pp. 365–372.
11. Uplisova, K.O. and Sokolova, T.S., Sens. Sist., 2012, vol. 27, no. 3, pp. 230–237.

Translated by E. Suleimanova
