Combined Use of Standard and Throat Microphones for Measurement of Acoustic Voice Parameters and Voice Categorization
*Virgilijus Uloza, *Evaldas Padervinskis, *Ingrida Uloziene, †Viktoras Saferis, and ‡,§Antanas Verikas, *†‡Kaunas, Lithuania, and §Halmstad, Sweden

Summary: Objective. The aim of the present study was to evaluate the reliability of the measurements of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones and to investigate the utility of the combined use of these microphones for voice categorization. Materials and Methods. Voice samples of the sustained vowel /a/ obtained from 157 subjects (105 healthy and 52 pathological voices) were recorded in a soundproof booth simultaneously through two microphones: an oral AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) and a contact (throat) Triumph PC microphone (Clearer Communications, Inc, Burnaby, Canada) placed on the lamina of the thyroid cartilage. Acoustic voice signal data were measured for fundamental frequency, percent of jitter and shimmer, normalized noise energy, signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software (Tiger Electronics, Seattle, WA). Results. The correlations of acoustic voice parameters in vocal performance were statistically significant and strong (r = 0.71–1.0) for all measurements obtained with the two microphones. When classifying into healthy and pathological voice classes, oral-shimmer yielded a correct classification rate (CCR) of 75.2% and throat-jitter yielded a CCR of 70.7%. However, the combination of both throat and oral microphones allowed identifying a set of three voice parameters (throat-signal-to-noise ratio, oral-shimmer, and oral-normalized noise energy) that provided a CCR of 80.3%. Conclusions. The measurements of acoustic voice parameters using a combination of oral and throat microphones proved to be reliable in clinical settings and demonstrated high CCRs when distinguishing the healthy and pathological voice patient groups. Our study validates the suitability of the throat microphone signal for the task of automatic voice analysis for the purpose of voice screening.
Key Words: Oral contact microphones–Acoustic voice parameters–Voice categorization.

Accepted for publication October 14, 2014. Conflict of interest: None to report. From the *Department of Otolaryngology, Lithuanian University of Health Sciences, Kaunas, Lithuania; †Department of Physics, Mathematics & Biophysics, Lithuanian University of Health Sciences, Kaunas, Lithuania; ‡Department of Electric Power Systems, Kaunas University of Technology, Kaunas, Lithuania; and the §Department of Intelligent Systems, Halmstad University, Halmstad, Sweden. Address correspondence and reprint requests to Evaldas Padervinskis, Department of Otorhinolaryngology, Lithuanian University of Health Sciences, Eiveniu 2, LT-50009 Kaunas, Lithuania. E-mail: [email protected] Journal of Voice, Vol. -, No. -, pp. 1-8 0892-1997/$36.00 © 2014 The Voice Foundation http://dx.doi.org/10.1016/j.jvoice.2014.10.008

INTRODUCTION Acoustic voice analysis has been found to be essential and has been increasingly used both for research and for objective assessment of voice disorders in clinical settings since the 1960s. Consequently, acoustic measures of the severity of dysphonia are already commonly used in voice clinics because of their utility for the algorithms of automated voice analysis and screening, collection of objective noninvasive voice data, and feasibility to document and quantify dysphonia changes and outcomes of therapeutic and surgical treatment of voice problems.1–5 According to Titze,6 acoustic voice signals can be classified into three types. Type 1 signals are nearly periodic, type 2 signals contain intermittency, strong subharmonics, or modulations, and type 3 signals are chaotic or random. Therefore, different methods of voice analysis should be applied depending on the voice signal type. For type 1 signals, perturbation analysis has considerable utility and reliability; for type 2 signals, visual displays (eg, spectrograms, phase portraits, or next-cycle parameter contours) are most useful for understanding the physical characteristics of the oscillating system; and for type 3 signals, perceptual ratings of roughness (and any other auditory manifestation of aperiodicity) are likely to be the best measures for clinical assessment.6 However, factors influencing the accuracy and comparability of the measurements of acoustic voice parameters may arise from variations in the data acquisition environment,7 microphone types or placements,7–9 recording systems, and methods of voice signal analysis.5,10–13 Microphones are the basic tools for registration of voice signals, converting the sound pressure signal to an electric signal with the same characteristics.9 Consequently, the type and technical characteristics of the microphone may determine the final results of acoustic voice analysis. Although voice and speech recordings and measurements are carried out routinely for clinical and research purposes, the subject of microphone selection remains somewhat controversial.7–9,14 During voice and speech production, vibrations from the vocal folds are transmitted through the vocal tract and through the body tissue to the skin surface. These skin surface vibrations, as opposed to those transmitted through the air, can be sensed by contact microphones and/or accelerometers (ie, vibration sensors that use the piezoelectric effect to convert mechanical energy into electrical energy in response to applied stress), and the output signal mirroring the sound signal generated by the vocal fold


vibrations can be used to transmit the voice signal into analysis systems,15,16 even revealing a representation of the rapid subglottal pressure vibrations.17 As opposed to conventional acoustic microphones routinely used for voice recordings, contact microphones are less sensitive to background noise from the surrounding environment. Moreover, contact microphones and/or accelerometers have the potential to eliminate acoustic effects of the vocal tract, thus providing enhanced voice signal clarity in elevated ambient noise environments.1,16 Several studies investigated the applicability of contact microphones for research and practical use and demonstrated the relative insensitivity of contact microphones to background noise; however, they also revealed decreased speech signal intelligibility compared with headset microphones.18–20 In other studies, accelerometers have been used and found to be useful for voice and speech measurements, that is, for detecting glottal vibrations, extraction of voice fundamental frequency (F0) and frequency perturbation measurements,1 evaluation of acoustic voice characteristics before and after intubation,21 voice accumulation/dosimetry,22,23 estimation of sound pressure levels of voiced speech,23 mapping of neck surface vibrations during vocalized speech,24 and measurement of facial bone vibration in resonant voice production.25,26 There is a lack of comparative data in the literature on the applicability of contact microphones for acoustic voice measurements for voice screening purposes and/or on the combined use of standard and contact (throat) microphones. Therefore, the aim of the present study was to validate the suitability of the throat microphone signal for voice screening, evaluate the reliability of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones, and investigate the utility of the combined use of these microphones for voice categorization.
MATERIALS AND METHODS A study group consisted of 157 individuals examined at the Department of Otolaryngology of the Lithuanian University of Health Sciences, Kaunas, Lithuania. The normal voice subgroup was composed of 105 selected healthy volunteer individuals who considered their voice as

normal. They had no complaints concerning their voice and no history of chronic laryngeal diseases or other long-lasting voice disorders. All of them were free from any known hearing problems and free from common cold or upper respiratory infections at the time of voice recording. The voices of this group of individuals were also evaluated as healthy voices by clinical voice specialists. Furthermore, no pathological alterations in the larynx of the subjects of the normal voice subgroup were found during video laryngostroboscopy (VLS). Digital high-quality VLS recordings were performed with an XION EndoSTROB DX device (XION GmbH, Berlin, Germany) using a 70° rigid endoscope. Acoustic voice signal parameters of these normal voice subgroup subjects, obtained using Dr. Speech software (Tiger Electronics, Seattle, WA; subprogram: voice assessment, version 3.0), were within the normal range.3 The pathological voice subgroup consisted of 52 patients who represented a rather common and clinically discriminative group of laryngeal diseases, that is, mass lesions of the vocal folds and paralysis. Mass lesions of the vocal folds included in the study consisted of nodules, polyps, cysts, papillomata, keratosis, and carcinoma. Pathological voice group patients were recruited from consecutive patients who were diagnosed with the laryngeal diseases mentioned previously. The clinical diagnosis was based on typical clinical signs revealed during VLS and direct microlaryngoscopy. In all cases of mass lesions of the vocal folds, the final diagnosis was confirmed by histological examination of the removed tissue. Demographic data of the total study group and diagnoses of the pathological voice subgroup are presented in Table 1. These patients were serially enrolled and, therefore, likely represent the real incidence of pathologies in our series and can be considered to be clinically representative of the population of voice-disordered patients.
VOICE RECORDINGS The mixed gender database of voice recordings used in this study contained 157 digital voice recordings of sustained phonation of the vowel sound /a/ (as in the English word “large”) (Table 1). The subjects were asked to phonate the sustained vowel /a/ at a comfortable pitch and loudness level for at least 5 seconds. Voice samples obtained from each

TABLE 1. Demographic Data of the Study Group

                                          Total Number   Gender: Female   Gender: Male   Age (y)   Age (y)
Diagnosis                                 (n = 157)      (n = 102)        (n = 55)       Mean      SD
Normal voice                              105            71               34             46.2      6.70
Nodules                                   7              7                0              25.4      6.00
Polyps                                    14             8                6              41.1      11.70
Carcinoma                                 6              0                6              62        7.00
Reinke hyperplasia                        9              7                2              50        7.10
Papillomatosis                            7              3                4              40        13.50
Other (cyst, granuloma, monochorditis)    9              6                3              45.7      8.10

Abbreviation: SD, standard deviation.


subject were recorded in a soundproof booth simultaneously through two microphones: an oral cardioid AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) placed at a distance of 10.0 cm from the mouth (the subjects were seated with a head rest) at a microphone-to-mouth angle of about 90°, and a low-cost small contact (throat) microphone, Stryker/Triumph PC (Clearer Communications, Inc, Burnaby, Canada), placed on the projection of the lamina of the thyroid cartilage and fixed with an elastic band. The location of the throat microphone on the thyroid lamina was chosen to acquire the strongest signal because the average magnitude of the acceleration tends to be greatest on and in the immediate vicinity of the larynx.24 The voice recordings were made in the “wav” file format on separate tracks using Audacity software (http://audacity.sourceforge.net/) at a rate of 44,100 samples per second. Sixteen bits were allocated for one sample. An external M-Audio sound card (Cumberland, RI) was used for digitization of the voice recordings.
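As a minimal sketch of how such a two-track recording could be separated for per-microphone analysis, the following Python function reads a 16-bit stereo PCM "wav" file and returns the two tracks as separate sample lists. It assumes that both microphones were stored as the two channels of a single file and that the first channel is the oral microphone; neither detail is stated in the text, and the function name is hypothetical.

```python
import struct
import wave


def split_stereo_wav(path):
    """Return (sample_rate, first_track, second_track) for a 16-bit stereo PCM WAV file.

    Assumption (not from the paper): both microphones share one stereo file,
    with the oral microphone on the first channel and the throat microphone
    on the second.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected a 2-channel, 16-bit PCM file")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Each frame holds two little-endian signed 16-bit samples: ch0, ch1.
    n_frames = len(raw) // 4
    samples = struct.unpack("<%dh" % (2 * n_frames), raw[: 4 * n_frames])
    return rate, list(samples[0::2]), list(samples[1::2])
```

For a recording made with the settings described above, the returned sample rate would be 44100, and each track could then be passed to an analysis routine on its own.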

ACOUSTIC ANALYSIS Segments of at least 5 seconds' duration of the sustained vowel /a:/ of separate voice samples from each recording session were analyzed using Dr. Speech software (subprogram: voice assessment, version 3.0). Acoustic voice signal data were measured for F0, percent of jitter and shimmer, normalized noise energy (NNE), signal-to-noise ratio (SNR), and harmonic-to-noise ratio (HNR). According to the results of our previous study, no statistically significant differences between the means of male and female acoustic voice parameters (except the mean F0) were revealed.3 Therefore, in this study, we did not separate parameters of acoustic voice analysis between males and females. However, the F0 parameter was analyzed separately considering the gender of the subjects.

STATISTICS Statistical analysis was performed using IBM SPSS Statistics software for Windows, version 20.0 (IBM Corporation, Armonk, NY). Data were presented as mean ± standard deviation (SD). The Student t test was used for testing hypotheses about equality of the means. The size of the differences among the mean values of the groups was evaluated by estimation of the type II error β. The size of the difference was considered to be significant if β ≤ .2 (ie, the power of the statistical test ≥ 0.8) at type I error α = .05. Fisher discriminant analysis was performed to determine limiting values of acoustic voice parameters discriminating normal and pathological voice groups and to select an optimum set of parameters for the classification task. Correct classification rate (CCR) was used to evaluate the feasibility of acoustic voice parameters for classifying normal and pathological voice classes. The correlations among acoustic voice parameters were evaluated using the Pearson correlation coefficient (r). The level of statistical significance for hypothesis testing was 0.05.

RESULTS The mean values and SDs of the acoustic voice parameters obtained both with oral and throat microphones in the total study group are presented in Table 2. Generally, no statistically significant differences (P > 0.05) between acoustic voice parameters obtained with oral and throat microphones were found for all parameters reflecting frequency and amplitude perturbations of the voice signal. An exception was revealed only for the SNR and HNR parameters demonstrating slight, however,

TABLE 2. Comparison of the Means of Acoustic Voice Parameters Obtained From the Oral and Throat Microphones in the Total Study Group

                                                          Paired Difference
Number of pair    Parameter    Mean     N     SD      Absolute   %      P        β†
Pair 1            O-jitter     0.40     157   0.30    0.004      1.00   0.801
                  T-jitter     0.40     157   0.33
Pair 2            O-shimmer    2.92     157   1.49    0.017      0.59   0.853
                  T-shimmer    2.90     157   1.91
Pair 3            O-NNE        8.64     157   4.82    0.432      4.76   0.172
                  T-NNE        9.08     157   5.48
Pair 4            O-HNR        23.00    157   5.06    1.380      5.64   0.000*   .345
                  T-HNR        24.38    157   5.26
Pair 5            O-SNR        21.35    157   4.93    1.311      5.78   0.000*   .396
                  T-SNR        22.67    157   5.31
Pair 6, male      O-F0         139.50   55    83.32   0.455      0.33   0.079
                  T-F0         139.95   55    83.19
Pair 7, female    O-F0         208.19   102   35.73   0.167      0.08   0.366
                  T-F0         208.36   102   35.86

Abbreviations: SD, standard deviation; O, oral microphone; T, throat microphone. * Statistically significant difference. † Computed at α = .05.


TABLE 3. Comparison of the Means of Acoustic Voice Parameters Obtained From Oral and Throat Microphones in the Normal Voice Subgroup

                                                          Paired Difference
Number of pair    Parameter    Mean     N     SD      Absolute   %      P
Pair 1            O-jitter     0.30     105   0.14    0.002      0.63   0.816
                  T-jitter     0.30     105   0.14
Pair 2            O-shimmer    2.35     105   0.95    0.018      0.75   0.875
                  T-shimmer    2.37     105   1.48
Pair 3            O-NNE        9.95     105   4.60    0.248      2.44   0.578
                  T-NNE        10.19    105   5.44
Pair 4            O-HNR        24.83    105   3.91    0.802      3.13   0.019*
                  T-HNR        25.63    105   4.60
Pair 5            O-SNR        23.13    105   3.94    0.777      3.25   0.027*
                  T-SNR        23.91    105   4.72
Pair 6, male      O-F0         115.14   34    20.54   0.094      0.08   0.668
                  T-F0         115.05   34    20.50
Pair 7, female    O-F0         210.17   71    34.87   0.275      0.13   0.116
                  T-F0         210.45   71    34.66

β† column: the entries appear in the source in the order (none), .322, .000, .000, .921, (none); their assignment to pairs is not recoverable from the source layout.

Abbreviations: SD, standard deviation; O, oral microphone; T, throat microphone. * Statistically significant difference. † Computed at α = .05.

statistically significant differences between the microphone measurements. However, these differences were only within the range of 5.64–5.78%. The observed statistically significant difference between the HNR and SNR parameters of the two microphones could be because of the rather different frequency response curves of the microphones.

In Table 3, the mean values and SDs of the acoustic voice parameters in the normal voice subgroup are presented. Again, no statistically significant differences between acoustic voice parameters obtained with the oral and throat microphones were found for all parameters checked, except the SNR and HNR parameters showing statistically significant differences. However, these differences were only within the 3.13–3.25% range.

Table 4 presents the mean values and SDs of the acoustic voice parameters in the pathological voice subgroup. No statistically significant differences between acoustic voice parameters obtained with the oral and throat microphones were revealed for all parameters checked, except the SNR and HNR parameters showing statistically significant differences. These differences were within the 11.59–11.86% range. A statistically significant difference between oral and throat F0 in males of the pathological voice subgroup was also revealed. However, this difference amounted to only 1.35 Hz (0.75%).

Results of the correlation analysis are presented in Table 5. Generally, the statistical analysis showed significant moderate-to-strong correlations among the measured instrumental voice parameters obtained both from the oral and throat microphones. Strong correlations were observed among acoustic voice parameters reflecting pitch and amplitude perturbations (r ≥ 0.70), whereas correlations among perturbation measurements and measurements of voice signal turbulences (NNE, HNR, and SNR) were moderate to strong (r = 0.52–0.89). F0 registered with the oral and throat microphones was almost identical and demonstrated a perfect correlation (r = 1.0; P < 0.01). Moreover, no statistically significant correlations between F0 and the other acoustic voice parameters registered were revealed. In Figure 1, paired correlations between acoustic voice parameters obtained with the oral and throat microphones are shown. As follows from Figure 1, statistically significant strong correlations (r = 0.71–0.86, and r = 1.0 for F0) between identical voice measurements registered with different microphones were revealed.

Table 6 presents the results of classification of the voice signal into two classes, that is, normal and pathological voice. As the outcome of Fisher discriminant analysis of the separate acoustic voice parameters, the optimum limiting values of the parameters discriminating the normal and pathological voice subgroups were determined, and the corresponding CCRs were calculated. As follows from Table 6, for the oral microphone, oral-shimmer (O-shimmer) was the most discriminative parameter and provided a CCR of 75.2%. For the throat microphone, throat-jitter (T-jitter) was the most discriminative parameter and provided a CCR of 70.7%. Fisher discriminant analysis using the entire set of acoustic voice parameters selected an optimum set of parameters discriminating normal and pathological voice subgroups. This set included the following three acoustic voice parameters: T-SNR, O-shimmer, and O-NNE. Consequently, the combination of both throat and oral microphones allowed identifying an optimum set of acoustic voice parameters providing a CCR of 80.3%. Presumably, a higher CCR may be achieved using more complex feature sets.
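The single-parameter case can be illustrated with a short sketch: a sample is assigned to the pathological class when its value lies on the pathological side of a limiting value, and the CCR is the percentage of samples classified correctly. The function names and the toy values below are invented for illustration and are not data from the study; only the O-shimmer limiting value of 3.20 is taken from Table 6. The multi-parameter discriminant function used for the combined set is not reproduced here.

```python
def classify_by_limit(value, limit, higher_is_pathological=True):
    """Return True (pathological) if the value lies on the pathological side of the limit."""
    if higher_is_pathological:
        return value > limit
    return value < limit


def correct_classification_rate(values, labels, limit, higher_is_pathological=True):
    """Percentage of samples whose predicted class matches the true label."""
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and equally long")
    hits = sum(
        classify_by_limit(v, limit, higher_is_pathological) == is_pathological
        for v, is_pathological in zip(values, labels)
    )
    return 100.0 * hits / len(values)


# Invented O-shimmer values (%), NOT measurements from the study.
toy_shimmer = [2.1, 2.6, 3.4, 1.9, 4.8, 3.9, 2.9, 5.2]
# True = pathological voice, False = normal voice (also invented).
toy_labels = [False, False, False, False, True, True, True, True]

ccr = correct_classification_rate(toy_shimmer, toy_labels, limit=3.20)
print(f"CCR = {ccr:.1f}%")
```

On these toy values, six of the eight samples fall on the correct side of the limit. The direction of the rule (higher shimmer treated as pathological) is an assumption based on the higher mean O-shimmer in the pathological subgroup.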


TABLE 4. Comparison of the Means of Acoustic Voice Parameters Obtained From Oral and Throat Microphones in the Pathological Voice Subgroup

                                                           Paired Difference
Number of pair    Parameter    Mean     N    SD       Absolute   %       P        β†
Pair 1            O-jitter     0.59     52   0.41     0.014      2.32    0.707
                  T-jitter     0.61     52   0.47
Pair 2            O-shimmer    4.05     52   1.74     0.087      2.21    0.592
                  T-shimmer    3.96     52   2.24
Pair 3            O-NNE        6.01     52   4.16     0.803      11.79   0.011*   .322
                  T-NNE        6.82     52   4.88
Pair 4            O-HNR        19.32    52   5.14     2.533      11.59   0.000*   .000
                  T-HNR        21.86    52   5.65
Pair 5            O-SNR        17.77    52   4.81     2.391      11.86   0.000*   .000
                  T-SNR        20.16    52   5.60
Pair 6, male      O-F0         178.92   21   124.14   1.349      0.75    0.014*   .921
                  T-F0         180.27   21   123.41
Pair 7, female    O-F0         203.67   31   37.82    0.079      0.04    0.866
                  T-F0         203.59   31   38.61

Abbreviations: SD, standard deviation; O, oral microphone; T, throat microphone. * Statistically significant difference. † Computed at α = .05.

DISCUSSION Currently, an automated acoustic analysis of voice is increasingly used for the screening of laryngeal disorders.4,27–31 One of the most important factors determining reliability and practical utility of screening and categorization of voice disorders is voice recordings of acceptable quality. Therefore, choice of the appropriate microphone plays an important role in this matter. A study performed by Titze and Winholtz8 demonstrated that the type of microphone used in acoustic voice analysis has significant impact on quality of measurement outcome. The results showed that condenser microphones give better results than dynamic microphones, microphones with a balanced output perform better than those with unbalanced outputs, and microphone sensitivity and distance have the largest

effect on perturbation measures.32 An acoustic cardioid microphone has been considered by some investigators to be the best choice when voice is measured in clinical settings, especially if perturbation measurements are the main interest.14 However, because of the proximity effect of the microphone, even with these microphones, spectral measurements may be distorted, and inaccuracies of voice measurements may occur.9 Basically, measurements of acoustic signal perturbations represent measurements of noise and assess the nonstationary characteristics of the acoustic voice signals. Of note, deviations from voice signal stationary cyclic behavior can result either from the larynx or from the noise, either in the acoustic environment or in the data acquisition hardware.32 Therefore, it is of great importance to control noise level in the environment

TABLE 5. Correlation Coefficients (r) Among Acoustic Voice Parameters Obtained Using the Oral and Throat Microphones

             O-Jitter  O-Shimmer  O-NNE  O-HNR  O-SNR  T-Jitter  T-Shimmer  T-NNE  T-HNR  T-SNR
O-jitter     1.00      0.80       0.59   0.67   0.68   0.86      0.69       0.52   0.66   0.65
O-shimmer    0.80      1.00       0.66   0.89   0.89   0.80      0.80       0.55   0.78   0.76
O-NNE        0.59      0.66       1.00   0.65   0.67   0.59      0.72       0.71   0.76   0.78
O-HNR        0.67      0.89       0.65   1.00   0.99   0.68      0.72       0.49   0.78   0.76
O-SNR        0.68      0.89       0.67   0.99   1.00   0.68      0.72       0.50   0.78   0.77
T-jitter     0.86      0.80       0.59   0.68   0.68   1.00      0.81       0.52   0.73   0.72
T-shimmer    0.69      0.80       0.72   0.72   0.72   0.81      1.00       0.59   0.89   0.89
T-NNE        0.52      0.55       0.71   0.49   0.50   0.52      0.59       1.00   0.62   0.63
T-HNR        0.66      0.78       0.76   0.78   0.78   0.73      0.89       0.62   1.00   0.98
T-SNR        0.65      0.76       0.78   0.76   0.77   0.72      0.89       0.63   0.98   1.00

Notes: P < 0.05. Abbreviations: O, oral microphone; T, throat microphone.


and to select appropriate recording systems combined with microphones that provide high SNR.7 Consequently, all noise-contributing factors should result in an acoustic environment that has an SNR of at least 30 dB to produce valid results.32 These requirements can be fulfilled rather easily if voice recordings are performed in a special soundproof booth. However, this may not be feasible when voice recordings are carried out in an ordinary environment for a voice disorder screening task. On the other hand, contact microphones, with their reduced sensitivity to environmental noise, could be one of the solutions to preclude the influence of background noise. Moreover, the waveform of a contact microphone is reasonably independent of the articulation because of high glottal impedance.15 Consequently, the waveform of a contact microphone is suitable for F0 measurements and has frequently been found useful for F0 detection and perturbation measurements.1 These circumstances were considered in this study investigating the suitability of the throat microphone signal for voice categorization and screening purposes. Moreover, it was presumed that the combination of oral and throat microphones would increase the CCR when discriminating normal and pathological voice groups.

[Figure 1 image not reproduced. Plotted values of r (axis from 0 to 1.2): O-F0 & T-F0, 1; O-SNR & T-SNR, 0.77; O-HNR & T-HNR, 0.78; O-NNE & T-NNE, 0.71; O-shimmer & T-shimmer, 0.8.]
FIGURE 1. Paired correlations between acoustic voice parameters obtained with the oral and throat microphones.

TABLE 6. CCR Achieved When Classifying Into Normal and Pathological Voice Classes Using Acoustic Voice Parameters Obtained From the Oral and Throat Microphones

Microphones        Acoustic Voice Parameters   CCR (%)   Limiting Value
Oral               O-shimmer                   75.2      3.20
Throat             T-jitter                    70.7      0.45
Oral and throat    T-SNR                       80.3      22.03
                   O-shimmer                             3.20
                   O-NNE                                 7.98

Abbreviations: CCR, correct classification rate; O, oral microphone; T, throat microphone.

Horii1 used a contact microphone (accelerometer) to eliminate acoustic effects of the vocal tract and found no significant differences in jitter or shimmer measurements among eight vowels. However, the airborne voice signals had approximately twice as much shimmer as those from the accelerometer signals. Jitter values, on the other hand, showed only a slight tendency toward increase in the airborne signals. Results of the present study are in some discrepancy with the data of Horii because we found a strong correlation between jitter and shimmer values measured with acoustic and throat microphones. Moreover, there were no statistically significant differences between the mean values of these voice perturbation parameters obtained using the acoustic and throat microphones. Generally, in the F0 data, there was a perfect agreement between the two microphones both in the male and female groups in our series. These findings are in some controversy with the results of the study by Askenfelt et al,15 who found that the mean F0 tended to be slightly higher in the contact microphone measurements compared with the electroglottogram. However, running speech, the F0 extraction algorithm, and several other factors could contribute to this. Therefore, the results of the two studies are not directly comparable. To the best of our knowledge, measurements of voice signal turbulences (SNR, HNR, and NNE) obtained from the throat microphone are presented in the present study for the first time. Strong correlations among these acoustic voice parameters registered with different microphones (oral vs throat) were revealed (r = 0.71–0.78), confirming the acceptability of throat microphones for measurements in clinical settings and/or for screening purposes. Despite the statistically significant differences among the mean values of HNR and SNR that were found in this study, with the throat microphone showing a slight tendency toward higher HNR and SNR values, these differences in the total study group were in the range of only 5.6–5.8%. The observed statistically significant difference between the HNR and SNR parameters of the two microphones could be because of the rather different frequency response curves of the oral and throat microphones.
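The percent column of the paired-difference tables can be checked with a few lines of arithmetic. The sketch below expresses the absolute paired difference as a percentage of a reference mean. The text does not state which mean serves as the denominator; dividing by the throat-microphone mean reproduces the tabulated values checked here, so that choice is an assumption, and the function name is hypothetical.

```python
def percent_difference(abs_difference, reference_mean):
    """Absolute paired difference expressed as a percentage of a reference mean."""
    if reference_mean == 0:
        raise ValueError("reference mean must be non-zero")
    return 100.0 * abs(abs_difference) / abs(reference_mean)


# (absolute paired difference, throat-microphone mean) copied from Tables 2 and 4.
# Using the throat mean as the denominator is an assumption, not a stated method.
rows = {
    "NNE, total group (Table 2)": (0.432, 9.08),
    "SNR, total group (Table 2)": (1.311, 22.67),
    "HNR, pathological subgroup (Table 4)": (2.533, 21.86),
}
for name, (diff, throat_mean) in rows.items():
    print(f"{name}: {percent_difference(diff, throat_mean):.2f}%")
```

These three rows evaluate to 4.76%, 5.78%, and 11.59%, matching the tabulated percentages. Other rows, recomputed from the rounded means printed in the tables, may differ in the second decimal place.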
Further studies are required to assess possible differences in HNR and SNR measurements using two types of microphones in a more clinically realistic environment. In the present study, combined use of both oral and throat microphones revealed some benefits discriminating normal and pathological voice subgroups. The discriminant analysis determined an optimum set of acoustic voice parameters, including T-SNR, O-shimmer, and O-NNE, which provided CCR of 80.3% when categorizing normal and pathological voice samples. In comparison, from separate oral microphone recordings, O-shimmer provided CCR of 75.2% and from separate throat microphone recordings T-jitter provided CCR of 70.7%, respectively. Thus, the most discriminative parameter of the throat microphone, T-jitter, was not included in the combined set of parameters. Such behavior is often observed in variable selection because two individually best variables do not necessarily comprise the best subset of two variables. The combined set of variables, containing parameters related to both throat and oral microphones, indicates that the throat microphone may bring additional information useful for the task. Results presented in Table 6 also indicate that voice parameters computed from ‘‘throat records’’ may possess reasonable discriminative power and can be used in screening aiming to


distinguish between normal and pathological voices. Although, for the given data set, the discriminative power of the best “oral parameter” is noticeably higher than the discriminative power of the best “throat parameter,” the difference between the discriminative ability of larger sets of parameters may not be so evident. This observation needs further study. Some limitations originating both from the design of the present study and from the inherent restrictions of throat microphones for acoustic voice analysis must be considered. Throat microphones are not very effective in transmitting consonant sounds and high frequencies.20 The intrinsic elastic properties of the underlying human body tissues, acting as a low-pass filter with a 3 kHz cutoff frequency, limit the frequency representation of the signal.19 This feature of the throat microphone may influence the accuracy of voice signal turbulence noise measurements. Presumably, these features of the throat microphone account for some of the differences among the mean values of the voice signal turbulence measurements (SNR and HNR) carried out with microphones of the two types in the present study. Most of the voice signals recorded in our study can be attributed to the type 1 group, according to Titze.6 However, signals of types 2 and 3 are also present. As is well known, per-cycle measurement of F0 cannot be done reliably for type 2 and type 3 signals. An acoustic microphone exhibiting a cardioid polar pattern, as the AKG Perception 220 does, is not an ideal microphone for performing spectral analysis because the frequency response drops by about 3 dB at 50 Hz. However, because the frequency range of the contact Stryker/Triumph PC microphone is limited to 100 Hz at the lower end, this deficiency of the acoustic microphone is not crucial for the pairwise comparison of voice parameters obtained using these two microphones.
A relative discomfort of wearing a throat microphone during voice recording, as well as some difficulty of properly positioning the device and quantification of the effects of contact pressure on the skin frequency response should be considered in future studies.16,19 Also, it will be of great importance to analyze how well the throat microphone performs in an ordinary environment and in the presence of background noise. In this study, sustained phonation of vowel /a/ was chosen for analysis because the steady-state phonations (ie, time and frequency invariances) are simple, time effective, allow the reduction of the variances in sustained vowels, and provide reliable detection and computation of acoustic features.2,11,33 Moreover, sustained vowels are not influenced by speech rate and stress; they typically do not contain voiceless phonemes, fast voice onsets and terminations, and prosodic fluctuations in F0 and amplitude.13 Despite that sustained vowel phonations could not be a complete substitute for real-life phonation in acoustic analysis,33 they are relatively insulated from influences related to different languages and therefore could be considered as universal and suitable for voice screening purposes. Nevertheless, analysis of connected speech samples would be of interest in future research because symptoms of disordered voice quality are more typically revealed in continuous speech.13
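To make concrete why steady-state phonation simplifies the computation of perturbation features, the sketch below applies one commonly used cycle-to-cycle definition: the mean absolute difference between consecutive cycles, divided by the mean value, in percent. Applied to cycle periods it gives a jitter percentage, and applied to cycle peak amplitudes it gives a shimmer percentage. The exact algorithms implemented in Dr. Speech are not described in the text, so this is an assumed, generic definition, and the input values are invented.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean value, in percent.

    Pass glottal cycle periods to obtain a jitter percentage, or cycle peak
    amplitudes to obtain a shimmer percentage. This is a generic definition
    and is not claimed to match the Dr. Speech implementation.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    return 100.0 * mean_diff / mean_value


# Invented cycle periods in milliseconds (about 200 Hz), NOT data from the study.
toy_periods_ms = [5.00, 5.02, 4.99, 5.01, 5.00]
# Invented cycle peak amplitudes in arbitrary units.
toy_amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]

print(f"jitter  = {local_perturbation_percent(toy_periods_ms):.2f}%")
print(f"shimmer = {local_perturbation_percent(toy_amplitudes):.2f}%")
```

The sketch presupposes that cycle boundaries have already been detected, which is the step that becomes unreliable for type 2 and type 3 signals.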

In the present study, only the Dr. Speech system was used, which registers a rather limited number of acoustic voice parameters reflecting perturbation and turbulent noise in the voice signal.10 This limitation of the analysis system reduces the accuracy of classification into normal and pathological voice classes. Therefore, future investigation should concentrate on the utility of a large variety of voice signal feature types in categorizing voices into healthy and different pathological classes, using contemporary and more sophisticated methods of automated voice analysis.4,31,34

CONCLUSIONS
In summary, measurements of acoustic voice parameters using a combination of oral and throat microphones proved reliable in clinical settings and demonstrated a high CCR in distinguishing healthy and pathological voice subgroups. High correlations were observed between the acoustic voice parameters measured with the two types of microphone. In situations where background noise complicates or restricts the use of conventional microphones, a contact (throat) microphone can be considered a valuable alternative and/or supplement for voice recording and analysis. Our data validate the suitability of the throat microphone signal for automatic voice analysis for voice screening purposes.

Acknowledgments
This study was supported by grant VP1-3.1-SMM-10-V-02-030 from the Ministry of Education and Science of the Republic of Lithuania.

REFERENCES
1. Horii Y. Jitter and shimmer differences among sustained vowel phonations. J Speech Hear Res. 1982;25:12–14.
2. Zhang Y, Jiang JJ. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice. 2008;22:1–9.
3. Uloza V, Saferis V, Uloziene I. Perceptual and acoustic assessment of voice pathology and the efficacy of endolaryngeal phonomicrosurgery. J Voice. 2005;19:138–145.
4. Uloza V, Verikas A, Bacauskiene M, Gelzinis A, Pribuisiene R, Kaseta M, Saferis V. Categorizing normal and pathological voices: automated and perceptual categorization. J Voice. 2011;25:700–708.
5. Maryn Y, Corthals P, De Bodt M, Van Cauwenberge P, Deliyski D. Perturbation measures of voice: a comparative study between Multi-Dimensional Voice Program and Praat. Folia Phoniatr Logop. 2009;61:217–226.
6. Titze IR. Workshop on Acoustic Voice Analysis. Summary Statement. Salt Lake City, UT: National Center for Voice and Speech; 1995.
7. Deliyski DD, Evans MK, Shaw HS. Influence of data acquisition environment on accuracy of acoustic voice quality measurements. J Voice. 2005;19:176–186.
8. Titze IR, Winholtz WS. Effect of microphone type and placement on voice perturbation measurements. J Speech Hear Res. 1993;36:1177–1190.
9. Svec JG, Granqvist S. Guidelines for selecting microphones for human voice production research. Am J Speech Lang Pathol. 2010;19:356–368.
10. Smits I, Ceuppens P, De Bodt MS. A comparative study of acoustic voice measurements by means of Dr. Speech and Computerized Speech Lab. J Voice. 2005;19:187–196.
11. Wormald RN, Moran RJ, Reilly RB, Lacy PD. Performance of an automated, remote system to detect vocal fold paralysis. Ann Otol Rhinol Laryngol. 2008;117:834–838.

12. Lin E, Hornibrook J, Ormond T. Evaluating iPhone recordings for acoustic voice assessment. Folia Phoniatr Logop. 2012;64:122–130.
13. Maryn Y, De Bodt M, Barsties B, Roy N. The value of the Acoustic Voice Quality Index as a measure of dysphonia severity in subjects speaking different languages. Eur Arch Otorhinolaryngol. 2014;271:1609–1619.
14. Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice. San Diego, CA: Singular Publishing Group; 2000:610.
15. Askenfelt A, Gauffin J, Sundberg J, Kitzing P. A comparison of contact microphone and electroglottograph for the measurement of vocal fundamental frequency. J Speech Hear Res. 1980;23:258–273.
16. Munger JB, Thomson SL. Frequency response of the skin on the head and neck during production of selected speech sounds. J Acoust Soc Am. 2008;124:4001–4012.
17. Neumann K, Gall V, Schutte HK, Miller DG. A new method to record subglottal pressure waves: potential applications. J Voice. 2003;17:140–159.
18. Graciarena M, Franco H, Sonmez K, Bratt H. Combining standard and throat microphones for robust speech recognition. IEEE Signal Process Lett. 2003;10:72–74.
19. Dupont S, Ris C, Bachelart D. Combined use of close-talk and throat microphones for improved speech recognition under non-stationary background noise. Proceedings of the COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction, 30–31 August 2004, Norwich, UK. International Speech Communication Association; 2004.
20. Acker-Mills BE, Houtsma AJ, Ahroon WA. Speech intelligibility in noise using throat and acoustic microphones. Aviat Space Environ Med. 2006;77:26–31.
21. Horii Y, Fuller BF. Selected acoustic characteristics of voices before intubation and after extubation. J Speech Hear Res. 1990;33:505–510.
22. Cheyne HA, Hanson HM, Genereux RP, Stevens KN, Hillman RE. Development and testing of a portable vocal accumulator. J Speech Hear Res. 2003;46:1457–1467.

23. Svec JG, Titze IR, Popolo PS. Estimation of sound pressure levels of voiced speech from skin vibration of the neck. J Acoust Soc Am. 2005;117:1386–1394.
24. Nolan M, Madden B, Burke E. Accelerometer based measurement for the mapping of neck surface vibrations during vocalized speech. Conf Proc IEEE Eng Med Biol Soc. 2009;2009:4453–4456.
25. Yiu EM, Chen FC, Lo G, Pang G. Vibratory and perceptual measurement of resonant voice. J Voice. 2012;26:675.e13–675.e19.
26. Chen FC, Ma EP, Yiu EM. Facial bone vibration in resonant voice production. J Voice. 2014;28:596–602.
27. Moran RJ, Reilly RB, de Chazal P, Lacy PD. Telephony-based voice pathology assessment using automated speech analysis. IEEE Trans Biomed Eng. 2006;53:468–477.
28. Linder R, Albers AE, Hess M, Pöppl SJ, Schönweiler R. Artificial neural network-based classification to screen for dysphonia using psychoacoustic scaling of acoustic voice features. J Voice. 2008;22:155–163.
29. Godino-Llorente JI, Fraile R, Saenz-Lechon N, Osma-Ruiz V, Gomez-Vilda P. Automatic detection of voice impairments from text-dependent running speech. Biomed Signal Process Control. 2009;4:176–182.
30. Maier A, Haderlein T, Stelzle F, et al. Automatic speech recognition systems for the evaluation of voice and speech disorders in head and neck cancer. EURASIP J Audio Speech Music Process. 2010;2010:1–7.
31. Muhammad G, Mesallam TA, Malki KH, Farahat M, Mahmood A, Alsulaiman M. Multidirectional regression (MDR)-based features for automatic voice disorder detection. J Voice. 2012;26:817–819.
32. Deliyski DD, Shaw HS, Evans MK. Adverse effects of environmental noise on acoustic voice quality measurements. J Voice. 2005;19:15–28.
33. Moon KR, Chung SM, Park HS, Kim HS. Materials of acoustic analysis: sustained vowel versus sentence. J Voice. 2012;26:563–565.
34. Vaiciukynas E, Verikas A, Gelzinis A, Bacauskiene M, Uloza V. Exploring similarity-based classification of larynx disorders from human voice. Speech Comm. 2012;54:601–610.

Journal of Voice, Vol. -, No. -, 2014
