Journal of Speech and Hearing Research, Volume 33, 245-254, June 1990
ELECTROGLOTTOGRAPHY AND VOCAL FOLD PHYSIOLOGY
D. G. CHILDERS
Department of Electrical Engineering, University of Florida
G. P. MOORE
D. M. HICKS*
Department of Speech, University of Florida
Department of Electrical Engineering, University of Florida
The electroglottogram (EGG) is known to be related to vocal fold motion. A major hypothesis undergoing examination in several research centers is that the EGG is related to the area of contact of the vocal folds. This hypothesis is difficult to substantiate with direct measurements using human subjects. However, other supporting evidence can be offered. For this study we made measurements from synchronized ultra high-speed laryngeal films and from EGG waveforms collected from subjects with normal larynges and patients with vocal disorders. We compare certain features of the EGG waveform to (a) the instant of the opening of the glottis, (b) the instant of the closing of the glottis, and (c) the instant of the maximum opening of the glottis. In addition, we compare both the open quotient and the relative average perturbation measured from the glottal area to those estimated from the EGG. All of these comparisons indicate that vocal fold vibratory characteristics are reflected by features of the EGG waveform. This makes the EGG useful for speech analysis and synthesis as well as for modeling laryngeal behavior. The limitations of the EGG are discussed.
KEY WORDS: electroglottography, vocal fold physiology, voice assessment, laryngeal function
Because the electroglottographic (EGG) device is relatively inexpensive and easy to use in research and clinical settings, and because it measures certain aspects of vocal fold motion and contact, it has achieved a certain popularity. Consequently, for almost 20 years, researchers have been working to establish relationships between vocal fold vibratory events and features of the EGG. (See Childers, 1977, and Childers & Krishnamurthy, 1985, for extensive reviews of the literature.) One of the most recent models depicting this relationship appears in Figure 1 (Childers, Hicks, Moore, & Alsaka, 1986). This model represents the combined work of many researchers (Baer, Lofqvist, & McGarr, 1983; Baer, Titze, & Yoshioka, 1983; Childers, 1977; Childers, Hicks, Moore, & Alsaka, 1986; Childers & Krishnamurthy, 1985; Childers, Naik, Larar, Krishnamurthy, & Moore, 1983; Childers, Smith, & Moore, 1984; Cranen & Boves, 1985; Dejonckere, 1981; Fant, Ondrackova, Lindqvist, & Sonesson, 1966; Fog-Pedersen, 1977; Fourcin, 1974, 1981; Gilbert, Potter, & Hoodin, 1984; Hirano, 1981; Kelman, 1981; Kitzing, Carlborg, & Lofqvist, 1982; Lecluse, 1977; Rothenberg, 1981). The methodologies employed by these researchers have varied, including laryngeal stroboscopy, ultra high-speed laryngeal films, photoglottography, inverse filtering, and supra- and subglottal pressure measurements. Specific details concerning the EGG waveform are outlined in Figure 1 and are still under investigation. Although many voice scientists presume that the description illustrated in Figure 1 has been confirmed by evidence from experimental research, in reality this is not altogether true (Childers & Krishnamurthy, 1985). The model in Figure 1 is certainly based on the collective evidence of numerous experiments, but the model's features tend to be inferred from research observations and descriptive reports rather than from a compilation of statistical data extracted from experimental measurements. Some studies used only 1 or 2 subjects phonating only one or two vowels. The degree of variability, and the causes of such variability, between selected vocal fold vibratory events and features of the EGG waveform remain largely unknown. A mathematical model for the EGG waveform as a function of time was suggested in Childers, Hicks, Moore, and Alsaka (1986) as
$$\mathrm{EGG}(t) = \frac{k}{A(t) + C}$$
where t represents time, A(t) is the vocal fold contact area, k is a scaling constant, and C is a constant proportional to the shunt impedance specified for the case when A(t) = 0. This mathematical model and its relationship to features of the EGG waveform depicted in Figure 1 have been partially validated by simulation experiments in Childers et al. (1986) and by direct measurements with excised larynges (Scherer, Drucker, & Titze, 1988). Further substantiation of the model is attempted in this paper by correlating features of the glottal area waveform with features of the EGG waveform. Specifically, we examined the relationships between features of the EGG waveform and the instants at which (a) the glottis opens, (b) the glottis closes, and (c) the glottal area is a maximum.
METHODS
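As a point of reference for the measurements described in this section, the contact-area model EGG(t) = k/[A(t) + C] can be evaluated on a synthetic contact-area waveform. The sketch below is illustrative only: the values of k and C, and the raised-cosine contact cycle, are assumptions chosen for demonstration, not fitted parameters or measured data.

```python
import numpy as np

def egg_from_contact_area(area, k=1.0, c=0.5):
    """Evaluate the model EGG(t) = k / (A(t) + C).

    area: contact-area samples A(t) (arbitrary units, >= 0).
    k, c: ASSUMED placeholder constants; c stands for the shunt term
    that keeps the model finite when A(t) = 0.
    """
    area = np.asarray(area, dtype=float)
    return k / (area + c)

# Hypothetical contact-area cycle over one normalized period:
# no contact for the first half, then a raised-cosine contact pulse.
t = np.linspace(0.0, 1.0, 100, endpoint=False)
area = np.where(t < 0.5, 0.0, np.sin(np.pi * (t - 0.5) / 0.5) ** 2)
egg = egg_from_contact_area(area)

# With no contact the model value is k / C; more contact lowers it.
print(egg.max(), egg.min())
```

The model behaves like an impedance that falls as contact area grows. Whether a displayed EGG trace rises or falls with increasing contact depends on the display polarity convention, which this sketch does not model.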
*Currently affiliated with The Cleveland Clinical Foundation. © 1990, American Speech-Language-Hearing Association
[Figure 1: Idealized Descriptive EGG Waveform, with labeled events: vocal folds maximally closed, maximum contact area; folds parting, usually from lower margins toward upper margins and posterior to anterior; break point, when present, usually corresponding to folds opening along the upper margin; upper fold margins continue to open; folds apart, minimum contact area; folds in contact along lower margin, glottal area zero; folds closing from lower to upper margin and from anterior to posterior, rapid increase in vocal fold contact; open phase and closed phase marked. Second panel: Artistic Rendition of Vocal Fold Motion for Opening Phase, superior view, anterior (top) to posterior (bottom).]
FIGURE 1. Idealized descriptive EGG waveform. The vocal fold events are labeled on the EGG waveform and correspondingly on the artistic rendition of the vocal fold motion. The vocal folds are stylized and depict only the anterior one-third segment of the folds. The upper and lower vocal fold margins are out of phase.
In previous research, we have used various methods to record simultaneously the speech and EGG signals (Childers et al., 1983). However, the EGG and glottal area measures used in this study were taken directly from ultra high-speed laryngeal films because this procedure offered greater accuracy in relating the EGG to vocal fold physiology. The photographic equipment and configuration for this study have been described elsewhere (Childers, 1977; Childers et al., 1983; Moore, 1975). For this study, a Fastax camera model WF-14 was used with film speeds up to 5000 frames per second. Illumination was provided by an incandescent lamp having a color temperature of 3200 K. The light passed through two condenser lenses as well as a water cell to remove heat. A plano-convex lens focused the light onto a plane mirror that turned the converging light beam by 90°, directing it to the laryngeal mirror. With the subject in place, a laryngeal mirror was located at the back of the pharynx; it directed the light 90° downward onto the vocal folds and reflected the image of the folds back through an opening in the center of the plane mirror and into the
camera lens. For each film, a dimensional grid was placed in the focal plane of the vocal folds and photographed. This grid allowed absolute measures of vocal fold length, displacement, and glottal area. A second lens, specifically designed for photographing an oscilloscope face, protruded from the side of the camera. A 5-kHz square-wave timing signal, the speech waveform, and the EGG waveform were photographed through this lens. The first two traces were positioned on the oscilloscope so that they appeared on one edge of the film, while the EGG waveform was displayed on a third trace positioned on the other edge of the film. Because the two camera lenses expose different places on the film, the three oscilloscope traces appear on a film frame that is displaced five frames behind the frame that contains the corresponding position of the vocal folds. An example of one of our film records appears in Childers and Krishnamurthy (1985).
For our experiments, the subject, whether possessing a normal voice or a vocal disorder, attempted to phonate the vowel /i/ so that the epiglottis was out of the optical pathway of the vocal fold image. The required positioning of the tongue during filming resulted in the sound produced approximating an /a/. The subject's utterance was obtained using a hearing aid microphone that was attached to the laryngeal mirror handle at the point where the mirror frame joins the handle. This location was in the oral cavity approximately 11 cm from the vocal folds and 7 cm from the teeth, but varied slightly from subject to subject. The small hearing aid microphone was used at this location to minimize the effect of camera motor noise on the recordings of the phonation. The audio bandwidth of the microphone has been measured to be approximately 6 kHz, with a slight peak at 4 kHz. The recorded phonation was sustained for about 3 s.
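The five-frame displacement between the oscilloscope traces and the fold images means the two per-frame sequences must be shifted against each other before they can be compared. The sketch below pairs frame n of the fold image with frame n + 5 of the trace sequence and trims both to their overlap. The direction of the shift depends on how the film was digitized, so it is an assumption here, exposed as a signed parameter; the function names are hypothetical.

```python
import numpy as np

FRAME_OFFSET = 5  # frame displacement between traces and fold images (from the text)

def frame_offset_ms(offset_frames, fps):
    """Convert a frame displacement to milliseconds at a given film speed."""
    return offset_frames * 1000.0 / fps

def align_traces(area_per_frame, egg_per_frame, offset=FRAME_OFFSET):
    """Pair fold-image frame n with the oscilloscope trace on frame n + offset.

    ASSUMPTION: the matching trace lies `offset` frames later in the
    digitized sequence; pass a negative offset for the opposite order.
    Both outputs are trimmed to the frames present in both sequences.
    """
    area = np.asarray(area_per_frame, dtype=float)
    egg = np.asarray(egg_per_frame, dtype=float)
    if offset >= 0:
        n = max(0, min(len(area), len(egg) - offset))
        return area[:n], egg[offset:offset + n]
    n = max(0, min(len(area) + offset, len(egg)))
    return area[-offset:-offset + n], egg[:n]

# Hypothetical per-frame values: the trace copy of frame n sits at index n + 5.
area_frames = np.arange(20.0)
egg_frames = np.concatenate([np.full(5, np.nan), np.arange(15.0)])
area_aligned, egg_aligned = align_traces(area_frames, egg_frames)
print(np.array_equal(area_aligned, egg_aligned))
print(frame_offset_ms(FRAME_OFFSET, fps=5000.0))
```

Because the actual film speed varied, the time equivalent of the five-frame shift varies with it; at 5000 frames per second it is 1 ms. Since the EGG was read at two samples per frame, the same correction corresponds to ten EGG samples at 10 kHz.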
The subjects were both male and female adults and were instructed to (a) phonate at a comfortable level and hold the fundamental frequency (F0) constant, (b) vary F0, or (c) vary intensity. Various measurements were made from the ultra high-speed laryngeal films using a machine-assisted image processing system. The operator moved a cursor, and the position of the cursor was read by the computer upon operator command. Measurements were taken of the superior view of the vocal folds, including the length of contact from anterior to posterior, the width of the glottis at various locations, and so forth. Glottal area was calculated from these measurements and therefore was effectively sampled at 5 kHz. This measure was interpolated to 10 kHz. Similarly, the EGG was measured from the trace on the same laryngeal films. These data were measured at two samples per film frame, or 10 kHz, providing an accuracy similar to that of the interpolated glottal area measures. The algorithms for measuring from the EGG the instants of vocal fold opening and closing, the pitch period, and the instant of the maximum positive peak of the EGG are given in Krishnamurthy and Childers (1986). The algorithms for measuring from the glottal area the instants of vocal fold opening and closing, and the instant of the maximum positive peak of the glottal area, are similar in principle to the EGG-based algorithms. All of the algorithms depend on the specification of threshold values, requiring operator interaction with the data.
An examination of our glottal area data for the normal subjects revealed that 60% of our data files had complete glottal closure, whereas 40% did not. The glottal area data with incomplete glottal closure had a dc offset, which was measured and subtracted from the glottal area waveform for each data record. We made all data measurements from the data files with complete glottal closure (60% of the data) as well as from the data files with incomplete glottal closure, with the dc offset removed from the glottal area. We report these results separately. The results will be presented first for the entire data set, with the dc offset removed from the glottal area data files. We will then present a summary of the same results for the 60% of the data files that had complete glottal closure and, thus, no dc offset in the glottal area waveforms. The data waveforms, namely speech, EGG, differentiated EGG (DEGG), and glottal area, were plotted in synchrony as shown in Figure 2. All figures in this paper present the data prior to measurement by the above algorithms. The results reported here used only the EGG and the glottal area measured from the same film. We restricted our study to this limited data set because of its greater accuracy: the EGG and glottal area were in synchrony throughout the experiment, with only the known, constant five-film-frame displacement between the two signals. This displacement was corrected in the digitized data. Our objective was to examine the reliability of the EGG in assessing aspects of vocal fold physiology.
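The processing chain described above (interpolating the 5-kHz glottal area to 10 kHz, removing a dc offset, locating opening and closing instants with thresholds, and deriving the open quotient and relative average perturbation mentioned in the abstract) can be sketched as follows. This is not the published algorithm set, which is given in Krishnamurthy and Childers (1986). Several choices here are assumptions: linear interpolation, taking the dc offset as the record minimum, a fixed fraction of peak area as the threshold, and a common three-point form of the relative average perturbation. All function names are hypothetical.

```python
import numpy as np

def upsample_linear(x, factor=2):
    """Linearly interpolate a uniformly sampled record by an integer factor.

    ASSUMPTION: linear interpolation (factor=2 takes 5 kHz to 10 kHz);
    the interpolation method is not specified in the text.
    """
    x = np.asarray(x, dtype=float)
    t_old = np.arange(len(x))
    t_new = np.arange((len(x) - 1) * factor + 1) / factor
    return np.interp(t_new, t_old, x)

def remove_dc_offset(area):
    """ASSUMPTION: the dc offset is the record's minimum glottal area."""
    area = np.asarray(area, dtype=float)
    return area - area.min()

def glottal_events(area, fs, rel_threshold=0.05):
    """Opening/closing instants (s) and per-cycle open quotients.

    Hypothetical rule: the glottis counts as open while the area exceeds
    rel_threshold * peak area, standing in for operator-set thresholds.
    """
    area = np.asarray(area, dtype=float)
    is_open = area > rel_threshold * area.max()
    edges = np.diff(is_open.astype(int))
    openings = np.flatnonzero(edges == 1) + 1   # first open sample of a cycle
    closings = np.flatnonzero(edges == -1) + 1  # first closed sample after it
    oq = []
    for start, next_start in zip(openings[:-1], openings[1:]):
        close = closings[(closings > start) & (closings < next_start)]
        if close.size:  # open duration / opening-to-opening period
            oq.append((close[0] - start) / (next_start - start))
    return openings / fs, closings / fs, np.array(oq)

def relative_average_perturbation(periods):
    """Three-point RAP: mean |P_i - mean(P_{i-1}, P_i, P_{i+1})| / mean period."""
    p = np.asarray(periods, dtype=float)
    local_mean = (p[:-2] + p[1:-1] + p[2:]) / 3.0
    return np.mean(np.abs(p[1:-1] - local_mean)) / np.mean(p)

# Synthetic 10-kHz record: half-wave-rectified 125-Hz "area" on a dc offset,
# mimicking incomplete glottal closure.
fs = 10_000
t = np.arange(800) / fs
area = np.clip(np.sin(2 * np.pi * 125 * t), 0.0, None) + 0.2
openings, closings, oq = glottal_events(remove_dc_offset(area), fs)
print(oq.round(4), relative_average_perturbation(np.diff(openings)))
```

The threshold sets where "open" begins, so the open quotient shifts with it; this mirrors the operator dependence noted in the text. The same event detector could be run on an EGG record (with polarity and threshold chosen for that signal) to produce the EGG-based estimates being compared.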
Subjects With Normal Larynges
This experiment called for 4 male subjects to perform nine tasks each. These tasks consisted of three target fundamental frequencies (low: 120, medium: 170, and high: 340 Hz) and three target intensities (low: 66, medium: 70, and high: 74 dB SPL) for each F0 while phonating the sustained vowel /i/. The subjects attempted to match their F0 to the target frequency provided by a sine-wave function generator via headphones and to match the target intensity on a sound level meter. Each subject's actual F0 for each task was measured from the data. A typical set of data is illustrated in Figure 2. The data records varied in length but lasted approximately 30 ms, representing 3 to 10 pitch periods, depending on F0. The number of film frames for this time interval varied according to the exact film speed for the segment of film analyzed; it was typically 130 to 150 per subject. The total number of film frames examined for normal subjects for this paper was approximately 3000, representing over 100 pitch periods of data from all normal subjects. Although this data base may seem small for acoustic data, it represents a major data base for vocal fold vibratory events via the EGG. Furthermore, this large sample was sufficient to pursue statistical relationships rather than being limited to stating empirical observations.
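The task design and record-length arithmetic above can be checked with a short calculation. The target values, the 30-ms record length, and the 5000-frames-per-second top film speed come from the text; because actual film speed varied, the frame count here is indicative only. The function names are hypothetical.

```python
from itertools import product

TARGET_F0_HZ = {"low": 120.0, "medium": 170.0, "high": 340.0}
TARGET_SPL_DB = {"low": 66.0, "medium": 70.0, "high": 74.0}

def task_grid():
    """All (F0 level, intensity level) pairs: three F0 targets x three intensities."""
    return list(product(TARGET_F0_HZ, TARGET_SPL_DB))

def periods_in_record(f0_hz, record_ms=30.0):
    """Number of pitch periods spanned by a record of the given length."""
    return record_ms * f0_hz / 1000.0

def frames_in_record(fps, record_ms=30.0):
    """Film frames covering the record at a given film speed (frames/s)."""
    return record_ms * fps / 1000.0

print(len(task_grid()))
for level, f0 in TARGET_F0_HZ.items():
    print(level, round(periods_in_record(f0), 1))
print(frames_in_record(5000.0))
```

The three target F0 values give about 3.6, 5.1, and 10.2 periods in 30 ms, in line with the 3 to 10 pitch periods reported, and at 5000 frames per second a 30-ms record spans 150 frames, the upper end of the typical frame counts given above.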