Exploring the Relationship Between Spectral and Cepstral Measures of Voice and the Voice Handicap Index (VHI)
*Shaheen N. Awan, †Nelson Roy, and ‡Seth M. Cohen, *Bloomsburg, Pennsylvania, †Salt Lake City, Utah, and ‡Durham, North Carolina

Summary: Objectives. The purpose of this study was to examine the strength of relationship between impairment-level acoustic measures derived from spectral- and cepstral-based analyses (including the cepstral peak prominence [CPP]; ratios of low vs high frequency spectral energy; and the respective standard deviations [SDs] for these measures) and a disablement measure (the total Voice Handicap Index [VHI] score) in a large and diverse group of voice-disordered and control subjects. The relationship between total VHI and the Cepstral Spectral Index of Dysphonia (CSID, a multivariate estimate of dysphonia severity) was also examined. Methods. Subjects were 332 adults (116 males and 216 females) comprising voice-disordered subjects who presented to a physician with a voice-related complaint (n = 258) and a group of nonvoice-disordered control subjects (n = 74). A VHI 30-item score and speech/voice samples including the second and third sentences of The Rainbow Passage and productions of the sustained vowel /ɑ/ were obtained for each subject. Sentence and sustained vowel samples were analyzed using the Analysis of Dysphonia in Speech and Voice (ADSV) program (ADSV model 5109 v.3.4.2; KayPENTAX, Montvale, NJ). Results. Across all subjects, low-to-moderate strength Spearman rho (rs) correlations were observed between the total VHI and the CPP and the CSID in both speech and vowel contexts and for the CPP SD from continuous speech (rs’s ranging from 0.45 to 0.49 for VHI vs CPP; 0.47 for VHI vs CSID; 0.44 for VHI vs CPP SD). Several other measures obtained from spectral or cepstral analyses also were observed to correlate with total VHI, although increased variability in the strength, direction, and overall significance of these other variables was observed depending on gender and elicited context. Conclusions. Voice-related disablement occurs within a context.
In contrast, impairment-level measures of phonatory function (like the spectral and cepstral measures included in this study) are by nature decontextualized and appear to correlate low-to-moderately with quality of life measures like the VHI. Therefore, spectral and cepstral acoustic measures and the VHI should be viewed as providing relatively unique, meaningful, and complementary information.
Key Words: Voice Handicap Index (VHI)–Cepstrum–Spectrum–Cepstral peak prominence (CPP)–Cepstral Spectral Index of Dysphonia (CSID).

Accepted for publication December 11, 2013. From the *Department of Audiology and Speech Pathology, Bloomsburg University of Pennsylvania, Bloomsburg, Pennsylvania; †Department of Communication Sciences and Disorders, The University of Utah, Salt Lake City, Utah; and the ‡Division of Otolaryngology - Head and Neck Surgery, Duke Voice Care Center, Duke University Medical Center, Durham, North Carolina. Address correspondence and reprint requests to Shaheen N. Awan, Department of Audiology and Speech Pathology, Bloomsburg University, 400 East Second St, Bloomsburg, PA 17815. E-mail: [email protected]
Journal of Voice, Vol. -, No. -, pp. 1-10 0892-1997/$36.00 © 2014 The Voice Foundation http://dx.doi.org/10.1016/j.jvoice.2013.12.008

INTRODUCTION
In 1980, the World Health Organization (WHO) proposed the International Classification of Impairments, Disabilities and Handicaps (ICIDH) to describe disablement at three levels of experience as a consequence of a disease or disorder. The three levels were impairment, disability, and handicap. The impairment level referred to the impact of a disease/disorder on bodily function. For instance, unilateral vocal fold paralysis (UVFP) often produces glottic insufficiency associated with a variety of perceptual, acoustic, and aerodynamic consequences. Measures of phonatory function such as listener ratings of dysphonia severity, spectral noise level, and mean glottal airflow would be considered impairment-level measures. In contrast, disability referred to the impact on performance due to the impairment, and handicap was the impact of the impairment or disability on social, environmental, or economic functioning. If a schoolteacher with UVFP cannot speak loudly enough to be heard in the classroom, the teacher could be considered to have a form of disability. If the same teacher is forced to leave the teaching profession because of an inability to project the voice and becomes socially withdrawn, these occupational, economic, and social effects are regarded as handicaps. Although the original WHO ICIDH framework implied a linear, positive, and causal relationship between impairment-level measures and resulting levels of disability and handicap, in clinical practice, it is not uncommon for two patients with similar levels of impairment to perceive different levels of disablement (depending on contextual factors such as the vocal demands and/or expectations of the individual). As an example, the degree of disability or handicap experienced by someone with UVFP who uses their voice as the primary tool of trade (eg, teacher, actor, clergy, broadcaster, politician, salesperson) likely differs substantially from that of a nonprofessional voice user. These two individuals may share similar levels of impairment (as determined by listener ratings of dysphonia severity, acoustic estimates of spectral noise, or mean glottal airflow); however, they will likely perceive and report very different levels of disability or handicap. Recognizing that disablement occurs within a context, the WHO revised the ICIDH in 1997.1 The

revised version preserved the meaning of impairment, but the terms “disability” and “handicap” were replaced by “activity limitations” and “participation restrictions,” respectively. Activity limitations are difficulties an individual may have in the performance of activities (eg, speaking loudly), whereas participation restrictions are problems an individual may have in the manner or extent of involvement in life situations (eg, teaching school). The revised framework no longer implies a linear relationship between impairment-level measures and levels of disablement. Instead, it highlights that activity limitations and participation restrictions can also be influenced by contextual factors such as environmental (eg, poor classroom acoustics, availability of amplification) and personal factors (eg, age, occupation, personality, coping style). In this view, the three levels of disablement can be affected independently and, therefore, any measurement of disablement should assess these three levels separately.2 In this regard, traditional voice evaluation has focused primarily on documenting the severity of vocal impairment, with clinicians relying on a variety of impairment-level assessment tools to analyze phonatory function. In the clinical domain, these impairment-level measures not only include perceptual judgments of dysphonia severity and voice quality but instrumental analysis techniques including acoustic, aerodynamic, electroglottographic, and videolaryngostroboscopic methodologies (to mention only a few).
Of these impairment-level assessment measures, acoustic analysis of voice has received the most attention.3 In the past 15 years, however, an increased emphasis has been placed on better understanding and assessing the impact of a voice disorder on an individual’s quality of life, and a number of patient-based instruments have been developed, including the Voice Activity and Participation Profile2; the Voice Handicap Index (VHI)4; the VHI-105; the Voice-Related Quality of Life (V-RQOL) measure6; and the Voice Symptom Scale.7 Branski et al8 provide a complete review of these instruments and their development. Of these instruments, the VHI4 is one of the most studied and popular. The VHI is a psychometrically validated tool developed for the measurement of the psychosocial handicapping effects of voice disorders. According to the test authors, the VHI can be used to assess the patient’s judgment about the relative impact of their voice disorder on daily activities and can also be used as a component of measuring the functional outcomes of behavioral, medical, and/or surgical treatments of voice disorders.4 Several studies have investigated the strength of relationship between the VHI-based ratings and impairment-level estimates of dysphonia severity such as acoustic measures, with considerable variation in the reported findings. Nonsignificant or weak significant correlations between the VHI and acoustic voice measures have been reported by Hsiung et al,9 Wheeler et al,10 Pribuisiene et al,11 Woisard et al,12 Cheng and Woo,13 Niebudek-Bogusz et al,14 Pavlidou et al,15 and Hanschmann et al.16 Many of these studies have concluded that VHI and voice laboratory measurements provide independent information regarding the patient’s voice status and have, therefore, advocated that assessment should include both patient-based


measures of voice-related handicap as well as instrumental measures of voice function.2,11–13 Furthermore, it has been speculated that a nonlinear relationship among impairment, disability, and handicap may explain the general lack of strong linear relationships between acoustic measures and VHI.10 However, in contrast to the aforementioned studies, other investigators have reported moderate to strong correlations between VHI and measures such as frequency range in speech,17 the Dysphonia Severity Index (DSI),18 and measures of shimmer and harmonics-to-noise ratio (HNR).19 Schindler et al19 observed that stronger correlations between VHI and voice laboratory acoustic measures may be observed within particular subgroups of patients who have a common underlying disorder. Most studies have examined correlations between VHI scores and acoustic measures obtained from sustained vowel samples, with reports of significant correlations between VHI and acoustic measures from continuous speech samples limited to various measures of fundamental frequency.10,17 However, in recent years, spectral- and cepstral-based measures that estimate voice quality disruption in continuous speech have become available. These measures not only appear to be robust but are also strong predictors of dysphonia type and severity.20–27 Key measures from spectral- and cepstral-based analyses have included estimates of the relative amplitude of the cepstral peak referred to as the cepstral peak prominence (CPP); ratios of low versus high frequency spectral energy; and the respective standard deviations (SD) for these measures.26,28 In addition, Awan et al21,23 have reported on multiple regression-based mathematical estimates of dysphonia severity (recently referred to as the Cepstral Spectral Index of Dysphonia [CSID]), which use several of the aforementioned cepstral- and spectral-based measures.
Because it may be argued that continuous speech provides a more ecologically valid voice context for assessing vocal impairment, spectral- and cepstral-based analyses of continuous speech may produce differential degrees of correlation with measures like the VHI than observed via sustained vowel analyses. Therefore, the purpose of this study was to examine the strength of relationship between spectral and cepstral acoustic measures (including the CSID-estimated dysphonia severity) with VHI scores in a large and diverse group of voice-disordered and control subjects. The following research questions were addressed:
1. Do measures obtained from the spectrum and the cepstrum correlate significantly with overall VHI score?
2. Is there a difference in the strength of correlation with total VHI score for spectral and cepstral measures obtained via continuous speech versus sustained vowel production?
3. Does the strength of acoustic versus VHI correlation vary as a function of gender?

METHODS
The study was approved by the Duke University Medical Center and Bloomsburg University Institutional Review Boards. Subjects were 332 adults (116 males and 216 females) with a mean age of 51.94 years (SD = 16.22; range = 15–87 years).

Shaheen N. Awan, et al

Relation Between Spectral and Cepstral Measures and VHI

The racial demographics of the sample included 76.69% Caucasian; 21.62% African-American; 1.01% Asian; and 0.68% Hispanic. Voice-disordered subjects presented to a physician with a voice-related complaint and had a stroboscopic evaluation. Stroboscopic positive diagnoses (n = 258) included a wide range of signs and symptoms including vocal fold paralysis/paresis (bilateral and unilateral); vocal fold atrophy; tremor; leukoplakia; muscle tension dysphonia; vocal fold lesions including nodules, cysts, polyps, granuloma, and sulcus; vocal fold edema; spasmodic dysphonia; laryngeal papilloma; and acute laryngitis. A group of control subjects (n = 74) was also included. Control (ie, vocally normal) subjects were seen for nonlaryngeal/nonvoice complaints (preoperative evaluations before thyroid or anterior cervical spine surgery, sleep apnea, or nasal or sore throat complaints), had a normal laryngoscopic examination, were deemed by the treating physician to have a perceptually normal voice, and had a VHI score 12. This particular VHI cutoff score was selected because Behrman et al29 had reported that the upper limit of the 95% confidence interval for VHI scores in subjects with healthy voices was 11.5. The proportion of disordered versus control subjects was similar within the male (78.45% disordered vs 21.55% control) and female (77.31% disordered vs 22.69% control) subgroups. Results of a chi-squared analysis indicated no significant association between group (control vs disordered) and gender (chi-square = 0.01; df = 1; P = 0.92). Obtained data for all subjects included the VHI 30-item score4 and speech/voice samples comprising (1) the second and third sentences of The Rainbow Passage30 and (2) productions of the sustained vowel /ɑ/.

Acoustic analysis procedures
Sentence and sustained vowel samples were analyzed using the Analysis of Dysphonia in Speech and Voice (ADSV) program (ADSV model 5109 v.3.4.2; KayPENTAX, Montvale, NJ).
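The group-by-gender chi-squared result reported in the subject description can be checked with a short sketch. The cell counts below are not stated in the text; they are derived here, as an assumption, from the reported percentages and subgroup sizes (78.45% of 116 males and 77.31% of 216 females). The helper name `chi2_2x2_yates` is hypothetical, and the use of a Yates continuity correction is also an assumption, chosen because it reproduces the reported values.

```python
import math


def chi2_2x2_yates(table):
    """Chi-squared statistic and P value (df = 1) for a 2x2 table,
    using the Yates continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |observed - expected| by 0.5, never below zero.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts derived from the reported percentages (an assumption, see lead-in).
counts = [
    [91, 25],   # males: disordered, control
    [167, 49],  # females: disordered, control
]
chi2, p_value = chi2_2x2_yates(counts)
print(f"chi-square = {chi2:.2f}, df = 1, P = {p_value:.2f}")
```

Under these assumed counts the column totals are 258 disordered and 74 control subjects, matching the group sizes given in the text.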
The ADSV program incorporates algorithms described in Awan et al21 and provides several key spectral and cepstral measures of the voice sample along with a graphic display of how these values change over time. The program also incorporates a summary measure referred to as the CSID, a multifactorial estimate of dysphonia severity that correlates with the labeled visual analog scale for severity used in the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V).31 The basic acoustic analysis process is as follows:
1. The speech/voice samples were downsampled to 25 kHz and divided into a series of 1024-point overlapping Hamming windows (75% overlap). For each analysis window/frame, a 1024-point discrete Fourier transformation (DFT) was computed, converted to the log power spectrum, and followed by a second DFT. This procedure results in the cepstrum.
2. Cepstral averaging aids in smoothing the cepstrum before identification of the cepstral peak.26 Cepstral averaging was carried out over seven frames, followed by 11-bin quefrency averaging, in which each cepstral coefficient (ie, data value observed on the abscissa of the cepstrum) was replaced by the average of the current coefficient with the five previous and five subsequent cepstral coefficients.
3. The aforementioned spectra and cepstra are used in the computation of four key measures: the CPP26,32 and its standard deviation (CPP SD) and the L/H Spectral Ratio and its standard deviation (L/H Spectral Ratio SD). L/H Spectral Ratio, a measure of spectral tilt, was calculated as the ratio of low (<4 kHz) to high (>4 kHz) frequency spectral energy from the original unsmoothed spectra. From the smoothed cepstral frames, the ratio of the automatically identified cepstral peak to the expected amplitude of the cepstral peak estimated via linear regression was computed (referred to as the CPP). The search for the cepstral peak was restricted to quefrencies of 3.3–16.7 milliseconds (corresponding to the approximate frequency range of 60–300 Hz). In its default mode, the ADSV program removes normalized CPP values [...] 0.70 are “high” to “very high” in terms of effect size magnitude).

Summary statistics and male versus female comparisons
Tables 1 and 2 provide summary statistics (means and SDs) for all acoustic variables, as well as total VHI score and age data (Table 1). Although no significant difference was observed between males and females for total VHI score (mean VHI = 32.68 vs 32.58, respectively; U = 12 485.0, z = 0.05, P = 0.96), males and females were observed to differ significantly on the majority of acoustic measures obtained from both continuous speech and sustained vowel production. In continuous speech, females were observed to have significantly lower mean CPP F0 SD, CPP, and L/H Ratio than males but (as expected) significantly higher CPP F0. No significant differences were observed in continuous speech samples between males versus females for the CPP SD and CSID measures (Table 2).

Correlations between continuous speech measures versus VHI across all subjects
A series of Spearman rho (rs) correlations were computed across all subjects between all acoustic variables obtained from continuous speech analyses and total VHI score. Low-to-moderate strength correlations were observed between VHI and CPP (rs = 0.49, P < 0.01), CPP SD (rs = 0.44, P < 0.01), and CSID score (rs = 0.47, P < 0.01). Small but significant correlations were observed between VHI and CPP F0 (rs = 0.20, P < 0.01), L/H Ratio (rs = 0.18, P < 0.01), and L/H Ratio SD (rs = 0.15, P < 0.01). Table 3 provides complete correlation results between VHI and acoustic measures obtained via continuous speech samples computed across all subjects.
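The cepstral analysis steps listed under "Acoustic analysis procedures" can be illustrated with a minimal single-frame sketch. It is not the ADSV implementation: it assumes the signal is already sampled at 25 kHz, skips the 75% overlap framing, the seven-frame averaging, the 11-bin quefrency smoothing, and any CPP normalization, and fits the regression line over the 3.3–16.7 ms search range only (an assumption, since the text does not state the regression span). The function names, the dB scaling, and the synthetic 125 Hz pulse train are all hypothetical choices made for illustration.

```python
import cmath
import math
import random

FS = 25_000  # sampling rate after downsampling (Hz), as given in the text
N = 1024     # analysis window length (samples), as given in the text


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cepstrum_db(frame):
    """Hamming window -> DFT -> log power spectrum -> second DFT, in dB."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (N - 1)) for i in range(N)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    log_power = [10 * math.log10(abs(c) ** 2 + 1e-12) for c in spectrum]
    return [20 * math.log10(abs(c) + 1e-12) for c in fft(log_power)]


def cpp_and_f0(ceps, lo_ms=3.3, hi_ms=16.7):
    """Peak height above a regression line, and F0 from the peak quefrency."""
    lo = math.ceil(lo_ms * 1e-3 * FS)
    hi = math.floor(hi_ms * 1e-3 * FS)
    bins = list(range(lo, hi + 1))
    peak = max(bins, key=lambda q: ceps[q])
    # Least-squares line through the cepstrum over the search range (assumption).
    mean_q = sum(bins) / len(bins)
    mean_c = sum(ceps[q] for q in bins) / len(bins)
    slope = (sum((q - mean_q) * (ceps[q] - mean_c) for q in bins)
             / sum((q - mean_q) ** 2 for q in bins))
    expected = mean_c + slope * (peak - mean_q)
    return ceps[peak] - expected, FS / peak


# Synthetic test frame: a 125 Hz pulse train plus a little noise (made-up data).
random.seed(0)
period = FS // 125
frame = [(1.0 if n % period == 0 else 0.0) + random.gauss(0.0, 0.01)
         for n in range(N)]
cpp_db, f0_hz = cpp_and_f0(cepstrum_db(frame))
print(f"CPP = {cpp_db:.1f} dB, cepstral F0 estimate = {f0_hz:.1f} Hz")
```

The F0 estimate is the inverse of the peak quefrency, which is how the text describes the CPP F0 measure; absolute CPP values from this sketch are not comparable to ADSV output.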

TABLE 4. Spearman Rho (rs) Correlations for Spectral and Cepstral Acoustic Variables Obtained From the Sustained Vowel /ɑ/ Versus VHI Total Score

Acoustic Variable   All Subjects (N = 332), rs   Males (n = 116), rs   Females (n = 216), rs
CPP                 0.45**                       0.56**                0.39**
CPP SD              0.26**                       0.12 NS               0.34**
L/H Ratio           0.26**                       0.41**                0.15*
L/H Ratio SD        0.24**                       0.06 NS               0.33**
CPP F0              0.09 NS                      0.28**                0.01 NS
CPP F0 SD           0.39**                       0.36**                0.44**
CSID vowel          0.47**                       0.48**                0.45**

Abbreviations: CPP, Cepstral Peak Prominence; CPP F0, estimate of the mean fundamental frequency obtained from the cepstrum; CSID, Cepstral Spectral Index of Dysphonia computed from the sustained vowel /ɑ/; L/H Ratio, Low (<4 kHz)/High (>4 kHz) Spectral Ratio; NS, Nonsignificant; SD, standard deviation. **P < 0.01; *P < 0.05.
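The Spearman rho values reported in the correlation tables are Pearson correlations computed on ranks. A minimal sketch follows, assuming tied observations receive the mean of their rank positions; the function names are hypothetical and the six CPP and VHI values are made up purely for illustration, not taken from the study data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up illustrative values: lower CPP paired with higher VHI totals.
cpp_values = [6.1, 4.2, 5.0, 2.9, 3.5, 5.6]
vhi_totals = [8, 40, 22, 75, 40, 15]
rho = spearman_rho(cpp_values, vhi_totals)
print(f"rs = {rho:.2f}")
```

Because only ranks enter the calculation, the statistic is insensitive to the skewed score distributions that are common in self-report measures.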


TABLE 5. General Correlational Trends for Spectral and Cepstral Acoustic Variables Versus VHI in Continuous Speech and Vowel Contexts for Male Versus Female Subgroups

Columns: Elicited Sample; Gender; effect on each acoustic variable as VHI total increases (CPP, CPP SD, L/H Ratio, L/H Ratio SD, CPP F0, CPP F0 SD, CSID).
Rows: Rainbow Passage (Males, Females); Vowel (Males, Females).
Cell symbols (variable decreases; variable increases; no significant relationship) are not reproduced here.

Notes: Correlation values and significance levels are provided in Tables 3 and 4. Abbreviations: CPP, Cepstral Peak Prominence; CPP F0, estimate of the mean fundamental frequency obtained from the cepstrum; CSID, Cepstral Spectral Index of Dysphonia; L/H Ratio, Low (<4 kHz)/High (>4 kHz) Spectral Ratio; SD, standard deviation.

A similar set of analyses was conducted using acoustic variables obtained from sustained vowel production. Although low-to-moderate strength correlations were again observed between VHI and CPP (rs = 0.45, P < 0.01) and CSID score (rs = 0.47, P < 0.01), the correlation between VHI and CPP SD was somewhat smaller and positive (rs = 0.26, P < 0.01) versus that observed in continuous speech samples. In addition, a low-to-moderate significant relationship between CPP F0 SD and the VHI was observed in sustained vowel analyses that was not observed in continuous speech analyses (rs = 0.39, P < 0.01), whereas a significant correlation was not observed between VHI and CPP F0 during vowel production. In addition, small but statistically significant correlations were observed between VHI and L/H Ratio (rs = 0.26, P < 0.01) and L/H Ratio SD (rs = 0.24, P < 0.01) (Table 4).

Correlations between continuous speech measures versus VHI per gender
Because means comparison analyses had shown that males and females differed significantly on many of the acoustic variables that were collected from the continuous speech and sustained vowel samples, it was also necessary to compute separate correlational analyses between the various spectral and cepstral acoustic measures and total VHI for each of the male and female subgroups. Within the male subgroup (n = 116), the CPP and CSID (from both continuous speech and sustained vowel samples) were again observed to be low-to-moderately strong correlates of the VHI total score (Tables 3 and 4). However, although the CPP SD was a low-to-moderately strong correlate of the VHI when measured from continuous speech, it was observed to be a nonsignificant correlate when obtained from sustained vowel production (a similar result was also observed for the L/H Ratio SD). In addition, the estimates of vocal F0 and F0 variation (CPP F0 and CPP F0 SD) that had been small or nonsignificant correlates of VHI when computed across all subjects were now observed to be increased in strength and statistically significant when computed within the male subgroup. A similar result (ie, increased strength of correlation) was also observed for the L/H Ratio when computed within the male subgroup. Within the female subgroup (n = 216), the CPP and CSID (from both continuous speech and sustained vowel samples) were again observed to be low-to-moderately strong correlates of the VHI total score (Tables 3 and 4), although the strength of the CPP versus VHI correlation was somewhat weaker when measured from sustained vowel production (rs = 0.39) versus continuous speech (rs = 0.50). In contrast to the correlations observed in the male subgroup, the L/H Ratio was observed to be a nonsignificant (in continuous speech) or very weak (in sustained vowel production) correlate of VHI. CPP SD was observed to be a significant correlate of VHI when measured from both speech and vowel productions, whereas this measure had only been significant within the male subgroup when measured from continuous speech contexts. CPP F0 was observed to be a much weaker correlate of VHI within the female versus the male subgroup and, interestingly, the CPP F0 SD correlation with VHI was observed to be opposite in direction (positive) in the female subgroup versus that observed in the male subgroup (negative) when measured from continuous speech samples. Table 5 provides a general overview of the acoustic versus VHI correlational trends for male versus female subgroups.

DISCUSSION
The results of this study show that two measures obtained from spectral/cepstral analyses (the CPP and the multivariate CSID) consistently correlate with the total VHI score, regardless of whether they are (1) calculated across all subjects or within male versus female subgroups or (2) calculated from continuous speech samples or sustained vowel productions.
Several other measures obtained from spectral or cepstral analyses also were observed to correlate with total VHI, although increased variability in the strength, direction, and overall significance of these other variables was observed depending on gender and elicited context. Across all male and female subjects, the strongest correlations between acoustic measures and VHI were observed for the CPP and the CSID in both speech and vowel contexts and for the CPP SD from continuous speech. The low-to-moderate strength correlations observed for these measures (rs’s ranging from 0.45 to 0.49 for VHI vs CPP; 0.47 for VHI vs CSID; 0.44 for VHI vs CPP SD; Tables 3 and 4) are similar to the r = 0.49 correlation between VHI and jitter reported by Schindler et al19 and somewhat greater than the significant correlations ranging from r = 0.31 to 0.43 reported between total VHI-10 score and measures of irregularity (based on measures of jitter, shimmer, and medial period correlation) and noise (determined via the glottal-to-noise excitation ratio) obtained from sustained vowel productions in Hanschmann et al.16 Correlations between VHI and L/H Ratio and L/H Ratio SD were also observed to be significant across all subjects in both vowel and speech contexts but fairly weak (rs’s ranging from 0.26 to 0.15; Tables 3 and 4) and similar in strength to the significant correlation of r = 0.23 for VHI versus normalized noise energy reported in Pribuisiene et al11 and significant r’s from 0.19 to 0.26 reported by Niebudek-Bogusz et al14 between total VHI and several frequency and amplitude perturbation measures.
Although stronger correlations between VHI and acoustic or acoustic-based measures have been reported in the literature by Fulljames and Harris17 (r = 0.74 between total VHI and F0 range in speech and r = 0.83 between the VHI-Physical and shimmer), Schindler et al19 (r’s = 0.73 and 0.69 between total VHI and shimmer and HNR, respectively), and Henry et al18 (r = 0.62 between total VHI and the DSI at 1–4 weeks post-thyroidectomy), differences in results between these aforementioned studies and the present study may be attributed not only to differences in acoustic measures but also to substantial differences in sample sizes (eg, Fulljames and Harris17 reported data from 10 female subjects vs the 332 subjects in the present study) and examination of subjects across a variety of disorder types (as in the present study) versus focused disordered groups (eg, the strongest correlations in Schindler et al19 were obtained in 18 subjects with vocal nodules, and the strongest correlation in Henry et al18 was observed in early postthyroidectomy patients). The strongest acoustic correlate of the VHI (whether across all subjects or within gender groups) was the CPP. The CPP is a measure of the relative amplitude of the dominant harmonic (in most cases, the F0). Various studies have reported that the CPP is a robust indicator of dysphonia and a strong correlate of dysphonia severity.20–22,25,26,28,32 As the CPP decreases with increased levels of dysphonia, the VHI total score was observed to increase (inverse correlation). Although the strength of the relationship between CPP and total VHI was quite consistent across males and females in speech contexts (rs’s = 0.49 and 0.50, respectively; Table 3), there were substantial differences in strength of correlation between these parameters in males versus females in sustained vowel contexts (rs’s = 0.56 and 0.39, respectively; Table 4).
Depending on the severity and type of dysphonia, it is certainly possible that some speakers may have differential levels of impairment in speech versus vowel contexts,35 and differences in the strength of association between the total VHI and measures of the CPP by gender in this study emphasize the importance of documenting voice characteristics in a variety of contexts. Because the CPP is the strongest contributor to the multivariate CSID formulae,21,23 the CPP and CSID are highly related, and therefore, the CSID was also observed as one of the stronger correlates of VHI score in this study. Because the CSID was developed as a multivariate acoustic correlate of a 100-mm visual analog scale of severity as used in perceptual evaluation (eg, the CAPE-V), the CSID is observed to correlate directly with VHI such that increases in VHI score were associated with increases in the CSID. Slight differences in the strength of correlation with VHI between CPP and CSID observed in this study may be due to the fact that the multivariate CSID combines stronger VHI correlates (eg, CPP) with weaker (or nonsignificant) correlates (eg, L/H Ratio SD) in a common formula and, therefore, may have its overall correlational strength with VHI somewhat weakened. The CPP SD is a measure of the average variability in CPP amplitude over time (ie, a measure of the average variability of the CPP in relation to the amplitude [dB] ordinate). During vowel production, the CPP SD is expected to be very low in typical voice production, indicative of consistency and steadiness in pitch/F0, loudness/intensity, and vocal quality. Therefore, it may be expected that, as VHI increases, the CPP SD from vowel production would also increase. Although this type of correlation was observed in female speakers, the correlation between CPP SD and VHI was nonsignificant in male vowel production, indicating that male speakers may not be as concerned by voice disturbances that are characterized by the presence of increased variability in the relative amplitude of the dominant harmonic (CPP).
In sentence production, the CPP SD has been reported to be increased in typical speech (perhaps indicative of typical vocal capability to produce a wide range of variation from voiced to unvoiced productions, and vice versa) and to vary with the pitch/F0 and loudness/intensity of the voice during prosodic variation. In cases of dysphonia, the CPP SD in speech contexts has been observed to decrease with increased levels of dysphonia21,36 and, in the present study, to decrease with increased VHI in both males and females. The L/H Ratio (spectral energy < 4 kHz vs spectral energy > 4 kHz) is a measure of spectral tilt and is responsive to the high-frequency additive noise observed in breathy voices.21,22,26,32 As mentioned, a decrease in the L/H Ratio is indicative of increased noise in the higher frequency region of the spectrum and/or reduced energy in the lower frequency region of the spectrum in the vicinity of the F0. This measure was observed to decrease with increases in VHI in both males and females in vowel production, but this relationship was only observed in males for continuous speech samples. It may be that breathiness was either not a consistent quality deviation in the female subgroup or, if present, was not as troubling an influence on patient self-reported handicap as observed in the male speakers. As with the CPP SD, the L/H Ratio SD may be expected to be relatively increased in sustained

vowel production but decreased in speech samples elicited from dysphonic speakers. These correlational trends were observed between total VHI and L/H Ratio SD for both males and females in continuous speech but only for females in sustained vowel samples. As with CPP SD, this measure was not a significant correlate of VHI in males in the vowel context, which may indicate that (1) the dysphonic male voices in this study were relatively consistent in terms of spectral energy distribution in the vowel context or (2) any changes in voice characteristics related to this acoustic parameter were not perceived at a level that affected the self-rating of voice handicap. The CPP F0 is an estimate of the mean F0 of the voice calculated by computing the inverse of the quefrency value (in time) of the identified cepstral peak. Results showed positive correlations between CPP F0 and VHI for both males and females in continuous speech but only for males in sustained vowel production. It may be that increases in vocal F0 are less disconcerting to female speakers than to males (the correlation with VHI was nonsignificant in vowels and quite weak as compared with males for sentence production) and, therefore, do not relate as well to their overall handicap index. In contrast, lower vocal F0s in males have been documented as being socially acceptable and beneficial37–39; therefore, increased F0 (as may occur with certain types of dysphonia) may result in increased distress and increased VHI scores in male speakers. As with the CPP SD and L/H Ratio SD, the CPP F0 SD (a measure of the average variability of the CPP in relation to the quefrency abscissa) is typically observed to be quite low during sustained vowel production (ie, reflecting consistency of F0 control) versus increased in continuous speech (ie, reflecting increased variability in vocal characteristics due to variations in intonation) in typical speakers.
Therefore, as expected, increases in VHI were associated with increased CPP F0 SD in both males and females in sustained vowel production. However, the expected correlational trend for a decrease in CPP F0 SD during speech with increased VHI was only observed in male speakers, whereas female speakers showed the opposite correlational trend (increased CPP F0 SD with increased VHI in speech context). The CPP F0 is determined from the cepstral peak (ie, the highest amplitude peak identified within a 60–300 Hz region). Because voice signals obtained from speakers with roughness may have the dominant cepstral peak occurring at a subharmonic, this can result in increased variability in F0 tracking (particularly if the rough quality is intermittent). It may be that the female dysphonic voice samples in this study were characterized by this type of intermittent rough voice quality that may have resulted in increased CPP F0 variability with increased voice handicap. Future studies that focus on the effect of particular voice quality deviations on estimates of F0 and F0 variability will be useful in interpreting results such as those observed in this study.

As discussed, numerous significant correlations were observed between the spectral and cepstral measures (in both speech and vowel contexts) and the total VHI score across all subjects and in male versus female subgroups. However, most of the significant correlations observed tended to be in the low-to-moderate range, with the CPP and CSID most consistently observed as having stronger correlations with VHI. These

Journal of Voice, Vol. -, No. -, 2014

types of low-to-moderate strength correlations with VHI are not entirely unexpected given that the ICIDH-2 framework asserts that severity of impairment (as assessed in this study by various spectral- and cepstral-based measures) does not necessarily strongly predict the quality of life (as measured by the VHI). Based on the strongest correlations reported in this study (rs's in the ≈0.45–0.50 range), voice characteristics described via acoustic measures only account for ≈20–25% of the variance in ranks (rs²)40 shared with quality of life measures such as the VHI. Although 20–25% of shared variance accounted for is not trivial, these results show that the quality of life impact of a dysphonic voice will be inadequately captured by acoustic analysis methods alone because measures such as the VHI and voice laboratory measures provide relatively independent information regarding the patient's voice impairment versus disablement status.9,11–13,16,17 In a similar manner, listener ratings may also be expected to be relatively weak correlates of VHI scores.41–43 Ma and Yiu2 have concluded that "quantification of dysphonic severity using acoustic and perceptual measurements does not necessarily reflect the impact of voice disorders on an individual ... This points to the need for assessing voice disorders from a functional perspective" (p. 522). Overall, patient-centered measurement of voice-associated quality of life, listener perceptual voice assessment, and instrumental voice laboratory measures should be viewed as complementary measures that provide valuable contributions to voice assessment.11–13,16,18,41

CONCLUSION
The results of this study revealed that low-to-moderate strength correlations exist between measures obtained from the cepstrum and the spectrum in both speech and vowel contexts and the total VHI score.
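The shared-variance figure quoted in the discussion follows from squaring the Spearman coefficient. A short sketch of that arithmetic is given here, together with a plain rank-correlation routine; the helper names and the toy scores are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def shared_rank_variance(rho):
    """Proportion of variance in ranks shared by two measures (rho squared)."""
    return rho ** 2


# Range of the strongest correlations reported in the text.
for rho in (0.45, 0.50):
    print(rho, "->", round(100 * shared_rank_variance(rho), 1), "% shared rank variance")

# Made-up toy scores, only to show the call; not study data.
toy_vhi = [12, 30, 45, 45, 60, 88]
toy_cpp = [9.1, 7.4, 7.9, 5.2, 6.0, 3.3]
print("toy rho:", spearman_rho(toy_vhi, toy_cpp))
```

Squaring 0.45 and 0.50 gives 0.2025 and 0.25, which matches the ≈20–25% range cited above and leaves roughly three quarters of the rank variance unexplained by the acoustic measure.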
However, the strength and direction of these correlations may vary depending on the elicited context (continuous speech vs vowel production) and subject gender. Although it is true that both acoustic analysis and self-report methodologies may be sensitive metrics of voice impairment and disability, the linear relation between these two approaches does not appear to be strong.10 Understanding this nonlinear relationship between impairment-level measures and quality of life seems essential in clinical practice. For instance, relatively small acoustic changes (either negative or positive) in voice can create significant effects on quality of life depending on personal and environmental contextual factors. Disablement occurs in a context, and impairment-level measures like acoustics, aerodynamics, and listener ratings of dysphonia severity do not consider the context of the person who suffers with the voice disorder. Because all impairment-level measures of phonatory function are by their nature decontextualized, it is highly unlikely that they will ever strongly correlate with quality of life measures like the VHI. The results of this study clearly support this viewpoint. Instead, impairment-level measures (such as the spectral and cepstral measures included in this study) and the VHI appear to provide relatively unique, meaningful, and complementary information. These measures inform the clinician regarding different aspects of a voice disorder that reflect its multidimensional


nature. Thus, a multidimensional assessment of voice and voice treatment outcomes is necessary and should use both impairment-level and quality of life measures to be valid.2

Acknowledgments
Disclosure: Dr S.N.A. is a consultant to KayPENTAX (Montvale, NJ) for the development of commercial computer software including cepstral analysis of continuous speech algorithms. KayPENTAX licenses the algorithms that form the basis of the Analysis of Dysphonia in Speech and Voice (ADSV) program from Dr S.N.A.

REFERENCES
1. International Classification of Impairments, Activities, and Participation (ICIDH-2). A Manual of Dimensions of Disablement and Functioning. Beta-1 Draft for Field Trials. Geneva, Switzerland: World Health Organization; 1997.
2. Ma EP, Yiu EM. Voice activity and participation profile: assessing the impact of voice disorders on daily activities. J Speech Lang Hear Res. 2001;44:511–524. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11407557. Accessed August 7, 2013.
3. Roy N, Barkmeier-Kraemer J, Eadie T, et al. Evidence-based clinical voice assessment: a systematic review. Am J Speech Lang Pathol. 2013;22:212–226. Available at: http://ajslp.asha.org/cgi/content/abstract/22/2/212. Accessed August 7, 2013.
4. Jacobson B, Johnson A, Grywalski C, et al. The voice handicap index (VHI): development and validation. Am J Speech Lang Pathol. 1997;6:66–70. Available at: http://ajslp.asha.org/cgi/content/abstract/6/3/66. Accessed February 13, 2013.
5. Rosen CA, Lee AS, Osborne J, Zullo T, Murry T. Development and validation of the voice handicap index-10. Laryngoscope. 2004;114:1549–1556. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15475780. Accessed August 7, 2013.
6. Hogikyan ND, Sethuraman G. Validation of an instrument to measure voice-related quality of life (V-RQOL). J Voice. 1999;13:557–569. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10622521. Accessed August 7, 2013.
7. Deary IJ, Wilson JA, Carding PN, MacKenzie K. VoiSS: a patient-derived Voice Symptom Scale. J Psychosom Res. 2003;54:483–489. Available at: http://www.ncbi.nlm.nih.gov/pubmed/12726906. Accessed August 7, 2013.
8. Branski RC, Cukier-Blaj S, Pusic A, et al. Measuring quality of life in dysphonic patients: a systematic review of content development in patient-reported outcomes measures. J Voice. 2010;24:193–198. Available at: http://www.jvoice.org/article/S0892-1997(08)00076-3/abstract. Accessed August 7, 2013.
9. Hsiung MW, Pai L, Wang HW. Correlation between voice handicap index and voice laboratory measurements in dysphonic patients. Eur Arch Otorhinolaryngol. 2002;259:97–99. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11954941. Accessed August 7, 2013.
10. Wheeler KM, Collins SP, Sapienza CM. The relationship between VHI scores and specific acoustic measures of mildly disordered voice production. J Voice. 2006;20:308–317. Available at: http://www.ncbi.nlm.nih.gov/pubmed/16126368. Accessed August 8, 2013.
11. Pribuisiene R, Uloza V, Kupcinskas L, Jonaitis L. Perceptual and acoustic characteristics of voice changes in reflux laryngitis patients. J Voice. 2006;20:128–136. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15925484. Accessed August 7, 2013.
12. Woisard V, Bodin S, Yardeni E, Puech M. The voice handicap index: correlation between subjective patient response and quantitative assessment of voice. J Voice. 2007;21:623–631. Available at: http://www.ncbi.nlm.nih.gov/pubmed/16887329. Accessed April 11, 2012.
13. Cheng J, Woo P. Correlation between the Voice Handicap Index and voice laboratory measurements after phonosurgery. Ear Nose Throat J. 2010;89:183–188. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20397148.
14. Niebudek-Bogusz E, Woznicka E, Zamyslowska-Szmytke E, Sliwinska-Kowalska M. Correlation between acoustic parameters and Voice Handicap Index in dysphonic teachers. Folia Phoniatr Logop. 2010;62(1–2):55–60. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20093845. Accessed March 22, 2013.
15. Pavlidou E, Printza A, Hirani SP, Triaridis S, Vital V, Epstein R. Multidimensional evaluation of voice via subjective, acoustic and electroglottographic analyses in patients with LPR. Otorhinolaryngol Head Neck Surg. 2010;40:25–32.
16. Hanschmann H, Lohmann A, Berger R. Comparison of subjective assessment of voice disorders and objective voice measurement. Folia Phoniatr Logop. 2011;63:83–87. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20938187. Accessed August 8, 2013.
17. Fulljames N, Harris S. Voice outcome measures: correlations with patients' assessment of their condition and the effectiveness of voice therapy. Logoped Phoniatr Vocol. 2006;31:23–35. Available at: http://www.ncbi.nlm.nih.gov/pubmed/16531288. Accessed August 8, 2013.
18. Henry LR, Helou LB, Solomon NP, et al. Functional voice outcomes after thyroidectomy: an assessment of the Dsyphonia Severity Index (DSI) after thyroidectomy. Surgery. 2010;147:861–870. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20096434. Accessed August 7, 2013.
19. Schindler A, Mozzanica F, Vedrody M, Maruzzi P, Ottaviani F. Correlation between the Voice Handicap Index and voice measurements in four groups of patients with dysphonia. Otolaryngol Head Neck Surg. 2009;141:762–769. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19932851. Accessed August 7, 2013.
20. Maryn Y, Roy N, De Bodt M, Van Cauwenberge P, Corthals P. Acoustic measurement of overall voice quality: a meta-analysis. J Acoust Soc Am. 2009;126:2619–2634. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19894840. Accessed January 30, 2012.
21. Awan S, Roy N, Jette M, Meltzner G, Hillman R. Quantifying dysphonia severity using a spectral/cepstral-based acoustic index: comparisons with auditory-perceptual judgements from the CAPE-V. Clin Linguist Phon. 2010;24:742–758. Available at: http://www.ncbi.nlm.nih.gov/pubmed/20687828. Accessed August 2, 2013.
22. Awan S, Roy N. Acoustic prediction of voice type in women with functional dysphonia. J Voice. 2005;19:268–282. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15907441. Accessed April 17, 2012.
23. Awan S, Roy N, Dromey C. Estimating dysphonia severity in continuous speech: application of a multi-parameter spectral/cepstral model. Clin Linguist Phon. 2009;23:825–841. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19891523. Accessed February 9, 2012.
24. Peterson EA, Roy N, Awan SN, Merrill RM, Banks R, Tanner K. Toward validation of the cepstral spectral index of dysphonia (CSID) as an objective treatment outcomes measure. J Voice. 2013;27:401–410. Available at: http://www.ncbi.nlm.nih.gov/pubmed/23809565. Accessed August 7, 2013.
25. Awan S, Solomon N, Helou L, Stojadinovic A. Spectral-cepstral estimation of dysphonia severity: external validation. Ann Otol Rhinol Laryngol. 2013;122:40–48.
26. Hillenbrand J, Houde R. Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Hear Res. 1996;39:311–321. Available at: http://www.ncbi.nlm.nih.gov/pubmed/8729919. Accessed April 3, 2012.
27. Heman-Ackah YD, Michael DD, Goding GS. The relationship between cepstral peak prominence and selected parameters of dysphonia. J Voice. 2002;16:20–27. Available at: http://www.ncbi.nlm.nih.gov/pubmed/12008652. Accessed April 3, 2012.
28. Awan S, Roy N. Toward the development of an objective index of dysphonia severity: a four-factor acoustic model. Clin Linguist Phon. 2006;20:35–49. Available at: http://www.ncbi.nlm.nih.gov/pubmed/16393797. Accessed April 3, 2012.
29. Behrman A, Rutledge J, Hembree A, Sheridan S. Vocal hygiene education, voice production therapy, and the role of patient adherence: a treatment effectiveness study in women with phonotrauma. J Speech Lang Hear Res. 2008;51:350–366. Available at: http://jslhr.asha.org/cgi/content/abstract/51/2/350. Accessed April 7, 2013.

30. Fairbanks G. Voice and Articulation Drillbook. 2nd ed. New York, NY: Harper & Row; 1960.
31. Kempster GB, Gerratt BR, Verdolini Abbott K, Barkmeier-Kraemer J, Hillman RE. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. Am J Speech Lang Pathol. 2009;18:124–132. Available at: http://www.ncbi.nlm.nih.gov/pubmed/18930908. Accessed February 9, 2012.
32. Hillenbrand J, Cleveland R, Erickson RL. Acoustic correlates of breathy vocal quality. J Speech Hear Res. 1994;37:769–778. Available at: http://www.ncbi.nlm.nih.gov/pubmed/7967562. Accessed April 3, 2012.
33. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: L. Erlbaum Associates; 1988. Available at: http://www.worldcat.org/title/statistical-power-analysis-for-the-behavioral-sciences/oclc/17877467. Accessed March 19, 2012.
34. Hinkle D, Wiersma W, Jurs S. Applied Statistics for the Behavioral Sciences. Chicago, IL: Rand McNally College Publishing; 1979.
35. Roy N, Gouse M, Mauszycki SC, Merrill RM, Smith ME. Task specificity in adductor spasmodic dysphonia versus muscle tension dysphonia. Laryngoscope. 2005;115:311–316. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15689757. Accessed February 9, 2012.
36. Watts CR, Awan SN. Use of spectral/cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel. J Speech Lang Hear Res. 2011;54:1525–1538.
37. Re D, O'Connor J, Bennett P, Feinberg D. Preferences for very low and very high voice pitch in humans. PLoS One. 2012;7:1–8. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3293852&tool=pmcentrez&rendertype=abstract. Accessed September 22, 2013.
38. Collins S. Men's voices and women's choices. Anim Behav. 2000;60:773–780. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11124875. Accessed October 2, 2013.
39. Klofstad C, Anderson R, Peters S. Sounds like a winner: voice pitch influences perception of leadership capacity in both men and women. Proc Biol Sci. 2012;279:2698–2704. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3350713&tool=pmcentrez&rendertype=abstract. Accessed September 28, 2013.
40. Field A. Discovering Statistics Using SPSS. 3rd ed. London, UK: Sage Publications Ltd; 2009.
41. Murry T, Medrado R, Hogikyan ND, Aviv JE. The relationship between ratings of voice quality and quality of life measures. J Voice. 2004;18:183–192. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15193651. Accessed August 7, 2013.
42. Webb AL, Carding PN, Deary IJ, MacKenzie K, Steen IN, Wilson JA. Optimising outcome assessment of voice interventions, I: reliability and validity of three self-reported scales. J Laryngol Otol. 2007;121:763–767. Available at: http://www.ncbi.nlm.nih.gov/pubmed/17391574. Accessed August 7, 2013.
43. Eadie TL, Kapsner M, Rosenzweig J, Waugh P, Hillel A, Merati A. The role of experience on judgments of dysphonia. J Voice. 2010;24:564–573. Available at: http://www.ncbi.nlm.nih.gov/pubmed/19765949. Accessed August 7, 2013.
