The Effect of Experience on Perceptual Spaces When Judging Synthesized Voice Quality: A Multidimensional Scaling Study

*Jessica Sofranko Kisenwether and †Robert A. Prosek, *Albany, New York, and †State College, Pennsylvania

Summary:
Objectives/Hypothesis. The purpose of this study was to determine the effect of experience on the perceptual space of listeners when judging voice quality.
Study Design. This was a within-subjects group design.
Method. Speech-language pathologists, singing voice teachers, speech-language pathology graduate students with and without experience with a voice client, graduate students who had completed a voice pedagogy course, and inexperienced individuals served as listeners. Each participant rated the similarity of pairs of synthesized stimuli with systematically altered measurements of jitter, shimmer, and noise-to-harmonics ratio on a visual analog scale ranging from no similarity to extremely similar.
Results. Participants with different levels and types of experience used different perceptual spaces (of additive noise and perturbation measures) when judging the similarity of stimulus pairs.
Conclusion. Perceptual spaces differ among individuals with different levels and types of experience when judging the similarity of pairs of stimuli with systematically altered acoustical measurements.
Key Words: Voice perception–Experienced listener–Multidimensional scaling–Acoustical measures–Synthesized stimuli.

Accepted for publication January 20, 2014.
From the *Communication Sciences and Disorders, The College of St. Rose, Albany, New York; and the †Communication Sciences and Disorders, The Pennsylvania State University, State College, Pennsylvania.
Address correspondence and reprint requests to Jessica Sofranko Kisenwether, Communication Sciences and Disorders, The College of St. Rose, 432 Western Ave., Albany, NY 12203. E-mail: [email protected]
Journal of Voice, Vol. 28, No. 5, pp. 548-553. 0892-1997/$36.00. © 2014 The Voice Foundation. http://dx.doi.org/10.1016/j.jvoice.2014.01.012

INTRODUCTION

A recent study showed a difference in judgments of voice quality among individuals with different types of experience.1 Individuals with a singing background, individuals with a speech-language pathology background, and inexperienced individuals were asked to judge a synthesized sustained vowel on a visual analog scale (VAS) ranging from mild to severe for overall severity, roughness, breathiness, strain, and pitch. Overall, individuals with a singing voice background rated signals more severely than individuals with a speech-language pathology background, whereas inexperienced listeners (IEs) did not follow a consistent pattern. The authors attributed these results to a possible effect of type of experience on listener judgment; however, because of moderate agreement levels during the task, they also concluded that the use of specific voice quality terms may have biased the scale of measurement, forcing the listeners to make a unidimensional judgment of voice quality.

Frequent questions about the reliability of voice quality perceptions may stem from the multidimensional nature of voice stimuli. Listeners judging or rating voice quality often use more than one parameter throughout their classification or rating tasks.2–5 For instance, perceptions of breathiness and roughness are commonly used during voice quality rating.6 Also, speech-language pathology graduate students' ratings of breathiness, hoarseness, and nasality encompassed many dimensions of the signal, including airflow, glottal periodicity, noise, and second formant frequency rise/fall, accounting for 48% of the variance during rating tasks.5 These dimensions can include, but are not limited to, intensity, noise-to-harmonics ratio (NHR), fundamental frequency, jitter, and shimmer.2–5

Because listeners focus on multiple aspects of each signal to make a perceptual judgment of voice quality, some researchers consider continuous scales more suitable for rating.7–10 Equal-appearing interval (EAI) scales force a listener to make a unidimensional judgment on a multidimensional signal,10 impacting listener agreement. In fact, even a training session did not help listeners reach agreement greater than 0.80 when using a seven-point EAI scale for a variety of vocal qualities.2 With multidimensional scaling (MDS), listeners are asked only to rate the similarity between pairs of stimuli, minimizing individual bias.11 This allows the researcher to explore the dimensions within the acoustical signal used to make judgments.6,11 The difficulty of choosing an appropriate rating scale is eliminated because the dimensions are determined by the stimuli and not by the scale. INDSCAL,12 or individual differences scaling, is often used because it extracts the dimensions that represent the underlying judgments made by each participant as well as by each group of listeners.5,13 The results reveal the perceptual space each group of listeners uses to judge voice quality; this space is a visual representation of the differences in domain and range for those perceptions. Although continuous scales have been found to yield better agreement, researchers continue to use EAI scales during MDS tasks.5,6,10,13–16 In turn, ratings may be skewed, or participants may not use the entire scale during perceptual tasks.
In summary, speech-language pathology graduate students, or groups of listeners with mixed levels and types of experience, are often asked to make unidimensional judgments on a multidimensional signal. Research has shown that experience can affect judgments of voice quality.1,17 Ratings are then correlated with acoustical measures to determine possible relationships between subjective and objective measures of voice quality. As a result, not only do perceptual judgments of voice vary, but the research on objective measures related to those perceptions also remains contradictory.

Research suggests that listener agreement for rating tasks may be weak because listeners make single-dimension judgments, such as classification decisions, on a complex signal that contains multiple dimensions.18 Speech-language pathology graduate students used three dimensions consistently when judging voice quality: fundamental frequency, intensity, and perturbation.3 However, the researchers noted that listeners may also attend to other properties of the acoustic signal, resulting in varied and limited agreement. For example, listeners' correlations have been found to vary from 0.33 to 0.78 when rating breathy voice quality and from 0.17 to 0.73 when rating hoarseness.19 In the same study, interjudge agreement was 0.51 for ratings of hoarseness and 0.55 for ratings of breathiness. The judges were said to have experience in voice disorders; however, speech-language pathology graduate students also participated in the ratings, indicating a substantial difference in experience among judges. In addition, listeners' correlations when judging roughness varied widely across groups of listeners with and without experience in voice and/or voice disorders, without a significant difference between groups.20 The experienced listeners included speech-language pathologists (SLPs) with at least 2 years of postgraduate experience in the area of voice and four otolaryngologists, which again demonstrates a difference in experience.
As discussed earlier and shown in the literature,1 IEs demonstrate a difference in judgment of voice quality compared with experienced listeners. In summary, the rating scale and the listener groups should be carefully selected for perceptual voice studies. Although there have been many studies of perceptions of voice quality, variables such as the use of anchors, the type of rating scale, the type and length of stimuli, and the level and type of listener experience can affect perceptual judgments of voice quality and, in turn, affect correlations with acoustical measurements of voice. Despite our knowledge of these factors, very few studies control for all of the above variables simultaneously. Most importantly, very few studies address the differences between experienced listeners and IEs in perceptions of voice quality. In addition, generalizations about appropriate rating scales and about correlations between acoustical measurements and perceptions are commonly made from studies that use speech-language pathology graduate students as judges. Because experience has been found to affect internal standards, the differences between experienced listeners and IEs must be determined before such results can be generalized. A consistent lack of control for variables affecting perception may explain the frequent disagreement among authors in the literature. In turn, careful group selection may yield different correlations between acoustical components of the signal and perceptions when an MDS task is used to eliminate bias. The purpose of this study was to determine the perceptual space used by groups with different levels and types of experience when judging synthesized sustained vowels in an MDS task, so as to remove listener bias.

METHODS

Stimuli

The same stimuli used in Sofranko and Prosek1 were used for this study. One sample of the sustained vowel /ɑ/ with normal voice quality, obtained from a 23-year-old female, was synthesized using the UCLA synthesizer.21 This sample, originally recorded at the University of Utah, was chosen because of its widespread use in other studies as an anchor to control for internal standards. The sample had also been judged "normal" in quality, pitch, and loudness by SLPs with experience in the area of voice and voice disorders.22–24 Using the UCLA voice synthesizer,21 the sample was synthesized with a duration of 1 second and constant fundamental frequency and amplitude. The synthesized file was then systematically altered by changing measurements of jitter, shimmer, and NHR. Jitter was altered in increments of 0.75 microseconds (0–3 microseconds) and shimmer in increments of 0.5 dB (0–2 dB), for a total of 25 variations. NHR was altered in evenly spaced intervals of 12.5 dB (−50 to 0 dB), resulting in five stimuli. This yielded 30 stimuli and 435 pairs of stimuli to be presented during the study.

Listeners

The same listeners used in Sofranko and Prosek1 were used for this study. There were six groups with 10 listeners in each group (n = 60).
Groups consisted of SLPs, singing voice teachers (SVTs), speech-language pathology graduate students who had completed a voice disorders course and had not had a voice client (SLPGRADs), speech-language pathology graduate students who had completed a voice disorders course and had treated one or more voice clients (SLPGRADVs), graduate students in the music department who had completed a voice pedagogy course (SVTGRADs), and IEs.

Group 1 consisted of seven females and three males who were American Speech-Language-Hearing Association certified and state-licensed SLPs. Ages ranged from 29 to 67 years (M = 45.7, standard deviation [SD] = 12.92). They had 5–35 years of experience in voice disorders (M = 19, SD = 11.01) and spent 10–40 hours per week treating voice disorders (M = 23.4, SD = 12.21). All participants in group 1 reported no history of hearing loss, language disorder, speech impairment, and/or neurologic disorder.

Group 2 consisted of eight females and two males, ages 48 to 69 years (M = 59.6, SD = 6), who were tenured singing voice faculty and full members of the National Association of Teachers of Singing (NATS). Full members of NATS hold either a Master's degree or a Doctor of Musical Arts, teach an average of six or more singing voice students weekly, and have 2 years of experience.25 The criterion of tenure implies at least 6 years of full-time faculty work, during which the individual mentors undergraduate and graduate students throughout their academic degrees. All participants in group 2 reported no history of hearing loss, language disorder, speech impairment, and/or neurologic disorder.

Group 3 consisted of 10 females, ages 21 to 24 years (M = 22, SD = 0.943), who were current graduate students in a speech-language pathology program and had completed a voice disorders course. Group 4 consisted of 10 females, ages 21 to 42 years (M = 26.1, SD = 6.33), who were also current graduate students in a speech-language pathology program and had completed a voice disorders course, but who had additionally treated one or more voice clients in the clinic. These students had treated one to eight voice clients (M = 2.5, SD = 2.321). All participants in groups 3 and 4 reported no history of hearing loss, language disorder, speech impairment, and/or neurologic disorder.

Group 5 consisted of six females and four males, ages 22 to 46 years (M = 27.9, SD = 7.4), who were current graduate students in either voice pedagogy or vocal performance and had completed a voice pedagogy course in their graduate work. These students taught 1–20 singing voice students weekly (M = 5.6, SD = 5.48). All participants in group 5 reported no history of hearing loss, language disorder, speech impairment, and/or neurologic disorder.

Finally, group 6 consisted of five females and five males, ages 24 to 56 years (M = 35, SD = 12.18), with no previous training in voice and/or voice disorders, including singing lessons and voice treatment. This group included individuals from various backgrounds: nursing, real estate, chemistry, culinary arts, fashion, architecture, cosmetology, engineering (mechanical and electrical), and law. All participants in group 6 reported no history of hearing loss, language disorder, speech impairment, and/or neurologic disorder.

Procedures

Approval was obtained from the institutional review board at The Pennsylvania State University before any participants were run. INDSCAL12 was used to examine the dimensions within the signal that listeners used to make voice quality judgments. Listeners were presented with pairs (n = 435) of the generated stimuli in random order, auditorily, via noise-reduction headphones. Intraorder pairs (ie, both AB and BA) were not included, as research has shown that order does not influence listeners' judgments.26,27 Detailed instructions were provided before the experiment. Participants were asked to rate the similarity between voice samples on a VAS ranging from no similarity to extremely similar, covering a range from 1 to 1000. Participants were not permitted to replay samples during the task, and there was no orientation task or training session before the experiment, following established experimental procedure for MDS in the area of voice perception.6,15,28 The stimuli were presented and the results collected using Alvin2.29 Playback level was adjusted to a comfortable level as judged by each participant.
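The stimulus counts reported in the Stimuli section (25 jitter/shimmer variations plus 5 NHR variations, yielding 435 unordered pairs) can be checked with a short sketch. The parameter values below simply enumerate the stated increments for illustration; the actual stimuli were generated with the UCLA synthesizer:

```python
from itertools import combinations

# Parameter grids from the Methods, in the stated increments:
jitter = [0.75 * i for i in range(5)]   # 0-3 microseconds, 0.75-us steps
shimmer = [0.5 * i for i in range(5)]   # 0-2 dB, 0.5-dB steps
perturbation = [(j, s) for j in jitter for s in shimmer]  # 5 x 5 = 25 variations
nhr_levels = [12.5 * i for i in range(5)]  # five evenly spaced NHR levels

stimuli = len(perturbation) + len(nhr_levels)   # 30 stimuli in total
pairs = list(combinations(range(stimuli), 2))   # unordered pairs (no AB/BA repeats)
print(stimuli, len(pairs))  # 30 435
```

Because intraorder pairs (AB vs. BA) were excluded, the pair count is the binomial coefficient C(30, 2) = 30 × 29 / 2 = 435, matching the number of trials presented.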


FIGURE 1. Scree plot. The amount of variance accounted for per dimension.

RESULTS

INDSCAL generated a group space calculated from the angle of separation between vectors for individual responses; in this experiment, results were based entirely on the perceptions of each group. INDSCAL was run with the ALSCAL algorithm in SPSS (version 17; SPSS Inc., Chicago, IL). Solutions were obtained in two through six dimensions. The stress values and the amount of variance accounted for by each solution are shown in Figure 1. Based on these results, a three-dimensional solution, accounting for 92.67% of the variance, was chosen because the additional variance accounted for by a fourth dimension was very small. Dimension 1 (D1) accounted for 42.57% of the variance, dimension 2 (D2) for 39.61%, and dimension 3 (D3) for 10.47%. Iterations were stopped at four because the S-stress improvement was less than 0.001 (Table 1). Individual stress levels and R2 values for each group matrix are given in Table 2.

TABLE 1. S-Stress by Iteration

  Iteration   S-Stress   Improvement
  0           0.17215
  1           0.16331
  2           0.14636    0.01695
  3           0.14496    0.00140
  4           0.14438    0.00058

TABLE 2. Group Stress Levels and R2 Values

  Matrix          Stress   R2
  1 (SLPs)        0.148    0.913
  2 (SLPGRADs)    0.138    0.922
  3 (SLPGRADVs)   0.144    0.923
  4 (SVTs)        0.111    0.949
  5 (SVTGRADs)    0.131    0.936
  6 (IEs)         0.135    0.917

Dimensions were interpreted by calculating Pearson r correlations, tested against zero at an alpha level of .01, between stimulus coordinates and acoustical measures. A relationship was considered significant when the alpha criterion was met and the correlation accounted for 50% of the variance. D1 correlated with jitter (r = 0.705, P < .01) and shimmer (r = 0.708, P < .01). Both are perturbation measures, indicating variability in the signal, which was therefore the basis of listener perceptions for D1. D2 correlated with NHR (r = 0.701, P < .01), the Pearson r at the autocorrelation peak (RPK; r = 0.706, P < .01), and the vocal turbulence index (VTI; r = 0.738, P < .01). These are all additive noise measures, indicating that strength of voicing was the basis of listener perceptions for D2. D3 did not correlate with any acoustical measure and remains unexplained.

Table 3 reports the weirdness values and individual dimension weights for each group. Weirdness values close to zero indicate that a group's dimension weights are proportional to the average.6 Group weights report the importance of each dimension to each group. D1 (perturbation measures) was weighted heavily by SVTGRADs, and D2 (additive noise) was weighted heavily by SLPs. IEs weighed D1 (perturbation measures) and D2 (additive noise) the same. D3, undetermined, was not weighed more heavily than D1 and D2 by any group.

A two-dimensional (2D) representation of the dimensions shows a partial separation among groups. Figure 2 indicates that SLPs and SVTGRADs are markedly different from the other groups across the first and second dimensions, with SVTs, IEs, and both SLP graduate student groups clustering together. SLPs weighed additive noise more, and perturbation measures less, than any other group. SVTGRADs were the opposite, weighing perturbation measures more, and additive noise less, than any other group. All other groups remained closer to the center of the plot, weighing perturbation and additive noise measures similarly. Figure 3 shows a different grouping for D3, with SVTGRADs, SLPGRADVs, and IEs separating from the other groups and SLPs, SLPGRADs, and SVTs clustering together. IEs weighed D3 heavily, whereas SVTGRADs and SLPGRADVs, similarly, did not weigh D3 heavily across their judgments of similarity.

FIGURE 2. Perceptions of each group plotted as D1 versus D2. 1, SLPs; 2, SLPGRADs; 3, SLPGRADVs; 4, SVTs; 5, SVTGRADs; and 6, IEs.

A clearer separation of groups across all three dimensions (3D) can be seen in Figure 4. Although SVTs and SLPGRADs are grouped together, SLPs, SLPGRADVs, SVTGRADs, and IEs are separated from this small cluster and from one another. These results show a difference in the perceptual spaces used by SLPs, the SLP graduate student groups, SVTGRADs, and IEs. Although the SLP graduate students who had not had a voice client and the SVTs grouped similarly across the three dimensions, these two groups separated from all others when judging similarity. Again, none of the groups weighed the third dimension more heavily than D1 or D2.
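The interpretive step above (correlating stimulus coordinates from an MDS solution with acoustical measures) can be illustrated with a simplified sketch. The study used INDSCAL via SPSS ALSCAL; this sketch instead runs classical (Torgerson) MDS on synthetic dissimilarities built from two hypothetical acoustic measures, so every value here is illustrative, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical acoustic measures for 30 stimuli: a jitter-like feature on a
# larger scale (so it dominates the distances) and a smaller NHR-like feature.
jitter = rng.uniform(0, 3, 30)
nhr = rng.uniform(0, 0.5, 30)
X = np.column_stack([jitter, nhr])

# Pairwise dissimilarities, standing in for averaged listener ratings.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Classical (Torgerson) MDS: double-center the squared distances,
# then eigendecompose to recover stimulus coordinates.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]                      # largest eigenvalues first
coords = vecs[:, order[:2]] * np.sqrt(vals[order[:2]])

# Interpret dimensions by correlating coordinates with acoustic measures,
# analogous to the Pearson r criterion used in the study.
r_d1_jitter = np.corrcoef(coords[:, 0], jitter)[0, 1]
print(abs(r_d1_jitter) > 0.9)  # dimension 1 tracks the dominant measure
```

Because the jitter-like feature contributes most of the distance variance here, the first recovered dimension aligns with it (up to sign); INDSCAL additionally estimates per-listener dimension weights, which classical MDS does not.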

TABLE 3. Group Matrices: Weirdness Values and Dimension Weights

  Group           Weirdness   D1     D2     D3
  1 (SLPs)        0.3798      0.30   0.82   0.38
  2 (SLPGRADs)    0.0961      0.58   0.68   0.35
  3 (SLPGRADVs)   0.2418      0.74   0.59   0.20
  4 (SVTs)        0.0466      0.63   0.74   0.31
  5 (SVTGRADs)    0.5546      0.91   0.28   0.14
  6 (IEs)         0.2198      0.60   0.60   0.45
  Average                     0.43   0.40   0.10

FIGURE 3. Perceptions of each group plotted as D2 versus D3. 1, SLPs; 2, SLPGRADs; 3, SLPGRADVs; 4, SVTs; 5, SVTGRADs; and 6, IEs.

FIGURE 4. Perceptions of each group as a 3D plot. 1, SLPs; 2, SLPGRADs; 3, SLPGRADVs; 4, SVTs; 5, SVTGRADs; and 6, IEs.

DISCUSSION

This experiment examined the perceptual spaces of listeners with varying levels and types of experience when judging the similarity of pairs of stimuli with systematically altered acoustical measurements. Previous literature discusses the multidimensional nature of voice quality.2–5,8 A 3D configuration was chosen and, when plotted, showed that the groups separated from one another (with the exception of SVTs and SLPGRADs, which grouped more closely together), indicating that individuals with different types and levels of experience use different perceptual spaces when judging the similarity of stimulus pairs with systematically altered acoustical measurements. Most importantly, SLPs, who treat disordered voice quality, consistently separated from all other groups on D1 (perturbation) and D2 (additive noise).

D1 correlated with jitter and shimmer; more generally stated, perturbation measures, or signal instability. These two perturbation measures were systematically altered to generate the stimuli and were tracked by all listener groups. Perturbation measures have been shown to be highly correlated with dysphonic voice quality.30–40 SLPs weighed this dimension less than any other group, which may be due to overexposure to instability in voice quality, relative to every other group, secondary to treating dysphonic voices. SVTGRAD students, who weighed this dimension more heavily than the other groups, may be overly sensitive to instability in voice because they are just beginning to focus on voice quality perception through a singing pedagogy course; voice qualities resulting from increased perturbation may stand in stark contrast to their new academic coursework and experiences. The SLP graduate student groups and SVTs have had some exposure, but not enough to parallel that of SLPs, leaving them to weigh D1 in the midrange. IEs may also have weighed D1 in the midrange secondary to uncertainty regarding the information embedded within the signal, rather than a true similarity with the more experienced groups. These findings are consistent with the uncertainty previously seen in the judgments of IEs.1

D2 correlated with NHR, RPK, and VTI. NHR, a common measure of breathiness, was systematically altered to generate the stimuli.
Alterations in NHR would subsequently lead to alterations in other measurements of strength of voicing. For instance, RPK correlated with D2. RPK, based on the fundamental period, is a correlation of the signal with a delayed version of itself.41 Periodic signals (normal voices) have more prominent autocorrelation peaks than breathy signals, so breathier signals yield weaker correlations.41 VTI also correlated with D2; VTI is a measure of the turbulence caused by incomplete glottal closure during phonation, again related to breathiness.42 All these measures relate to strength of voicing/additive noise and changed as a result of the systematic changes in NHR when creating the stimuli.

The results show that SLPs weighed this dimension more heavily than any other group, indicating a particular sensitivity to, or an ability to easily detect, strength of voicing that comes after 3 years of experience with dysphonia. SVTGRAD students, again the opposite of SLPs, did not weigh this dimension heavily, which separated them from all other groups. Perhaps these students, just learning to focus on the balance between airflow and phonation, make allowances for additive noise when judging similarities between stimuli. SVTGRADs may also not view breathiness as an extreme voice quality, secondary to the extensive focus on breathing technique and breath support in their coursework. All other groups (IEs, SVTs, and both SLP graduate student groups) distributed their weight roughly equally between additive noise and variability in voice quality when rating pairs of stimuli. Again, SVTs and the SLPGRAD groups have had exposure to noisy signals, but not enough to parallel SLPs. Only IEs weighed D1 and D2 exactly the same, and, as stated, this may reflect uncertainty rather than a true correspondence with the more experienced groups, again consistent with previous findings.1 All systematically altered measures (perturbation and additive noise) were accounted for in D1 and D2; listeners consistently tracked these changes when judging similarity.
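The RPK idea described above (a Pearson correlation between the signal and a copy of itself delayed by one fundamental period, with breathier signals yielding weaker peaks) can be sketched on synthetic signals. The signals and the `rpk` helper below are illustrative stand-ins, not the study's stimuli or the exact RPK algorithm:

```python
import numpy as np

fs, f0, dur = 16000, 200.0, 0.5          # sample rate, F0, duration (illustrative)
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(1)

# Two toy signals: a periodic "normal" voice stand-in and a "breathy" one
# with added aperiodic noise.
periodic = np.sin(2 * np.pi * f0 * t)
breathy = periodic + 1.5 * rng.standard_normal(t.size)

def rpk(signal, fs, f0):
    """Pearson r between the signal and a copy delayed by one
    fundamental period -- a simple stand-in for the RPK measure."""
    lag = int(round(fs / f0))            # samples per fundamental period
    return np.corrcoef(signal[:-lag], signal[lag:])[0, 1]

print(rpk(periodic, fs, f0))             # near 1.0: strong peak at the period
print(rpk(breathy, fs, f0))              # lower: noise weakens the correlation
```

The periodic signal correlates almost perfectly with its one-period-delayed copy, while the added noise in the breathy stand-in drives the correlation down, mirroring why RPK tracks additive noise.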
Perturbation and additive noise were the only altered components of the generated stimuli; as a result, D3 did not correlate with any acoustical measure and appears unrelated to the acoustics of the signal. This dimension may represent a top-down effect, in which listeners bring their own components to the task, rather than a manipulation of the signal affecting judgment (a bottom-up effect). As has been explained, "Our perception of speech and other communicative events is not determined by the signal alone. It is shaped by an interaction between the signal on the one hand and information stored in our brains on the other."43 D3, being unrelated to the signal, may reflect factors each participant brought to the perceptual rating task, such as physiology, attention, or memory.

With this experimental design, any bias created by commonly used rating scales was removed: listeners were asked only to rate similarity between pairs of stimuli, and the dimensions were determined by the perceptions of each group and not by the scale. Listeners used perceptual dimensions of perturbation and additive noise when judging similarity, and differences related to varying levels and types of experience were evident in the 3D representation. Group differences were less evident in the 2D representations, but a consistent separation of SLPs and SVTGRADs occurred for both D1 and D2. This stresses the importance of using experienced SLPs as judges, rather than SLPGRAD students, to better generalize the results of perceptual studies to the field. A consistent difference in perceptual spaces can be seen between SLPs and SLP graduate students (both those who have had a voice client and those who have not). Also, although individuals with a background in singing voice have extensive experience with perceptions of voice quality, the study reveals that their experience does not equate to that of individuals with a background in speech-language pathology.

In conclusion, these perceptual spaces, or dimensions, are almost completely explained by perturbation and noise measures. It is clear, given the 3D representation of those perceptions, that varying levels and types of experience lead to differences in how listeners weigh those quantitative measures when judging voice quality. Finally, future research must carefully consider the choice of listeners for perceptual studies: even with all bias regarding definitions of voice quality removed, the groups in this study still tracked the controlled, systematic changes to the synthesized stimuli differently.

One limitation of the study should be noted. As described in Sofranko and Prosek,1 although participants reported the absence of a hearing impairment, hearing thresholds were not obtained before the study. Participants were able to adjust the playback volume to a comfortable level; this is of concern because many participants were older than 30 years and may have had elevated hearing thresholds. Considering the possibility of advanced age when including expert populations, future research should include hearing screenings before listening tasks to ensure that hearing thresholds are within normal limits.

REFERENCES
1. Sofranko JL, Prosek RA. The effect of levels and types of experiences on judgment of synthesized voice quality. J Voice. 2014;28:24–35.
2. Bassich CJ, Ludlow CL. The use of perceptual methods by new clinicians for assessing voice quality. J Speech Hear Disord. 1986;51:125–133.
3. Kempster GB, Kistler DJ, Hillenbrand J. Multidimensional scaling analysis of dysphonia in two speaker groups. J Speech Hear Res. 1991;34:534–543.
4. Kreiman J, Vanlancker-Sidtis D, Gerratt B. Defining and measuring voice quality. In: From Sound to Sense. MIT; 2004.
5. Murry T, Singh S, Sargent M. Multidimensional classification of abnormal voice qualities. J Acoust Soc Am. 1977;23:361–369.
6. Kreiman J, Gerratt BR, Berke GS. The multidimensional nature of pathologic voice quality. J Acoust Soc Am. 1994;96:1291–1302.
7. Eadie TL, Doyle PC. Direct magnitude estimation and interval scaling of pleasantness and severity in dysphonic and normal speakers. J Acoust Soc Am. 2002;112:3014–3021.
8. Kreiman J, Gerratt BR, Ito M. When and why listeners disagree in voice quality assessment tasks. J Acoust Soc Am. 2007;122:2354–2364.
9. Yiu EML, Ng C. Equal appearing interval and visual analogue scaling of perceptual roughness and breathiness. Clin Linguist Phon. 2004;18:211–229.
10. Zraick RI, Liss JM. A comparison of equal-appearing interval scaling and direct magnitude estimation of nasal voice quality. J Speech Hear Res. 2000;43:979–988.
11. Wolfe VI, Martin DP, Palmer CI. Perception of dysphonic voice quality by naïve listeners. J Speech Hear Res. 2000;43:697–705.
12. Carroll JD, Chang JJ. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika. 1970;35:283–319.
13. Shrivastav R. Multidimensional scaling of breathy voice quality: individual differences in perception. J Voice. 2006;20:211–222.


14. Kreiman J, Gerratt B. The perceptual structure of pathologic voice quality. J Acoust Soc Am. 1996;100:1787–1795.
15. Kreiman J, Gerratt B, Precoda K. Listener experience and perception of voice quality. J Speech Hear Res. 1990;33:103–115.
16. Kreiman J, Gerratt B, Precoda K, Berke G. Individual differences in voice quality perception. J Speech Hear Res. 1992;35:512–520.
17. Sofranko JL, Prosek RA. The effect of experience on classification of voice quality. J Voice. 2012;26:299–303.
18. Kreiman J, Gerratt BR. Sources of listener disagreement in voice quality assessment. J Acoust Soc Am. 2000;108:1867–1876.
19. Shipp T, Huntington DA. Some acoustic and perceptual factors in acute-laryngitic hoarseness. J Speech Hear Disord. 1965;30:350–359.
20. Kreiman J, Gerratt B, Kempster G, Erman A, Berke G. Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. J Speech Hear Res. 1993;36:21–40.
21. Kreiman J, Gerratt BR, Antoñanzas-Barroso N. Analysis and Synthesis of Pathological Voice Quality. Los Angeles, CA: University of California; 2006.
22. Awan SN, Roy N. Acoustic prediction of voice type in adult females with functional dysphonia. J Voice. 2005;19:268–282.
23. Awan SN, Lawson L. The effect of anchor modality on the reliability of vocal severity ratings. J Voice. 2009;23:341–352.
24. Awan SN, Roy N. Outcomes measurement in voice disorders: an acoustic index of dysphonia severity. J Speech Lang Hear Res. 2009;52:482–499.
25. National Association of Teachers of Singing. (n.d.). Membership Qualifications. Available at: http://www.nats.org/index.php?option=com_content&view=article&id=137&Itemid=102. Accessed May 10, 2010.
26. Mohr B, Wang WSY. Perceptual distance and the specification of phonological features. Phonetica. 1968;18:31–45.
27. Walden BE, Montgomery AA, Gibeily GJ, Prosek RA, Schwartz DM. Correlates of psychological dimensions in talker similarity. J Speech Hear Res. 1978;21:265–275.
28. Kreiman J, Gerratt B. Perception of aperiodicity in pathological voice. J Acoust Soc Am. 2005;117:2201–2211.
29. Hillenbrand J. Getting Started with Alvin2. 2005. Available at: http://homepages.wmich.edu/hillenbr/. Accessed January 30, 2010.
30. Bhuta T, Patrick L, Garnett JD. Perceptual evaluation of voice quality and its correlation with acoustic measurements. J Voice. 2004;18:299–304.
31. Deal RE, Emanuel FW. Some waveform and spectral features of vowel roughness. J Speech Hear Res. 1978;21:250–264.
32. Eskenazi L, Childers DG, Hicks DM. Acoustic correlates of vocal quality. J Speech Hear Res. 1990;33:298–306.
33. Hecker MHL, Kreul EJ. Descriptions of the speech of patients with cancer of the vocal folds. Part I: measures of fundamental frequency. J Acoust Soc Am. 1970;49:1275–1282.
34. Lieberman P. Some acoustic measures of the fundamental periodicity of normal and pathologic larynges. J Acoust Soc Am. 1963;35:344–353.
35. Hillenbrand J. Perception of aperiodicities in synthetically generated voices. J Acoust Soc Am. 1988;83:2361–2371.
36. Martin D, Fitch J, Wolfe V. Pathologic voice type and the acoustic prediction of severity. J Speech Hear Res. 1995;38:765–771.
37. Smith BE, Weinberg B, Feth LL, Horii Y. Vocal roughness and jitter characteristics of vowels produced by esophageal speakers. J Speech Hear Res. 1978;21:240–249.
38. Wolfe V, Martin D. Acoustic correlates of dysphonia: type and severity. J Commun Disord. 1997;30:403–416.
39. Wolfe V, Steinfatt TM. Prediction of vocal severity within and across voice types. J Speech Hear Res. 1987;30:230–240.
40. Yumoto E, Gould WJ, Baer T. Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am. 1982;71:1544–1550.
41. Hillenbrand J, Cleveland RA, Erickson RL. Acoustic correlates of breathy voice quality. J Speech Hear Res. 1994;37:769–778.
42. Xue SA, Deliyski D. Effects of aging on selected acoustic voice parameters: preliminary normative data and educational implications. Educ Gerontol. 2001;27:159–168.
43. Lindblom B. On the communication process: speaker-listener interaction and the development of speech. Augment Altern Commun. 1990; 220–230.
