NIH Public Access Author Manuscript J Med Speech Lang Pathol. Author manuscript; available in PMC 2013 August 08.

NIH-PA Author Manuscript

Published in final edited form as: J Med Speech Lang Pathol. 2011 December 1; 19(4): 25–36.

The Effects of Topic Knowledge on Intelligibility and Lexical Segmentation in Hypokinetic and Ataxic Dysarthria

Rene L. Utianski, M.S., Kaitlin L. Lansford, M.S., Julie M. Liss, Ph.D., and Tamiko Azuma, Ph.D.

Arizona State University, Department of Speech and Hearing Science

Abstract


Benefits to speech intelligibility can be achieved by enhancing a listener's ability to decipher degraded speech. However, much remains to be learned about the variables that influence the effectiveness of various listener-based manipulations. This study examined the benefit of providing listeners with the topic of some phrases produced by speakers with either hypokinetic or ataxic dysarthria. Total and topic word accuracy, topic-related substitutions, and lexical boundary errors were calculated from the listener transcripts. Data were compared with those of listeners who underwent a familiarization process (reported by Liss, Spitzer, Caviness, & Adler, 2002) and of listeners inexperienced with disordered speech (reported by Liss, Spitzer, Caviness, & Adler, 2000). Results revealed that listeners of ataxic speech who were provided with topic knowledge obtained higher intelligibility scores than naïve listeners. The magnitude of benefit was similar to that of the familiarization condition. However, topic word and word substitution analyses revealed different underlying perceptual mechanisms responsible for the observed benefit. No differences attributable to listening condition were found in lexical segmentation patterns. Overall, the results support the need for further study of listener-based manipulations to elucidate the mechanisms responsible for the observed perceptual benefits for each dysarthria type.

Keywords: intelligibility; lexical segmentation; hypokinetic dysarthria; ataxic dysarthria; perceptual learning; signal-complementary information


INTRODUCTION

Intelligibility of dysarthric speech is often compromised by the underlying neuromuscular condition of the speaker. The vast majority of treatment strategies for improving intelligibility focus on speech modifications (e.g., Lee Silverman Voice Treatment [LSVT®], rate reduction) (Duffy, 1995). However, intelligibility depends not only on the quality of the acoustic speech signal but also on factors that are independent of the signal, known as signal-complementary information (Lindblom, 1990a). This information includes the listener's experience with or knowledge about a specific speaker or topic of conversation, as well as semantic, syntactic, and phonotactic probabilities; lexical knowledge; and word frequency (e.g., Luce, Goldinger, Auer, & Vitevitch, 2000; Luce & Large, 2001). The perceptual benefits offered by signal-complementary information in difficult listening environments, such as understanding dysarthric speech, have been exploited both therapeutically (e.g., alphabet boards and other augmentative and alternative communication devices) and experimentally (e.g., perceptual learning paradigms, semantic and syntactic priming). The idea is that the intelligibility of a degraded signal can be augmented without modifying the degraded signal at all; instead, the manipulations are listener based.

Copyright © 2011 Delmar Cengage Learning. Address correspondence to Rene L. Utianski, M.S., Arizona State University, Department of Speech and Hearing Science, Coor Hall 2211, 10th Street and Myrtle, P.O. Box 870102, Tempe, AZ 85287-0102, [email protected].


Therapeutic interventions that manipulate signal-complementary information are attractive options, particularly when traditional treatments that target the speaker are not indicated, which is often the case given the degenerative nature of many of the diseases that cause dysarthria. For example, priming lexical candidates to facilitate word recognition can be accomplished through a variety of manipulations, including alphabet cuing, topic cuing, and word likelihood based on semantic and syntactic probabilities (Hustad, Auker, Natale, & Carlson, 2003a; Hustad, Jones, & Daily, 2003b; Luce et al., 2000; Luce & Pisoni, 1998). When listeners are given relevant cues, the pool of lexical candidates is narrowed and the activation threshold of likely lexical candidates is lowered, mitigating the detrimental effects of the loss of reliable acoustic information (Beliveau, Hodge, & Hagler, 1995; Dongilli, 1994; Garcia & Dagenais, 1998; Jones, Mathy, Azuma, & Liss, 2004; Yorkston, Strand, & Kennedy, 1996). Hustad and her colleagues (2003b) have reported that alphabet cuing improves listener performance, and sentence- and word-level improvements in intelligibility have been reported when listeners were presented with semantically related cues (Dongilli, 1994; Hammen, Yorkston, & Dowden, 1991) or the sentence topic (Jones et al., 2004). Familiarizing the listener with the speaker or the type of dysarthria also has been found to produce intelligibility benefits (Hustad & Cahill, 2003; Liss et al., 2002; Tjaden & Liss, 1995a, 1995b).


Despite the therapeutic and experimental importance of signal-complementary information, very little is known about the mechanisms underlying the perceptual benefits of its different forms. For example, an alphabet board presumably primes words in the cohort of options that begin with the same letter, allowing the listener to narrow the field of lexical candidates and thereby facilitating lexical activation (Cole & Jakimik, 1978; Nooteboom, 1981). A similar mechanism can be proposed for providing the listener with knowledge of the topic of the degraded utterance (semantic priming), which facilitates lexical activation and lexical competition (Luce & Pisoni, 1998; Luce et al., 2000), thereby improving the listener's chance of mapping the degraded acoustic cues onto the intended word. In contrast, giving a listener guided experience in deciphering a particular pattern of degraded speech (i.e., a familiarization procedure) does not narrow the field of lexical candidates or facilitate lexical competition. Instead, the mechanism for improved listener performance is presumed to be perceptual learning, in which exposure and feedback allow listeners to recognize segmental or suprasegmental regularities (or both) that may be useful for subsequent deciphering. This is regarded as a signal-complementary process because the speech signal itself does not change; what changes is the listener's ability to map the degraded acoustic information onto existing canonical acoustic-phonetic representations or to identify word boundaries more accurately in the degraded acoustic stream (Clarke & Garrett, 2004; DePaul & Kent, 2000; Dupoux & Green, 1997; Liss et al., 2002; Mattys, White, & Melhorn, 2005; Tjaden & Liss, 1995a, 1995b). It may be expected, then, that different manipulations produce beneficial effects of different magnitudes.
Indeed, there are reports of differential benefits of topic knowledge (TK) and alphabet cuing depending on dysarthria severity and listener age (cf. Hustad et al., 2003a, 2003b; Jones et al., 2004) and of differential benefits of familiarization depending on dysarthria subtype (Liss et al., 2002). It remains to be determined whether there is a predictive relationship between the nature of the speech degradation and the type of manipulation used, such that some forms of signal-complementary information are more beneficial for certain degradation patterns. Understanding how listeners deploy their cognitive-perceptual processes is important not only for basic science, in developing models of intelligibility deficits, but also for optimally exploiting signal-complementary information to enhance perceptual performance therapeutically.


In a previous investigation (Liss et al., 2002), we reported significant benefits of a brief listener familiarization procedure on the intelligibility of ataxic speech and, to a much lesser degree, of hypokinetic speech. Furthermore, there was a significant benefit when listeners were familiarized with one type of dysarthria and tested on the other, although it was less impressive than in the same-dysarthria condition. Finally, there was no evidence that the familiarization procedure improved intelligibility by way of an improved ability to segment the phrases into discrete words (lexical segmentation), but the patterns of lexical boundary errors (LBEs) again differed by dysarthria type. This led us to wonder whether the ataxic speech samples used in that study offered information not available in the hypokinetic samples. If the perceptual benefits of familiarization stem from learning about the acoustic signal, we would expect the differential improvements for the two dysarthria types to disappear if listeners were instead offered signal-complementary information that narrows the field of lexical candidates by way of semantic priming (topic knowledge). We examined error patterns for three groups of listeners: a control group (who were neither familiarized with dysarthric speech nor given knowledge of the topic), a familiarization group (who followed along with a written transcript while listening to either ataxic or hypokinetic speech), and a topic knowledge (TK) group (who were not familiarized but were told that some of the words in the phrases would be political in nature). Data from the control and familiarization groups were taken from Liss et al. (2002), and data from the TK condition have not been previously reported.
The following hypotheses were set forth: (1) TK will increase overall intelligibility relative to the control condition and will yield a higher percentage of correctly identified topic-related words and a larger number of topic-related word substitutions than the control and familiarization conditions; (2) the magnitude of the intelligibility improvement will be similar for the two dysarthria groups, unlike the pattern elicited by the previously reported familiarization data; and (3) LBE patterns in the TK condition will mirror those of the control group, because topic knowledge is not expected to let listeners "learn" about the acoustic signal in a way that facilitates the use of syllabic strength in segmenting the acoustic stream.

METHOD

Study Overview

A between-groups design was selected to compare listener performance for the two dysarthria types (hypokinetic and ataxic). This was necessary to minimize cross-contamination, that is, the effects of implicit familiarization that may occur during the transcription task. Previous research has shown that evidence of perceptual learning is present even after brief exposure to a different pattern of degraded speech, although the effects are less robust than after exposure to the same pattern (Liss et al., 2002). Therefore, although a within-subjects design would otherwise be preferable, establishing the isolated effects for a given group was imperative. After being told the topic of some of the phrases in the task, two groups of 40 listeners transcribed 60 phrases produced by speakers with either hypokinetic or ataxic dysarthria. This allowed us to explore the main and interaction effects of speaker type in this condition and to compare the results with previously published control and familiarization data.
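Because every comparison in this design is between independent groups, the pairwise statistics in the Results are t tests on unequal groups. The reported degrees of freedom (58 and 78) are consistent with Student's pooled-variance independent-samples t test (20 + 40 − 2 = 58 for control versus TK or familiarization; 40 + 40 − 2 = 78 for TK versus familiarization), although the paper does not name the exact test. A minimal sketch under that assumption, using only the Python standard library; the scores are hypothetical:

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t with pooled variance; returns (t, df).

    A negative t means the first group has the lower mean.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical words-correct proportions: df depends only on group sizes.
control = [0.40 + 0.01 * i for i in range(20)]  # 20 control listeners
tk = [0.45 + 0.01 * i for i in range(40)]       # 40 TK listeners
t, df = pooled_t_test(control, tk)
print(df)  # 58
```

The pooled form assumes equal population variances in the two groups; Welch's test would drop that assumption but would generally not produce the whole-number degrees of freedom reported in the Results.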


Participants


The data from 200 participants were analyzed in the present study. All listeners had normal hearing, spoke standard American English as their native language, and reported little to no experience with dysarthric speech. The listener groups contained equal numbers of Arizona State University undergraduate men and women, aged 18 to 50 years. All were compensated for their participation in this study. Data from two groups of 40 listeners were collected for this investigation (TK group; n = 80). These data were compared with those of two control groups of 20 listeners who were unaware of the target topic and received no additional familiarization with disordered speech (control group; n = 40; reported in Liss et al., 2000) and with those of two groups of 40 listeners who underwent a familiarization process (familiarization group; n = 80; reported in Liss et al., 2002). The study protocol and consent procedures were approved by the Institutional Review Board of Arizona State University.

Speech Stimuli


Details on the creation of the stimulus sets are described in previous reports (Liss, Spitzer, Caviness, Adler, & Edwards, 1998; Liss et al., 2000). Briefly, phrases were recorded from six speakers with hypokinetic dysarthria and six speakers with ataxic dysarthria. The perceptual characteristics of the phrases for each speaker group were consistent with our operational definitions, derived from the Mayo Classification System (Darley, Aronson, & Brown, 1969; Duffy, 1995). All speakers were judged by certified speech-language pathologists to have moderate to severe impairments in intelligibility and were selected to create relatively homogeneous groups for each dysarthria type, with highly similar segmental and suprasegmental characteristics. The characteristics of speakers classified with hypokinetic dysarthria included a perceptually rapid speaking rate with monopitch and monoloudness; imprecise articulation, leading to a perceived blurring of phonemes and syllables; and a breathy, hoarse, or harsh voice. Speakers classified with ataxic dysarthria were noted to have an equal and even syllable duration pattern, a perceptually slow rate, and excessive loudness variation.


The perceptual impressions were supported by acoustic measures of phrase duration, strong-to-weak vowel duration calculations, vowel formant frequencies and point-vowel quadrilateral areas, and fundamental frequency and amplitude variation (see Liss et al., 2000, Tables I and II). Briefly, the hypokinetic phrases were significantly shorter in duration and had a significantly smaller range of fundamental frequency variation than those of the speakers with ataxic dysarthria. These characteristics correspond with the perceived rapid rate and monotone quality of hypokinetic speech. In ataxic speech, vowel durations were similar across adjacent strong and weak syllables, corroborating the perception of slow, equal, and even speech. The vowel quadrilateral areas for both hypokinetic and ataxic speakers were approximately 50% smaller than those of the neurologically normal control speakers (see Figure 3 in Liss et al., 1998). As reported in Liss et al. (2000), the phrases recorded for the two dysarthria samples were of equivalent intelligibility by design: the mean words-correct score was 43.2% for the ataxic set and 41.8% for the hypokinetic set. The phrases, modeled after Cutler and Butterfield (1992), were designed to assess the quality of lexical segmentation using patterns of LBEs. Each phrase consisted of six syllables that alternated in phrasal strength. Strong syllables were defined as containing full vowels of relatively long duration, which may or may not receive prosodic stress; weak syllables were defined as containing reduced vowels that do not receive prosodic stress¹ (Cutler & Carter, 1987). Half of the phrases alternated strong–weak (SWSWSW), and the other half alternated weak–strong (WSWSWS). The phrases ranged in length from three to five words, and all words were one or two syllables long. The phrases contained only English words but had low interword predictability, to reduce the contribution of contextual information to word activation and recognition. None of the words in the phrases was repeated, with the exception of articles and auxiliary verbs. The complete list of test phrases can be found in Spitzer, Liss, and Mattys (2007). Each stimulus set consisted of 60 phrases, 10 produced by each of the six speakers in the group. A neurologically healthy female speaker stated the phrase number immediately before each phrase, and each phrase was followed by 12 seconds of silence during which the listeners transcribed it. Of the 60 phrases, 36 contained one target word that was political in nature (e.g., caucus, voter), henceforth referred to as topic words. The phrases containing political words were developed in a series of pilot studies to eliminate ambiguous words.

PROCEDURE


The listeners were seated in individual cubicles. The audiotapes were presented via the Tandberg Educational sound system in the ASU Language Laboratory over high-quality Tandberg supra-aural headphones. Equivalent sound pressure levels across headphones were verified with a headphone coupler sound level meter (Quest 215 Sound Level Meter). During the preliminary instructions, listeners adjusted the loudness to a comfortable listening level in 4-dB steps, and they were told not to alter the loudness once the stimulus phrases had begun. The listeners then transcribed three practice phrases read by a neurologically normal female speaker; no listener made more than one word error in the practice transcriptions. Before the transcription task, all listeners in the TK condition were informed that some of the phrases contained words that were political in nature. They were told that all phrases consisted of real English words produced by several different male and female speakers, that some of the phrases might be difficult to understand, and that they should guess if they did not know what the speaker was saying.

ANALYSES

The TK corpus consisted of 4800 phrase transcriptions (80 listeners × 60 phrases). The following dependent measures were analyzed.

Intelligibility


To determine overall intelligibility, a words-correct score (number of words correct divided by total words) was calculated for each listener. A word was counted as correct when it exactly matched the target or when it differed only in tense (-ed) or plurality (-s) without a change in syllabic structure. Substitutions between "a" and "the" were also regarded as correct. A 2 × 3 between-groups analysis of variance (ANOVA) was conducted, with speaker group (ataxic or hypokinetic dysarthria) as the first factor and condition (control, familiarization, or TK) as the second. Post-hoc pairwise multiple comparisons, with Bonferroni's adjustment, were conducted to detect differences in mean intelligibility scores between speaker groups and listening conditions.²
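The scoring rule can be sketched in code. This is a minimal illustration, not the authors' scoring procedure, and it rests on several assumptions: words are compared case-insensitively after splitting on whitespace; each target word can be credited against any not-yet-used response word (position-independent matching); and the tense/plural allowance is approximated by a hypothetical suffix heuristic. The paper's further condition that syllabic structure stay unchanged is not checked and would still need a human scorer. The example phrases are invented, not items from the stimulus list.

```python
from typing import Iterable, Tuple

ARTICLES = {"a", "the"}


def _is_inflection(base: str, longer: str) -> bool:
    """Hypothetical approximation of the tense (-ed) / plural (-s) allowance.

    'longer' must equal 'base' plus -s or -ed (or -d for e-final bases).
    Very short bases are excluded to avoid pairs such as "be"/"bed".
    Syllable structure is NOT checked here.
    """
    if len(base) < 3:
        return False
    if longer in (base + "s", base + "ed"):
        return True
    return base.endswith("e") and longer == base + "d"


def words_match(target: str, response: str) -> bool:
    t, r = target.lower(), response.lower()
    if t == r:
        return True
    if t in ARTICLES and r in ARTICLES:  # "a" and "the" are interchangeable
        return True
    return _is_inflection(t, r) or _is_inflection(r, t)


def words_correct(target_phrase: str, response_phrase: str) -> int:
    """Credit each target word at most once against an unused response word."""
    unused = response_phrase.split()
    correct = 0
    for t in target_phrase.split():
        for i, r in enumerate(unused):
            if words_match(t, r):
                correct += 1
                del unused[i]  # safe: we break out of the loop immediately
                break
    return correct


def words_correct_score(pairs: Iterable[Tuple[str, str]]) -> float:
    """Words correct divided by total target words across one listener's phrases."""
    correct = total = 0
    for target, response in pairs:
        correct += words_correct(target, response)
        total += len(target.split())
    return correct / total


# Hypothetical transcripts for one listener.
listener = [("the voter signed a petition", "a voters sign the position"),
            ("rural voters", "rural voter")]
print(words_correct(*listener[0]))              # 4
print(round(words_correct_score(listener), 3))  # 0.857
```

Position-independent matching was chosen because boundary errors shift word positions; a stricter scorer could instead align transcripts word by word before comparing.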

¹ This definition of syllabic strength is distinguished from that of prosodic phonology, which holds that all strong syllables are stressed and all weak syllables are unstressed (Halle & Keyser, 1971). The relative importance of vowel quality versus prosodic stress in the perceptual designation of syllabic strength remains to be determined (Fear, Cutler, & Butterfield, 1995; Gow & Gordon, 1995).

² Recognizing the unequal group sizes (40 listeners in each TK and familiarization group and 20 listeners in each control group), a conservative alpha of 0.005 was selected. This was intended to minimize the potential for an overpowered study in which small between-group differences might attain statistical significance without being clinically or perceptually relevant.
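Footnote 2's concern is that small differences may reach significance without being meaningful. The Results also report effect sizes as partial eta squared (ηp²). For a between-groups ANOVA, ηp² can be recovered from an F ratio and its degrees of freedom with the standard identity ηp² = F·df_effect / (F·df_effect + df_error); this is a general check, not a procedure described by the authors. The sketch applies it to the F values reported for the intelligibility ANOVA:

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Follows from eta_p^2 = SS_effect / (SS_effect + SS_error) and
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f * df_effect) / (f * df_effect + df_error)


# F ratios reported for the intelligibility ANOVA in the Results.
reported = {
    "dysarthria group": (32.75, 1, 195),      # reported eta_p^2 = .144
    "listening condition": (16.548, 2, 195),  # reported eta_p^2 = .145
    "interaction": (2.872, 2, 195),           # reported eta_p^2 = .029
}
for effect, (f, df1, df2) in reported.items():
    print(effect, round(partial_eta_squared(f, df1, df2), 3))
```

The recomputed values (.144, .145, .029) agree with those reported, which is a quick way to catch transcription errors in F, df, or ηp².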


Topic Words Correct


Topic words (i.e., words that are political in nature) appeared in 36 of the 60 phrases. The percentage of topic words correctly identified was calculated for each listener and subjected to a 2 × 3 between-groups ANOVA, with speaker group (ataxic or hypokinetic dysarthria) as the first factor and condition (control, familiarization, or TK) as the second. To evaluate the specific effects of speaker group and listening condition on the percentage of topic words correct, post-hoc pairwise multiple comparisons were conducted, with Bonferroni's adjustment for the number of comparisons made.

Topic-Related Substitutions


Word substitutions that were political in nature (e.g., civil for simple) were tabulated independently by two trained judges; interrater reliability (Cronbach's α) was .989. The tabulations of the first judge (RU) were used in the remaining analyses. For each listener, the proportion of political substitutions to total words incorrectly identified was calculated. A larger proportion of political word substitutions in the TK condition than in the other listening conditions would be taken as evidence of lexical priming. To evaluate this relationship, a 2 × 3 between-groups ANOVA was conducted, with speaker group (ataxic or hypokinetic dysarthria) as the first factor and condition (control, familiarization, or TK) as the second, followed by post-hoc pairwise multiple comparisons with Bonferroni's adjustment.

Lexical Boundary Errors

Two trained judges independently coded the listener transcripts for the presence and type of LBEs, and the number of agreed-upon errors was tallied. Lexical boundary violations were defined as erroneous insertions or deletions of lexical boundaries. Four error types were possible: insertion of a boundary before a strong syllable (IS), insertion of a boundary before a weak syllable (IW), deletion of a boundary before a strong syllable (DS), and deletion of a boundary before a weak syllable (DW). A phrase could contain more than one LBE (for examples, see Table 1). If syllabic strength is an important cue for segmenting the speech stream, we would expect a larger proportion of insertions before strong syllables and of deletions before weak syllables, because most English words begin with a strong syllable (Cutler & Carter, 1987). For each dysarthria group, a χ² test of independence was conducted to determine whether LBE type (insertion or deletion) depended on syllabic strength (strong or weak).
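The coding scheme and the χ² test can be made concrete. The sketch below assumes a simplification not made in the paper: each response is already aligned syllable by syllable with its target, so a transcript reduces to a string of syllable strengths plus the syllable indices at which words begin (in the study, trained judges did this coding by hand). Each insertion or deletion is classified by the strength of the syllable that follows the boundary, and the aggregated 2 × 2 table of LBE type by syllabic strength is tested with a Pearson χ² test of independence without continuity correction. All phrases and counts are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def classify_lbes(strengths: str, target_onsets: Set[int],
                  response_onsets: Set[int]) -> Dict[str, int]:
    """Count IS, IW, DS, and DW errors for one syllable-aligned transcript.

    strengths: one character per syllable, "S" (strong) or "W" (weak).
    target_onsets / response_onsets: indices of syllables that begin a word.
    """
    counts = {"IS": 0, "IW": 0, "DS": 0, "DW": 0}
    for i in range(1, len(strengths)):  # candidate boundary before syllable i
        inserted = i in response_onsets and i not in target_onsets
        deleted = i in target_onsets and i not in response_onsets
        if inserted:
            counts["IS" if strengths[i] == "S" else "IW"] += 1
        elif deleted:
            counts["DS" if strengths[i] == "S" else "DW"] += 1
    return counts


def chi_square_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-square test of independence for a 2 x 2 table (df = 1).

    table: [[IS, IW], [DS, DW]]; rows = insertion/deletion, cols = strong/weak.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical WSWSWS phrase with three WS words; the listener heard an
# extra word boundary before the second (strong) syllable.
print(classify_lbes("WSWSWS", {0, 2, 4}, {0, 1, 2, 4}))
# {'IS': 1, 'IW': 0, 'DS': 0, 'DW': 0}

# Hypothetical pooled counts in which IS and DW dominate (predicted pattern).
chi2, p = chi_square_2x2([[30, 10], [10, 30]])
print(chi2)  # 20.0
```

A significant χ² shows only that LBE type and syllabic strength are associated; whether the association runs in the predicted direction (IS and DW elevated) has to be read from the cell counts themselves.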


To measure the overall quality of lexical segmentation in each condition, the metrical segmentation strategy (MSS) ratio, that is, the number of predicted errors divided by total LBEs, (IS + DW)/total LBEs, was calculated for each listener. An MSS ratio greater than 0.50 is taken as evidence of strength-based segmentation. To evaluate the effects of speaker group and listening condition on the MSS ratio, a 2 × 3 between-groups ANOVA was conducted. To further elucidate the relationship between syllabic strength and LBE type in the TK condition, IS/IW and DW/DS ratios were calculated for each dysarthria group, permitting comparison with previously published data on the strength of adherence to the predicted error patterns. A ratio of 1 indicates that insertions (IS/IW) or deletions (DW/DS) occur equally often before strong and weak syllables; the farther a ratio lies above 1, the stronger the adherence to the predicted pattern. These descriptive data were not treated statistically but were calculated to complement the results of the χ² tests of independence and the ANOVA.
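The MSS ratio and the two adherence ratios are simple functions of the four LBE counts. A minimal sketch, assuming the counts have already been tallied (per listener for the MSS ratio, or per dysarthria group for IS/IW and DW/DS) in a dictionary with the hypothetical keys IS, IW, DS, and DW; the example counts are invented:

```python
import math
from typing import Dict, Tuple


def mss_ratio(counts: Dict[str, int]) -> float:
    """(IS + DW) / total LBEs; values above 0.50 suggest strength-based segmentation."""
    total = counts["IS"] + counts["IW"] + counts["DS"] + counts["DW"]
    if total == 0:
        return math.nan  # no boundary errors, so the ratio is undefined
    return (counts["IS"] + counts["DW"]) / total


def adherence_ratios(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (IS/IW, DW/DS); values above 1 follow the predicted pattern."""
    def ratio(num: int, den: int) -> float:
        if den == 0:
            return math.inf if num else math.nan
        return num / den
    return ratio(counts["IS"], counts["IW"]), ratio(counts["DW"], counts["DS"])


# Hypothetical counts (not data from the study).
example = {"IS": 30, "IW": 10, "DS": 10, "DW": 20}
print(round(mss_ratio(example), 3))  # 0.714
print(adherence_ratios(example))     # (3.0, 2.0)
```

Returning NaN or infinity for empty denominators keeps listeners with few errors visible rather than silently dropping them; how such cases were handled in the study is not stated.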


RESULTS

Intelligibility


Results of the 2 × 3 between-groups ANOVA revealed significant main effects of both dysarthria group [F(1, 195) = 32.75; P < .005; ηp² = .144] and listening condition [F(2, 195) = 16.548; P < .005; ηp² = .145]. The interaction was not significant [F(2, 195) = 2.872; P = .059; ηp² = .029]. Planned comparisons revealed a significant difference between the TK and control conditions (P < .005). Intelligibility scores in the TK and familiarization conditions did not differ significantly (P = .537), suggesting that the two manipulations produced facilitative effects of similar magnitude. Planned comparisons of dysarthria type by condition revealed that both speaker groups were of equivalent intelligibility in the control conditions, confirming the design of the speech stimuli. However, both listener manipulations yielded greater perceptual benefit for ataxic than for hypokinetic speakers (see Figure 1 for group means). Although there was some variability among listeners, this variability did not differ noticeably across conditions (see Table 2 for descriptive statistics of listener variability).

Topic Words Correct


The 2 × 3 between-groups ANOVA revealed significant main effects of speaker group [F(1, 194) = 16.803; P < .005; ηp² = .080] and listening condition [F(2, 194) = 16.155; P < .005; ηp² = .143], as well as a significant interaction [F(2, 194) = 7.357; P < .005; ηp² = .071]. Planned comparisons, with Bonferroni's correction (alpha set at P < .005), revealed significant differences between the control group and both the TK and familiarization groups. The difference between the TK and familiarization conditions failed to reach significance (P = .025). In the ataxic listening group, the percentage of topic words correctly identified in the TK condition differed significantly from that in both the control (t(58) = −6.214; P < .005) and familiarization (t(78) = −3.157; P < .005) conditions, and the familiarization and control conditions also differed significantly (t(58) = −4.082; P < .005). In the hypokinetic listening group, the percentage of topic words correctly identified did not differ significantly between the TK condition and either the control (t(58) = −1.362; P = .178) or familiarization (t(78) = −.475; P = .636) condition, or between the control and familiarization conditions (t(58) = −.966; P = .338). See Figure 2 for group means. Again, the variability of listener performance did not differ noticeably across conditions (see Table 2 for descriptive statistics of listener variability).

Topic-Related Substitutions


The proportion of topic-related substitutions to the total number of incorrectly transcribed words was calculated for each listener, revealing a larger proportion of topic-related substitutions in the TK condition than in the control and familiarization conditions for both speaker groups (Figure 3). Results of the 2 × 3 between-groups ANOVA revealed significant main effects of dysarthria group [F(1, 195) = 13.311; P < .005; ηp² = .064] and listening condition [F(2, 195) = 25.433; P < .005; ηp² = .208] and a nonsignificant interaction [F(2, 195) = 2.631; P = .075; ηp² = .026]. Planned comparisons demonstrated that listeners of hypokinetic speech made significantly higher proportions of political word substitutions in the TK condition than in the control condition (t(58) = −3.44; P < .005) or the familiarization condition (t(78) = −4.362; P < .005). There was no difference between the control and familiarization conditions (t(58) = −.484; P = .630). However, the findings for listeners of ataxic speech were somewhat different. Listeners made significantly higher proportions of political word substitution errors in the TK condition than in the familiarization condition (t(58) = −5.517; P
