JOURNAL OF COMMUNICATION DISORDERS 9 (1976), 317-330

PERCEPTUAL AND ACOUSTIC ANALYSIS OF REPETITIONS IN STUTTERED SPEECH

ALLEN A. MONTGOMERY
Army Audiology and Speech Center, Walter Reed Army Medical Center, Washington, D.C. 20012

PAUL A. COOKE
Purdue University, West Lafayette, Indiana

This study presents perceptual and acoustic data on a carefully selected set of part-word repetitions from the speech of adult stutterers. Results indicated that the schwa vowel was perceived in only 25% of the repetitions, far less than previously indicated. Spectrographic analysis showed that although abnormal consonant duration and C-V formant transitions characterized the initial segment of the stuttered word, the remainder of the word is identical to its fluently produced counterpart. The results were interpreted to mean that for the type of dysfluency selected, the articulatory breakdown is confined to the initial consonant, and it is likely that abnormal formant transitions from initial consonant to vowel, when present, are due to deviant formation of the consonant rather than to faulty transition dynamics.

Introduction

The part-word repetition is one of the universally recognized symptoms of stuttering. In this type of block, for words beginning with a consonant-vowel combination (CV+), the stutterer typically initiates the speech attempt with the correct phoneme, releases the phoneme into a vowel that is abruptly terminated, and returns his articulators to the posture of the initial consonant. On the second attempt the word may actually be said, or the abrupt termination may occur again and the process begin over as the third of many such trials.

The part-word repetition is a basic and convenient form of stuttering for analysis. It is basic in that repetitions (along with prolongations) are the primary characteristic that distinguishes the developing stutterer from the nonstutterer. As Wingate (1962, p. 111) points out about the data of Johnson et al. (1959), it is essentially in terms of sound and syllable repetitions and prolongations that stutterers differ from nonstutterers. Moreover, repetitions are easy to analyze spectrographically since extreme distortions are not as prevalent, and easy to prepare for perceptual analysis since the pauses provide convenient locations for tape splicing. Accordingly, we have chosen the part-word repetitions of CV+ words as the initial focus of study of the advanced stutterer’s articulatory dynamics.

© American Elsevier Publishing Company, Inc., 1976

Throughout the years speech pathologists have reported that many stutterers produce the schwa vowel¹ during CV+ repetitions instead of approximating the actual vowel. Van Riper (1971), for example, feels that it is essential to determine whether young children employ the schwa in their syllabic repetitions. If they do, he feels it indicates the probability of developing stuttering on a more permanent basis, because the proper formant transitions are not present and the required coarticulation has not been achieved. This conclusion is based on his own observations and on Stromsta’s (1965) spectrographic findings, which showed that children who became stutterers had anomalies in formant transitions in their earlier dysfluencies, whereas the children who “grew out” of their earlier nonfluencies had normal transitional movements. Van Riper (1971, p. 24) states:

Almost universally the schwa vowel can be heard in the stutterer’s abortive speech attempts. When the stutterer has repetitive syllables on the word “paper” he seldom says [pe-pe-pepɚ]; instead he uses the schwa vowel in his repetition [pʌ-pʌ-pʌ-pepɚ].

Acoustic manifestations of the stutterers’ articulatory processes, however, have received relatively little study. Agnello (1966) indicated that the acoustic characteristics of the stuttering dysfluencies of stutterers were different from their normal speech dysfluencies. It is interesting that some of the acoustic differences were undetectable by ear and were demonstrated only by spectrographic analysis. Specifically, the shifts of the second formant, which reflect normal forward and backward coarticulatory dynamics, were not characteristic of the stuttering moments.

The present research is concerned with demonstrating an approach to studying the process of articulation during stuttering, describing some perceptual and acoustic aspects of part-word (CV+) repetitions, and advancing some tentative conclusions concerning the pattern of articulatory breakdown present in the act of stuttering.

Method

Listeners

Five speech pathologists (two professors and three doctoral candidates in speech pathology) served as sophisticated listeners for this study. All had normal hearing and extensive experience in evaluating stuttered speech for research as well as clinical purposes.

¹The word “schwa” is used here to mean both the unstressed form /ə/ and the stressed form /ʌ/ of the neutral vowel. The stressed form is used throughout the report for convenience.


Stimuli

Thirty experimental samples from 16 English-speaking adult stutterers (15 male and 1 female) were selected for use in the perceptual part of the study. The 16 stutterers were selected from a set of 82 adult stutterers whose recordings were available from two previous studies (Kroll, 1970; Hinkle, 1971). The subjects were all receiving or had received speech therapy for stuttering at one of five midwestern university speech clinics. Their overall severity ratings ranged from mild to severe. However, because of the selection criteria listed below, the actual samples selected were clearly examples of mild stuttering. The subjects were (or had been) receiving symptomatic therapy for their stuttering and none gave evidence of being exposed to “bounce” therapy (in which an unnatural form of repetition is used to break up the stutterer’s existing pattern).

The utterances were taken from conversational speech and adaptation sequences. The samples were restricted to perceived part-word repetitions on the initial CV+ utterance of words containing at least three phonemes. (This eliminated whole-syllable and whole-word repetitions such as “rep-rep” or “to-to.”) To be considered, the samples must have had a perceivable pause between the repetition and the actual utterance of the stuttered word. In addition, no apparent struggle or excessive tension was accepted when the word was finally spoken. Very brief repetitions with little or no vowel included were disregarded because the judges might not be sensitive enough to produce reliable interpretations of the utterance. The actual CV+ repetitions (not including the actual utterance of the stuttered word) were then spliced out to form a master tape for the perceptual analysis.

Preparation of Tape for Perceptual Analysis

The appropriate words, which contained the CV+ repetitions as defined above, were recorded onto Ampex 414 low noise tape at 7½ ips.
The tape recorders used for this dubbing procedure were Ampex two-channel recorders, model AG 600B-2. Those samples that had a corresponding fluent utterance of that word in the same context, from another reading in the adaptation sequence, were also recorded. These matched pairs of words (the stuttered and the fluent) were then used as the master tape for the spectrographic analysis. Fourteen such pairs were recorded. (See section below on preparation of tape for acoustic analysis.) Using the same equipment as described above, the stuttered samples were then duplicated. With this duplicated tape the master tape for the listeners was made by splicing out just the syllable repetition before the stuttered word was spoken. If more than one such repetition occurred, all were included in the sample. This was accomplished by employing an apparatus that had a reproduce head, which was attached to a preamplifier and a loudspeaker, so that one could turn the tape reels by hand and determine where the repetition had ended and production of the actual word began. Great care was taken to insure that the start of the word was not included in the sample and yet that the repetition was not artificially truncated.

As a control group of items, 26 nonstuttered CV+ samples were recorded. They were made from a random selection of consonants followed by one of the 13 vowels used on the listener’s response sheet. Each vowel was represented one to three times in the 26 control items. The vowels contained in these utterances were intentionally slightly longer than the experimental samples to insure better performance and remind the listeners of the entire list of possible vowels. The spliced experimental samples and the control samples were then interspersed, with 5 sec of silence between each utterance, to form the master listening tape.

Procedure for Stimulus Presentation

The five listeners were seated in a sound treated room 5 to 6 feet from the loudspeaker (Electrovoice SP12) of a high fidelity tape reproduction system (Revox Type A77). They were then subjected to two conditions. During Condition I, the master tape was played and the listeners were told to write down the vowel they heard in each utterance. The consonant was supplied for each sample and the listeners were forced to choose from the list of 13 vowels also supplied on the response sheet. In the case of multirepetition samples the listener was to fill in a vowel for each repetition. In Condition I, the listeners were not aware that the samples were from stutterers,² only that abruptly terminated CV syllables were presented. For Condition II, however, the listeners heard the same tape again but were told that the utterances were actually part-word repetitions or “bounces” from stutterers’ speech. They were instructed, this time, to predict the vowel of the word that the stutterer was going to produce. Even when a sample contained several repetitions, only one response was to be elicited from each listener.
As in Condition I, the same vowels were available for selection and a response was required.

Procedure for Perceptual Analysis

Confusion matrices were generated under both conditions. The experimental samples and control samples were separated for individual analysis. The confusion matrices were developed by listing along the vertical axis all possible vowels that could have been present in the repetition (or predicted to be in the stuttered word, which of course was not included on the tape). The same list was arrayed horizontally, and represented the intended vowel, i.e., the vowel actually contained in the stuttered word when finally produced.

²It is possible, of course, that the listeners suspected that the samples were of stuttering, but the samples, consisting of only very brief, almost nonspeech sound segments, were difficult to perceive and closely resembled electronically gated stimuli often used in phonetic research in our department.
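
The tallying of listener responses into a confusion matrix can be sketched as follows. This is a minimal illustration, not the authors' procedure: rows are taken to be the intended vowel and columns the perceived vowel (the layout given in the caption of Table 1), and the vowel labels, the responses, and the helper names `confusion_matrix` and `summarize` are all hypothetical.

```python
from collections import Counter


def confusion_matrix(pairs, vowels):
    """Tally (intended, perceived) response pairs into a square count matrix.

    Rows index the intended vowel, columns the perceived vowel.
    """
    counts = Counter(pairs)
    return [[counts[(intended, perceived)] for perceived in vowels]
            for intended in vowels]


def summarize(matrix):
    """Return (number correct, row totals, column totals) for a square matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # main diagonal
    row_totals = [sum(row) for row in matrix]                # times presented
    col_totals = [sum(col) for col in zip(*matrix)]          # times perceived
    return correct, row_totals, col_totals


# Hypothetical responses for illustration only; these are not data from the study.
vowels = ["i", "u", "schwa"]
responses = [("i", "i"), ("i", "schwa"), ("u", "u"),
             ("u", "schwa"), ("schwa", "schwa")]

matrix = confusion_matrix(responses, vowels)
correct, row_totals, col_totals = summarize(matrix)
percent_correct = 100.0 * correct / len(responses)
print(matrix, correct, row_totals, col_totals, percent_correct)
```

In this toy example the schwa column total exceeds its row total, which is the kind of over-response to the neutral vowel that the percent-schwa figures are meant to capture.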


Fig. 1.

The main diagonal is the number of “correct” responses, i.e., the number of times that the intended and the perceived (or predicted) vowel were the same. A row total is the frequency that the intended vowel was presented (times five listeners), and a column total is the number of times that vowel was actually perceived. A total matrix was made across both conditions. These steps were done for the experimental and control samples separately.

Preparation of Tape for Acoustic Analysis

The speech samples for acoustic analysis consisted of 14 pairs of utterances in which one member of the pair was a dysfluency characterized by one or more part-word repetitions and the stuttered word, and the other member of the pair was a fluent production of that word by the same speaker obtained from another reading of the passage in the adaptation sequence. (Since some of the dysfluencies used in the perceptual analysis came from spontaneous speech, it was not possible to find fluent utterances of the same word in the same context to match them. Accordingly, only 14 of the original 30 stuttered words and their fluent counterparts were used in the acoustic analysis.)

Procedure for Acoustic Analysis

The acoustic analysis was concerned with making three spectrographic measures that reflect the timing and control of the process of articulation. These measures are illustrated in Fig. 1, and were defined as follows:

1. Duration of consonant in stuttered word and fluent word. This measure, in milliseconds, is the duration of the consonant that began the actual production of the stuttered word or the production of the fluent word.
2. Duration of stuttered word and fluent word minus the initial consonant. This measure, in milliseconds, referred to as vowel-to-end, is the duration from the onset of the vowel following the initial consonant to the end of the word.
3. Pattern of first and second formant frequency transitions in the vowel following the initial consonant in the stuttered word and the fluent word.

Sound spectrograms were made of all pairs of utterances using a Voiceprint sound spectrograph with normal settings (300-Hz wide filter, 0-7000-Hz frequency scale, 1 cm = 79 msec, with high frequency emphasis).

Results and Discussion

Perceptual Analysis

Table 1 contains the two matrices of the control samples associated with Condition I in which the listeners were presumably not aware that the samples were stuttering and Condition II in which samples had been identified as repetitions of actual stuttering. The control samples, as noted earlier, were artificially produced CV+ utterances, which were included to help in evaluation of listener performance.

TABLE 1
Confusion Matrices for Control Samples from Conditions I and II. Vertical Axis in Each Matrix Represents the Vowel Artificially Produced in CV Context for Control Purposes. Horizontal Axis Represents the Vowel Actually Perceived by the Listeners

Condition I, control samples. Percent correct: 88/130 = 68%. Percent /ʌ/ responses: 29/130 = 22%.
Condition II, control samples. Percent correct: 89/130 = 68%. Percent /ʌ/ responses: 9/130 = 7%.

ᵃRow totals that deviate from ten are due to unequal presentations of specific vowel samples.

It can be noted in Table 1 that under both conditions 68% of the presented vowels were correctly identified, with a substantial number of errors occurring only under Condition I where /o/ was presented but identified correctly only once out of 10 chances. Of special interest is the percentage of /ʌ/ responses in the two conditions. In the first condition 22% of the responses were /ʌ/, while in the second condition only 7% of the responses were perceived as /ʌ/. It is difficult to attribute this reduction in /ʌ/ responses to a learning effect since the overall percent correct remained almost identical for both conditions. It is more likely that the listeners simply realized that they had overresponded with /ʌ/ in the first condition, which is a common tendency since the neutral vowel in unstressed form is by far the most commonly occurring vowel in English. Therefore, when in doubt, it is natural to guess the /ʌ/, and they simply resolved to be more accurate in their use of the /ʌ/ as a response in the second condition. A second explanation is that since the listeners now knew that the samples were of stuttering when undergoing Condition II, they were able to “relax” and bring their substantial past experience more naturally into play and perform more accurately on this vowel.

With the exception of the change in neutral vowel response frequency just noted, the two matrices are reasonably similar, and they were combined into one overall control sample matrix presented in Table 3. The results of hearing the control samples (which with their somewhat longer and more carefully produced vowel segments were considerably easier to identify than the genuine part-word repetitions) suggest that the listeners were able to respond accurately and without consistent bias to stimuli of this type.

Table 2 presents the results of hearing the experimental samples and is of central interest in the perceptual part of the study.
Here, due to the very short vowel segments in most of the part-word repetitions, the percent correct was substantially lower than in the control conditions, attaining 20% in Condition I and 26% in Condition II. This is an above chance performance of around 6% but indicates that the perceptual task was very difficult. One of the major contributors to the low percent agreement was the /ɪ/ intended vowel, which agreed only 5 out of 40 times, with the other responses being distributed over a wide range of vowels in Condition I. Also note that, as with the control samples, a reduction in the frequency of /ʌ/ responses occurred from the first to the second condition. With the experimental samples the change was from 32% to 18% /ʌ/ responses from Condition I to Condition II.

Probably the most interesting aspect concerning the comparison of Conditions I and II is that the matrices again appear very similar. We conclude that the effect of Condition II, in which the listeners were told that the samples were examples of actual stuttering, was minimal with the exception of the drop in /ʌ/ responses, and that the listeners were simply responding with what they thought they heard in the samples. Once again, because of the general similarity, the two matrices were combined to form an overall experimental matrix, which is presented in Table 3.

On the basis of the matrix for the experimental samples in Table 3 we offer some comments concerning the presence of the schwa in stutterers’ part-word repetitions. It is clear that the stutterers used in this study did not produce the schwa vowel, at least a perceptual form of it, with anywhere near the frequency that previous observations would have led us to expect. The overall percent of schwa samples perceived was 25% with the vast majority of the schwa confusions coming when either / C+/ or /aɪ/ was intended. If these two vowels are omitted, only 14% schwa confusions remain. It should be noted that the listeners were not biased against the schwa vowel, because when the schwa was actually the intended vowel (as in “muscle”) they responded correctly 31 out of 59 times, for the highest percent correct of any intended vowel. We are forced to conclude that previous reports were based on clinical impressions rather than rigorous analysis and reflect the natural tendency of a listener, when faced with short duration vowel segments, as in part-word repetitions or, in general, when faced with uncertainty, to conclude that the neutral vowel had been uttered.

However, the results of the present study as compared with past observations may not be as discrepant as they appear. If Van Riper (1971) is correct about the stutterer trying to locate an appropriate articulatory transition, it is possible that the schwa vowel is produced initially during the course of a block containing many part-word repetitions and that the final repetition is closer to the intended vowel. Since most of the samples employed in the present study consisted of one or two repetitions, it is possible that these samples were not severe enough, that is, do not contain enough repetitions to bring out the tendency toward initially inserting the schwa vowel on long trains of part-word repetitions.
The findings, however, are consistent with our clinical impression that stutterers, when exhibiting clonic, or especially tonic blocks, will preform the consonant or vowel to follow the initial phoneme. Accordingly, one will often see lip rounding when the stutterer is struggling with the /t/ in /tu/, for example. That is, the stage is set for a normal utterance of the remainder of the word once the difficulty with the initial element is overcome.

Acoustic Analysis

The spectrographic analysis was performed to compare 14 pairs of stuttered and fluent words on selected acoustic measures. The results of this comparison are available in Table 4, which contains the means and standard deviations of the initial consonant and vowel-to-end duration measures. Also included are the means of the absolute differences between corresponding stuttered and fluent segments. This table is the source of several thought-provoking implications.

First, and most striking, is the almost perfect agreement between the vowel-to-end measures for the stuttered and fluent words. Specifically, a difference of only 2 msec, between the means of 321 and 319 msec, and a mean absolute difference of only 15 msec were obtained. Furthermore, although not shown in Table 4, 73% of the pair members fell within the chosen limit of 22 msec of each other (and 100% were within 40 msec). It should be noted that this value of 22 msec, as the maximum difference that would be considered within normal variation, is a very strict criterion, less than 7% of the total duration in some cases, and reflects our desire to avoid falsely concluding that deviant samples were the same in duration. The value was obtained by measuring a two-syllable phrase in a set of pairs of spectrograms obtained from 26 normal-speaking females who, as part of another study, recorded two utterances of a test sentence approximately ½ hr apart. The means that form the basis of this criterion are contained in Table 4, designated as comparison samples.

TABLE 4
Means and Standard Deviations of Duration Measures in Milliseconds Obtained from Spectrograms of Stuttered Words, Fluent Words, and the Two-Syllable Comparison Phrase. Standard Deviations Are Presented in Parentheses after Each Mean. In Addition, the Means of the Absolute Value of the Differences, between the Fluent and Stuttered Segments and between the Two Comparison Segments, Are Displayed. The Number of Stuttered and Fluent Segments Differ from 14 because Reliable Measurements Could Not Be Made on Certain Pairs

                       Total length (msec)      Initial consonant (msec)   Vowel-to-end (msec)
                       (N = 11, control = 26)   (N = 12)                   (N = 11)
Stuttered              434 (166)                102 (70)                   321 (164)
Fluent                 397 (186)                63 (67)                    319 (161)
|Difference|
Comparison sample 1    241 (52)
Comparison sample 2    237 (60)
|Difference|

It is apparent that, once the initial consonant in the stuttered word was finished, the remainder of the stuttered word was almost identical in duration to the corresponding portion of the fluent word. An example of this phenomenon is given in Fig. 1, which shows the spectrograms for the stuttered word and its preceding repetitions (bottom) and the fluent word (top) lined up according to vowel onset. The almost perfect agreement between the words once the initial consonant has been produced can be noted.

This finding of course would not be surprising if the so-called stuttered words were simply fluent productions. That is, it could have been that the stutterers had completed their blocking with the production of the last repetition and then simply uttered the word itself fluently. That this is not the case, however, is documented by the fact that in 8 out of 12 samples the duration of the initial consonant was appreciably longer, i.e., 20 msec or more, than the corresponding consonant in the fluent word. That is, there was obviously some remaining stuttering still present when the stuttered word was uttered in the majority of cases. The data in Table 4 support this finding, and indicate that a mean difference in length of initial consonant of almost 40 msec (102 compared to 63 msec) was obtained. Moreover, a mean absolute difference of nearly 50 msec was found. In other words, the initial consonant of the stuttered words was almost 62% longer than the corresponding fluent consonant, on the average, while the vowel-to-end portions differed by less than 0.6%. These findings, (a) that there is measurable stuttering present on the initial consonant of the actual production of the stuttered word and (b) that in spite of the presence of the stuttering on the initial consonant, the remaining portion of the stuttered word and the comparable fluent word are essentially identical, lead us to conclude that for the carefully selected type of repetitive dysfluency employed in the study, stuttering is essentially confined to the initial consonant.

A third, more subjective set of observations was made on the pairs of spectrograms used in the acoustic analysis. These observations were based on visual comparisons of the transition patterns of the first and second formants in the region of the C-V juncture in the stuttered and fluent words. Essentially, a judgment of same or different was rendered, and where the samples were different, a description of the differences was made. In 62% of the pairs of samples, a difference in the rate and/or extent of formant movement was noted. This relatively high percentage indicates, again, that the production of the stuttered word, following the repetitions, was generally not free of stuttering. It should be noted that these distortions in the production of the word are subtle, and generally not audible, which is in agreement with Agnello (1966).
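
As a rough check on the percentages quoted above, the sketch below recomputes them from the rounded means given in the text (102 vs. 63 msec for the initial consonant, 321 vs. 319 msec for vowel-to-end). It assumes each percentage is taken relative to the fluent mean, which the report does not state explicitly; `percent_longer` is an illustrative helper and not part of the original analysis.

```python
def percent_longer(stuttered_ms, fluent_ms):
    """Percent by which the stuttered duration exceeds the fluent duration.

    Assumption: the fluent mean is the reference value (the denominator).
    """
    return 100.0 * (stuttered_ms - fluent_ms) / fluent_ms


# Rounded mean durations in msec, as reported in the text.
consonant = percent_longer(102, 63)       # initial consonant
vowel_to_end = percent_longer(321, 319)   # vowel onset to end of word

print(round(consonant, 1))     # 61.9
print(round(vowel_to_end, 2))  # 0.63
```

With these rounded means the consonant difference comes to about 62%, and the vowel-to-end difference to roughly 0.6%; the exact figures depend on the unrounded means, which are not given.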
The fact that the formant transitions from the consonant to the vowel were different from the corresponding fluent utterance in many of the stuttered words is not inconsistent with our observation on the essential similarity of the vowel-to-end portion of the stuttered and fluent words. The formant deviations may simply reflect the fact that the consonant of the stuttered word was produced with an abnormal posture and the articulatory (and therefore acoustic) “path” from this posture to a normally produced vowel was correspondingly disturbed. These observations, however, are not entirely consistent with Agnello (1966) who, as cited by Van Riper (1971), found evidence of reduced second formant movement associated with stuttering movements. The stuttered words in the present study showed noticeable formant transitions in 6% of the cases, but they often were not in the same pattern as the corresponding fluent words. Differences in the nature of the stuttering samples employed in the two studies undoubtedly account for much of the apparent discrepancy.

In conclusion, this study presented the results of perceptual and acoustic approaches to the study of part-word repetitions in the speech of adult stutterers. The samples employed in the study represent almost every nontense part-word repetition of initial CV+ form in the set of 82 3-min recordings that was screened. Therefore, for this specific type of dysfluency, we conclude that the articulatory breakdown is confined to the initial consonant and that the vowel portion of the repetition is normally not the neutral vowel, and often approximates the intended vowel of the word being produced. Any extension of these observations to other types of dysfluencies, or even to other types of repetitions, must be made with great caution. Accordingly, no implications for understanding the general nature of stuttering can be drawn until a more comprehensive picture of the articulatory breakdown underlying the moment of stuttering has been developed. However, the methods demonstrated in the present study would appear to be useful approaches to describing the stutterer’s articulatory dynamics.

References

Agnello, J. G. Some acoustic and pause characteristics of nonfluencies in the speech of stutterers. Technical report, National Institute of Mental Health, Grant No. 11067-01, 1966.
Hinkle, W. A study of subgroups within the stuttering population. Unpublished Ph.D. dissertation, Purdue University, 1971.
Johnson, W., et al. The onset of stuttering. Minneapolis: University of Minnesota Press, 1959.
Kroll, A. The differentiation of stutterers into interiorized and exteriorized groups. Unpublished Ph.D. dissertation, Purdue University, 1970.
Stromsta, C. A. A spectrographic study of dysfluencies labeled as stuttering by parents. Ther. Vocis Loquellae, 1965, 1, 317-320.
Van Riper, C. The nature of stuttering. Englewood Cliffs, N.J.: Prentice-Hall, 1971.
Wingate, M. E. Evaluation and stuttering, part I: speech characteristics of young children. J. Speech Hearing Dis., 1962, 27, 106-115.
