© 1991 S. Karger AG, Basel 0254-4962/91/0242-0088$2.75/0

Psychopathology 1991;24:88-105

Speech Characteristics in Depression¹

H.H. Stassen, G. Bomben, E. Günther

Research Department, Psychiatric University Hospital Zürich, Switzerland

¹ The study comprised 192 persons stratified according to sex, age and education, 3 different texts and repeated measurements from the same individuals. This sample serves as a reference with respect to the ‘natural’ variability of speech parameters and allows for deciding upon the significance of voice quality changes [Stassen et al., 1988a].

It is worth noting that there exist for a considerable number of speakers (about 40% of the general population) additional resonance points in the immediate neighborhood of F0, e.g. the terz, quart below F0 or the terz, quint above F0.

All subsequently presented results are based on the counting task and on Spearman correlation coefficients. Even though correlations turned out to be essentially consistent over the speech tasks under investigation, we decided in favor of the counting task as basis of our analyses in order to be compatible with the majority of earlier studies in the literature. Moreover, correlation coefficients derived from the reading task did not always reach significance.

Downloaded by: Kings's College London 137.73.144.138 - 11/2/2017 11:20:14 AM

Abstract. This study examined the relationship between speech characteristics and psychopathology throughout the course of affective disturbances. Our sample comprised 20 depressive, hospitalized patients who had been selected according to the following criteria: (1) first admission; (2) long-term patient; (3) early entry into study; (4) late entry into study; (5) low scorer; (6) high scorer, and (7) distinct retarded-depressive symptomatology. Since our principal goal was to model the course of affective disturbances in terms of speech parameters, a total of 6 repeated measurements had been carried out over a 2-week period, including 3 different psychopathological instruments and speech recordings from automatic speech as well as from reading out loud. It turned out that neither applicability nor efficiency of single-parameter models depends in any way on the given, clinically defined subgroups. On the other hand, however, no significant differences between the clinically defined subgroups showed up with regard to basic speech parameters, except for the fact that low scorers seemed to take their time when producing utterances (this in contrast to all other patients who, on the average, had a considerably shorter recording time). As to the relationship between psychopathology and speech parameters over time, we found significant correlations: (1) in 60% of cases between the apathic syndrome and energy/dynamics; (2) in 50% of cases between the retarded-depressive syndrome and energy/dynamics; (3) in 45% of cases between the apathic syndrome and mean vocal pitch, and (4) in 71% of low scorers between the somatic-depressive syndrome and time duration of pauses. All in all, single-parameter models turned out to cover only specific aspects of the individual courses of affective disturbances, thus speaking against a simple approach which applies in general.

Introduction

In 1964, Hargreaves and Starkweather published an outstanding, pioneering study on voice quality changes in depression. They reported an attempt to track changes in mood in hospitalized psychiatric patients utilizing direct measurement of voice spectra. For this purpose, the authors interviewed 8 patients about 4 times per week throughout the course of hospitalization. During the interviews, speech samples were recorded and subsequently used to compute long-term spectra at a resolution of third-octave bands. Based on these spectra, multiple regression on two psychopathological rating scales was carried out and yielded moderately good prediction for some patients.

Design and realization of the investigation relied on the authors’ experiences in the field of speaker recognition [Hargreaves and Starkweather, 1963] and, on the other hand, on the well-known fact that the quality of a patient’s voice holds several cues for the psychiatrist intent on obtaining a diagnosis. However, the chosen design with repeated measurements on the same individual over the whole hospitalization period has not been taken up by other investigators. Rather, subsequent studies have concentrated on the analysis of single, scalar speech parameters, in particular ‘pause duration’ and ‘fundamental frequency’, and have primarily aimed at differences between parameter values derived from the beginning of a patient’s hospitalization and those measured after improvement [Alpert, 1983; Avery and Silverman, 1984; Blackburn, 1975; Bouhuys and Mulder-Hajonides, 1984; Bouhuys and Alberts, 1984; Godfrey and Knight, 1984; Greden et al., 1981; Hardy et al., 1984; Helfrich et al., 1984; Hinchcliffe et al., 1971; Hollien and Darby, 1979; Johnson et al., 1986; Klos and Ellgring, 1984; Newman and Mather, 1938; Nilsonne, 1987, 1988; Nilsonne et al., 1988; Pope et al., 1970; Rice et al., 1969; Roessler and Lester, 1976; Saxman and Burk, 1968; Szabadi et al., 1976; Szabadi and Bradshaw, 1980; Teasdale et al., 1980; Tolkmitt et al., 1982; Weintraub and Aronson, 1967].

Indeed, speech dysfunctions, such as slow, delayed or monotonous speech, are prominent features of severe depression, mania and schizophrenia. ‘The patients speak in a low voice, slowly, hesitatingly, monotonously, sometimes stuttering, whispering, try several times before they bring out a word, become mute in the middle of a sentence. They become silent, monosyllabic, can no longer converse’ [Kraepelin, 1921]. Accordingly, clinicians routinely monitor speed of talking among affectively disturbed patients for diagnostic purposes and as indicators of clinical change [Greden and Carroll, 1980]. Moreover, clinicians frequently observe that the speech of depressed patients is uniform and sometimes exhibits a ‘regular repetition of gliding intervals’ [Darby and Hollien, 1977] and that the pitch alterations of these patients are narrowed, giving the voice a monotonous quality [Leff and Abberton, 1981]. Based on such experiences, researchers regarded the parameters ‘speech pause time’ and ‘fundamental frequency’ - together with related quantities - as most


promising and as having a great deal to offer, not just for the very sophisticated assessment of affective range and quality, but for assessment at a sufficiently accurate level. As a consequence, a considerable number of investigations into these speech characteristics has been carried out during the past 2 decades in order to quantify the perceptual observations in an observer-independent, reproducible way. The respective results, however, are controversial: ‘While these studies suggest that depression is associated with distinctive speech patterns, cross comparison of the research has not been possible because wide variations in diagnosis and methodology exist between the various studies. Furthermore, the findings have been less than dramatic, suggesting that precise description and sensitive measuring scales are required in order to distinguish speech patterns and measure differences’ [Darby et al., 1984].

The authors themselves mostly regarded their results as preliminary and open to improvement. Andreasen et al. [1981], for example, qualified the results of their study on flat affect as follows: ‘This study is best regarded as a pilot or preliminary investigation that should be retested with additional refinements. It contains several problem areas that, although they do not negate the significant findings, should be adequately worked through in future research.’ Also, recent studies in the field of speech pause time in depression yielded no real breakthrough: ‘Measuring pause times in speech is a complex task, and the relationship between different pauses and measures of retardation needs further study’ [Nilsonne, 1988]. The author continues: ‘The measures of fundamental frequency changeability were lower in the depressed patients than in the

Stassen/Bomben/Günther

control subjects: these measures could possibly be used to differentiate between depressed and non-depressed groups’. Discrimination between psychiatric patients and normal controls on the basis of single-parameter models, however, also did not fully succeed, for example, in the case of schizophrenia by means of a traditional discriminant analysis [Clemmer, 1980].

In summary, even though a wide range of clinical judgements on affective disorders can be derived from speech samples, no objective, sufficiently powerful approach to the measurement of talking behavior through acoustic variables is currently available. The main reason for this is that the underlying processes are too complex to be grasped intuitively. In particular, measurements may be contaminated by external factors, such as diurnal variation or motor retardation, amongst others, whose effects seem to be superimposed on the speech production process. Our knowledge about these phenomena, at its present state, is insufficient, and available results are more or less preliminary and sometimes inconsistent. Greden and Carroll [1980], for example, pointed out that ‘the shift between morning and evening goes in the opposite direction in the endogenous depressives than it does in normal subjects or in recovered depressives’, whereas Hoffmann et al. [1985] reported: ‘A diurnal variation in speech pause time was found in control subjects, but not in depressed patients, whether retarded or nonretarded’. With regard to motor retardation, Szabadi and Bradshaw [1983] argued: ‘Thus one cannot escape the conclusion that speech pause time and other psychomotor tests do not measure the same thing’. Future research in this field therefore involves systematically exploring all the different sources of variation in talking behavior of psychiatric patients as well as of normal controls in order to reliably assess the properties inherent to human speech.

Representing Speech Characteristics

Speech characteristics can be roughly described by a few major features: speech flow, loudness, intonation and intensity of overtones. Speech flow comprises the speed at which utterances are produced as well as the number and duration of temporary breaks in speaking. Loudness reflects, on the one hand, the amount of energy used to articulate utterances and, on the other hand, when regarded as a time-varying quantity, the speaker’s dynamic expressiveness. Intonation is the manner of producing utterances with respect to rise and fall in pitch. Accordingly, intonation leads to tonal shifts in either direction of the speaker’s mean vocal pitch. Overtones are the higher tones which faintly accompany a fundamental tone and are responsible for the tonal diversity of sounds.

The question of how to operationalize speech characteristics, however, is controversially discussed in the literature and obviously depends on the specific problem. Indeed, models for speech recognition or speech perception necessarily require different sets of parameters than do approaches to psychoacoustic sensations or psychiatric applications. Principally, one distinguishes between time domain parameters (concerning speech flow, energy, dynamics) and frequency domain parameters (concerning intonation and overtones). In what follows, we will give a short overview of speech parameters relevant to psychiatric applications.


Speech Parameters of the Time Domain

Speech rate is defined either by the amount of time used to produce utterances or by the number of syllables uttered during a given time. Mean utterance duration and variability of utterance duration describe the statistical properties of utterances within a given text. These parameters are used to test the hypothesis that patients speak more slowly during depression than they do after recovery. Pause time is the summed duration of between-utterance breaks or the number of temporary breaks during a given time. Mean pause duration and variability of pause duration describe the statistical properties of temporary breaks within a given text. These parameters allow for testing the hypothesis that number and duration of temporary breaks decrease with recovery from depression.

Energy measures the total amount of energy used to articulate the utterances of a given text. Mean energy and the corresponding variability of energy, the latter representing the speaker’s dynamic expressiveness, describe the statistical properties of loudness within a given piece of speech. These parameters make it possible to test the hypothesis that ‘flat affect’ can be defined in terms of speech parameters as a signal with low average energy per second and a standard deviation very much smaller than that of the norm.

Besides the above-mentioned, most widely used parameters of the time domain, we also analyze (1) utterances per second (mean value and variability) which measure the percent proportion of utterances within each second; (2) pauses per second (mean value and variability) which measure the percent proportion of temporary breaks within each second, and (3) energy per syllable (mean value and variability) which measure the amount of energy used to articulate a syllable as well as the speaker’s dynamic expressiveness at a syllable level.

Speech Parameters of the Frequency Domain
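Before the frequency-domain parameters are introduced, the time-domain quantities defined above (pause and utterance durations, energy per second) can be made concrete with a minimal sketch. It assumes a digitized signal held as a list of samples. The frame length, the energy threshold separating speech from silence, and all function names are illustrative assumptions and not part of the study's software; only the 250-ms minimum pause duration is taken from the Signal Processing description.

```python
from statistics import mean, pstdev

FRAME_MS = 10        # assumed analysis frame length (not from the study)
MIN_PAUSE_MS = 250   # pauses shorter than this are skipped


def frame_energies(signal, rate):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    size = max(1, rate * FRAME_MS // 1000)
    return [sum(x * x for x in signal[i:i + size]) / size
            for i in range(0, len(signal) - size + 1, size)]


def segment(energies, threshold):
    """Return [(kind, duration_ms), ...] with kind 'utterance' or 'pause'.

    `threshold` is a hypothetical energy level separating speech from silence.
    """
    runs = []
    for e in energies:
        kind = "utterance" if e >= threshold else "pause"
        if runs and runs[-1][0] == kind:
            runs[-1][1] += FRAME_MS
        else:
            runs.append([kind, FRAME_MS])
    merged = []
    for kind, dur in runs:
        if kind == "pause" and dur < MIN_PAUSE_MS:
            kind = "utterance"  # short pause: counted as part of the utterance
        if merged and merged[-1][0] == kind:
            merged[-1][1] += dur
        else:
            merged.append([kind, dur])
    return [(kind, dur) for kind, dur in merged]


def describe(durations):
    """(count, mean, standard deviation) of a list of durations in ms."""
    if not durations:
        return (0, 0.0, 0.0)
    return (len(durations), mean(durations), pstdev(durations))


def time_domain_parameters(signal, rate, threshold):
    """Pause, utterance and energy statistics for one recording."""
    segs = segment(frame_energies(signal, rate), threshold)
    pauses = [d for k, d in segs if k == "pause"]
    utterances = [d for k, d in segs if k == "utterance"]
    # Energy per second: sum of squared samples in each full second.
    per_second = [sum(x * x for x in signal[i:i + rate])
                  for i in range(0, len(signal) - rate + 1, rate)]
    return {
        "pauses": describe(pauses),
        "utterances": describe(utterances),
        "total_ms": sum(d for _, d in segs),
        "energy_per_second": per_second,
    }
```

Calling `time_domain_parameters(signal, rate, threshold)` on one recording yields the counts, means and standard deviations from which single-parameter models of the kind discussed here would be built; the mean and variability of `energy_per_second` correspond to the energy and dynamics measures.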

Spectral analyses which yield a transformation of the time domain into the frequency domain result in spectra which can be directly interpreted as power-frequency distributions. However, the time interval length on which a spectral analysis is to be based is a critical value if the processes under investigation are essentially nonstationary, as is the case for speech signals. The problem of how to appropriately define time intervals for psychiatric purposes will be discussed in the following paragraph.

The fundamental frequency F0 is defined as the first maximum of a given power-frequency distribution. For a sufficiently long time interval (1-2 s), F0 is a good estimator of the speaker’s mean vocal pitch. The variability of F0 can be measured either by the standard deviation of F0 (estimated from a sufficiently representative speech sample) or by the time derivative of F0. The specific form of the F0 distribution curve is called the F0 contour. Parameters of the frequency domain particularly allow for testing the hypothesis that fundamental frequency changeability is lower during depression than after recovery.
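The definition of F0 as the first maximum of a power-frequency distribution can be illustrated with a small sketch. It assumes 1-second epochs, a synthetic test tone, and a search grid of whole-Hz candidate frequencies; the noise floor used when looking for the first maximum and all names are assumptions made for this example, not the procedure of the original programs.

```python
import math
from statistics import mean, stdev


def power_spectrum(epoch, rate, freqs):
    """Power of one epoch at each frequency in `freqs` (direct DFT projection)."""
    spectrum = []
    for f in freqs:
        w = 2.0 * math.pi * f / rate
        re = sum(x * math.cos(w * n) for n, x in enumerate(epoch))
        im = sum(x * math.sin(w * n) for n, x in enumerate(epoch))
        spectrum.append(re * re + im * im)
    return spectrum


def first_maximum(freqs, power, floor_ratio=0.05):
    """Frequency of the first local maximum above floor_ratio * global peak.

    The floor is an assumption added here so that numerical noise is ignored.
    """
    floor = floor_ratio * max(power)
    for i in range(1, len(power) - 1):
        if power[i] >= floor and power[i] > power[i - 1] and power[i] >= power[i + 1]:
            return freqs[i]
    return None


def f0_per_epoch(signal, rate, freqs, epoch_s=1):
    """F0 estimate for each consecutive epoch of `epoch_s` seconds."""
    size = rate * epoch_s
    return [first_maximum(freqs, power_spectrum(signal[i:i + size], rate, freqs))
            for i in range(0, len(signal) - size + 1, size)]


def synthetic_voice(f0, rate, seconds=1):
    """Hypothetical test tone: fundamental plus a stronger second harmonic."""
    return [math.sin(2 * math.pi * f0 * n / rate)
            + 1.5 * math.sin(2 * math.pi * 2 * f0 * n / rate)
            for n in range(rate * seconds)]


rate = 1000                    # samples per second (illustrative)
freqs = list(range(60, 301))   # candidate frequencies in Hz
signal = (synthetic_voice(110, rate) + synthetic_voice(120, rate)
          + synthetic_voice(130, rate))
f0_values = f0_per_epoch(signal, rate, freqs)
f0_mean = mean(f0_values)      # estimate of mean vocal pitch
f0_sd = stdev(f0_values)       # one possible measure of F0 variability
```

The second harmonic is deliberately stronger than the fundamental in the test tone, so the global peak of each spectrum is not F0; taking the first maximum rather than the largest one is what recovers the fundamental.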

Material and Methods

Sample

Our sample consisted of 12 male patients (mean age 40.8 years, standard deviation 12.9 years) and 8 female patients (mean age 46.8 years, standard deviation 13.8 years), who had been recently hospitalized at the Psychiatric University Hospital Zürich with the following diagnoses: 11 affective psychoses (ICD9: 296.1, 296.3), 3 schizoaffective psychoses in depressed state (ICD9: 295.7), 1 neurotic depression (ICD9: 300.4), 2 alcohol dependences (ICD9: 303), and 3 reactive depressions (ICD9: 309.0, 309.1). There were 15 entries into the study within the first 2 weeks after hospitalization (75%) and 5 entries after at least 1 full month of hospitalization (25%). This sample also comprised a subgroup of 5 first admissions, a subgroup of 5 long-term patients with more than 6 months’ hospitalization, and another subgroup of 6 patients who scored high on the retarded-depressive AMDP scale. Furthermore, we subdivided our sample with respect to psychopathology into low scorers (n = 7), high scorers (n = 5), and average scorers (n = 8), on the basis of the average AMDP and Hamilton scores at entrance into the study. A detailed description of the patients who joined this calibration study is given in table 1.

Rating Instruments

The experimental setup followed the same design as in the previous pilot study [Stassen, 1988; Stassen et al., 1989]: patients were rated by their psychiatrists by means of the AMDP and Hamilton instruments during a psychiatric exploration at a fixed time in the morning each Monday, Wednesday, and Friday throughout 2 weeks. Immediately after psychiatric exploration, patients were asked to fill out the Zung affective state self-rating test. Speech recordings were carried out in an acoustically shielded room, digitized on-line, and comprised three pieces of spoken texts: (1) counting out loud from 1 to 30; (2) reading out loud a fixed text selected for its simplicity and emotional neutrality, and (3) counting out loud again from 1 to 30.

Based on the AMDP and Hamilton instruments, psychopathology was operationalized through the affect-specific AMDP scales ‘apathic syndrome’, ‘somatic-depressive syndrome’, ‘retarded-depressive syndrome’ and ‘manic-depressive syndrome’ (the latter being a second-order scale), and through the ‘Hamilton 17 score’.

Signal Processing

All speech signals were inspected visually and marked with an artifact code if necessary so that disturbed intervals could be removed prior to data analyses. In a next step, segmentation tables were set up in


Table 1. Composition of calibration sample with respect to sex, age, age of onset, diagnosis, psychopathological score, entry into study, hospitalization time, type of depression

Patient No.  Sex  Age, years  Onset, years  ICD diagnosis  AMDP score  Entry day  Hospital time, days
25           M    40          19            295.7          47          2          36
3            F    54          53            296.3          49          3          41
1            M    42          37            309.1          41          4          41
4            F    55          51            300.4          39          4          16
23           M    49          47            296.1          48          4          178
7            M    46          28            303.0          37          5          62
11           F    36          29            303.0          40          5          30
9            M    34          34            309.0          42          7          91
26           M    33          15            295.7          51          7          26
5            M    67          67            296.1          47          8          325
12           M    57          38            296.1          53          8          74
6            M    43          24            296.3          38          9          31
18           F    49          17            296.1          41          11         41
14           F    21          20            296.1          43          15         44
2            F    52          32            296.1          50          18         184
8            M    28          26            309.1          45          35         59
21           M    27          24            296.1          52          35         61
24           F    65          53            296.1          48          *          81
17           F    41          29            296.1          40          47         101
22           M    23          22            295.7          52          130        322

* Entry day illegible in the source. The table additionally marks 5 patients as first admissions (‘first’), 7 patients as L and 5 as H, and 6 patients as ‘ret’ (type of depression); the row positions of these marks are not recoverable from the extracted text.

order to identify pauses and utterances, whereby pauses of less than 250-ms duration were skipped. Finally, we calculated long-term spectra on the basis of 1-second epochs by means of a discrete Fourier transformation (‘pure’ utterances with pauses having been eliminated for spectral analysis) [a more detailed description of the different data processing steps is given in Stassen et al., 1988a].

As outlined before, the time interval length on which spectral analyses are based is a crucial point in the field of nonstationary processes since the theory of ‘local’ time-dependent spectra requires a problem-oriented definition of time intervals. Indeed, for speech signals the following properties apply: the shorter the time interval length the more strongly spectra follow the ‘tonal’ composition of the underlying speech entities (e.g. vowels, diphthongs, consonants, plosives, fricatives). Accordingly, successive short-time spectral analyses with a sliding time window of about 20-100 ms are best suited in the field of automatic speech recognition. On the other hand, formant resonances move relatively slowly during continuous speech because of the physical limitations on how quickly tongue, lips and jaw can be moved. Thus, spectra derived from time intervals of 1-2 s length most clearly reveal the distinct individuality of a speaker’s overtone distribution (automatic speaker recognition problem).

Investigations into the talking behavior of affectively disturbed patients focus on modelling phenomena like apathy and depression (reflected, for example, by speech characteristics like flat timbre and lack of intonation) or aggressivity and stress (reflected, for example, by a shift of mean vocal pitch). Obviously, such phenomena are related to longer persisting processes which have to be separated


L = Low; H = high; ret = retarded-depressive.


(1) from short-time fluctuations due, for example, to the spoken text or to interactions with the immediate environment, and (2) from circadian variations. According to the findings of our investigation into the reproducibility and sensitivity of speech parameters in the general population, the pronounced individuality of a speaker’s fundamental frequency (F0) characteristics can be determined in a sufficiently reproducible way from a time interval of about 1-second length.

Due to the fact that spectral analysis yields a power-frequency distribution with spectral intensities


Fig. 1. Typical distributions of the fundamental frequency F0 for male speakers (a) and female speakers (b) after smoothing.


Results

Our sample comprised depressive, hospitalized patients who had been selected according to the criteria: first admission (n = 5), long-term patient (n = 5), early entry into study (n = 15), late entry into study (n = 5), low psychopathology scorer (n = 7), high psychopathology scorer (n = 5), and distinct retarded-depressive symptomatology (n = 6). With respect to psychopathology, significant differences between first registration and last registration 14 days later showed up for various subgroups of patients as well as for the total sample (table 2). No significant changes over the period of observation could be found, however, for the apathic syndrome. There are indications that a considerable number of patients did not improve in this regard during therapy. On the other hand, differences in the other scores under investigation (somatic-depressive syndrome, Hamilton 17 score) turned out to be significant at the 1% level for the total sample, indicating good response to treatment.

Since the principal goal of this investigation was to test the efficiency of single-parameter models, we analyzed all major speech parameters proposed in the literature independently of each other. Our analyses included the following speech parameters: (1) average pause duration; (2) number of pauses; (3) average pause duration per second; (4) average utterance duration; (5) average energy/second; (6) variation of energy/second; (7) average energy/syllable; (8) variation of energy/syllable; (9) registration time; (10) total length of pauses; (11) total length of utterances; (12) average vocal pitch; (13) variation of vocal pitch; (14) F0 narrowness; (15) F0 contour.

The comparison of clinically defined subgroups yielded, with respect to speech behav-


proportional to amplitudes and time durations of the underlying partial tones, the F0 contour as well as the F0 variability can be reliably estimated from a 1 to 2-second spectrum, rather than from a sequence of short-time Fourier transforms. Figure 1 shows typical F0 distributions for male and female speakers (after smoothing). The first maximum designates in each case the fundamental frequency F0, whereas the higher-frequency maxima designate the higher formants F1, F2 etc. We approximated the shape of the F0 distribution curve by a 2nd degree polynomial² and used the distance between the symmetrical 6-dB points as a measure for the F0 variability and the ratio height/width of the polynomial as a measure of ‘F0 narrowness’. All frequency differences were calculated in quartertones, thus providing for a direct comparability of F0 contours, independently of the actual fundamental frequencies.

A total of 15 speech parameters of both the time domain and the frequency domain were extracted from the cleaned speech recordings by means of a set of computer programs described elsewhere [Stassen et al., 1988a]. The applied computational algorithms were identical with those presented in the literature [Nilsonne, 1988] except for the frequency domain parameters.

Since the principal goal of our study was to measure the course of affective disturbances in terms of speech parameters, we combined, for each individual separately, the results of the 6 repeated measurements into 6-dimensional vectors in order to compute correlation coefficients between ‘rating vectors’ and ‘speech vectors’. Dependencies were subsequently investigated on the basis of Spearman correlation coefficients whose significance was tested statistically. In addition, we tested differences between registrations at group level by means of the Wilcoxon matched-pairs test whereas differences between clinically defined subgroups were tested by the Mann-Whitney U test.
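The correlation step described above can be sketched as follows: for one patient, the 6 repeated ratings and the 6 values of one speech parameter form two vectors, and their Spearman coefficient is computed from midranks. The data values are invented for illustration, the function names are hypothetical, and the exact permutation test is an assumption, since the text states only that significance was tested statistically.

```python
import math
from itertools import permutations
from statistics import mean


def ranks(values):
    """Midranks (1-based); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation; 0.0 is a placeholder if a vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation coefficient of two equally long vectors."""
    return pearson(ranks(x), ranks(y))


def exact_p_value(x, y):
    """Two-sided exact permutation p-value (feasible for 6 measurements)."""
    rx, ry = ranks(x), ranks(y)
    observed = abs(pearson(rx, ry))
    perms = list(permutations(ry))
    hits = sum(1 for p in perms if abs(pearson(rx, p)) >= observed - 1e-12)
    return hits / len(perms)


# Hypothetical data for one patient: 6 ratings and 6 energy values.
rating_vector = [30, 28, 25, 21, 18, 15]
speech_vector = [5.1, 5.6, 5.4, 6.3, 6.9, 7.2]
rho = spearman(rating_vector, speech_vector)
p_value = exact_p_value(rating_vector, speech_vector)
```

With only 6 measurements per patient there are 720 possible rank orderings, so an exact test is cheap; this also makes clear why only rather strong monotone relationships can reach the 5% level in such a design.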
Thus, we could determine the efficiency of all major single-parameter models which have been proposed in the literature. In particular, we addressed the following questions: (1) Which psychopathological quantities are correlated with speech behavior changes in depressive patients during improvement? (2) Are single-parameter models powerful enough to model the individual course of depressive disturbances over time in a sufficiently representative sample of patients? (3) Are there differences in speech behavior between clinically defined subgroups of patients?
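The F0 variability measure described in this section, the distance between the 6-dB points of a fitted 2nd degree polynomial expressed in quartertones, can be illustrated with a minimal sketch. It assumes that the smoothed F0 distribution is available as levels in dB at known frequencies and that the parabola is fitted by least squares on a quartertone axis centred on the peak; these choices and all names are assumptions, and the ‘F0 narrowness’ ratio is not covered because the text does not define the height precisely.

```python
import math


def quartertones(f_ref, f):
    """Interval from f_ref to f in quartertones (24 per octave)."""
    return 24.0 * math.log2(f / f_ref)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares coefficients (a, b, c) of y = a*x**2 + b*x + c."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):          # Cramer's rule, one column at a time
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def six_db_width(freqs_hz, levels_db):
    """Distance in quartertones between the two points 6 dB below the vertex."""
    peak = max(range(len(levels_db)), key=levels_db.__getitem__)
    xs = [quartertones(freqs_hz[peak], f) for f in freqs_hz]
    a, _b, _c = fit_parabola(xs, levels_db)
    if a >= 0:
        raise ValueError("fitted parabola has no maximum")
    # a*(x - x0)**2 = -6  =>  |x - x0| = sqrt(6 / -a)
    return 2.0 * math.sqrt(6.0 / -a)
```

Because the frequency axis is logarithmic, the same width is obtained for a low-pitched and a high-pitched voice with the same relative spread, which is the comparability across speakers that the quartertone scale is meant to provide.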


Table 2. Significant differences between first registration and last registration 14 days later for various subgroups of patients and psychopathology

Subgroup         n   APATH  SOMD  RETD  MAND  HAMD
Retarded         6   NS     1%    1%    1%    1%
Early entry      15  NS     1%    1%    1%    1%
Late entry       5   NS     NS    NS    NS    NS
High scorer      5   NS     1%    NS    NS    1%
Low scorer       5   NS     NS    NS    NS    NS
Long-term        5   NS     NS    NS    NS    NS
First admission  5   NS     NS    1%    NS    NS
Total sample     20  NS     1%    1%    1%    1%

APATH = Apathic syndrome; SOMD = somatic-depressive syndrome; RETD = retarded-depressive syndrome; MAND = manic-depressive syndrome; HAMD = Hamilton 17 score. Test results are based on the Wilcoxon matched-pairs signed-rank test.

Table 3. Differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters

                 Pauses, ms      Pauses/s        Utterances, ms  Energy/s        F0 variability  Total length, s
                 n   X     SD    n   X     SD    n   X     SD    n   X     SD    n   X     SD

First registration
Early entry      21  355   231   42  339   167   13  418   172   25  6.4   *     3   10.0  2.5   26
Late entry       20  317   209   43  304   134   12  429   162   23  7.7   *     3   10.4  1.7   24
First admission  18  288   190   40  328   156   12  410   194   22  8.0   *     3   10.4  2.4   23
Long-term        23  335   231   42  321   149   13  445   166   25  6.6   *     3   10.6  2.3   25
Low scorer       26  437   269   47  359   175   17  449   181   32  5.6   *     4   11.4  2.9   33
High scorer      20  331   217   40  343   174   12  408   174   24  9.2   *     3   9.4   2.0   24
Retarded         23  343   198   41  342   162   13  429   157   25  8.0   *     3   9.0   1.6   25
Total sample     20  345   226   42  330   160   13  421   170   25  6.7   *     3   10.1  2.3   25

Last registration 14 days later
Early entry      16  329   219   42  328   147   12  390   175   23  6.7   3.6   3   10.4  3.1   23
Late entry       14  278   280   38  348   171   11  374   216   20  8.3   4.9   3   9.5   1.5   21
First admission  14  284   194   40  331   155   11  369   185   20  8.7   3.7   3   11.9  4.9   21
Long-term        14  321   319   41  333   152   11  364   172   21  7.1   4.3   3   10.4  3.6   22
Low scorer       20  397   304   46  333   138   14  424   179   27  6.5   4.9   3   10.8  1.9   28
High scorer      14  309   206   36  377   193   11  359   191   22  7.4   4.2   3   9.4   1.9   22
Retarded         19  310   182   40  358   165   12  377   163   24  7.3   3.6   4   9.2   1.9   24
Total sample     16  318   235   41  332   153   11  386   185   22  7.1   4.0   3   10.2  2.9   23

F0 variability measured as 6-dB bandwidth in quartertones; n = averaged observed number per patient; total length = registration time. The experimental situation is counting out loud from 1 to 30.

* Of the eight Energy/s SD values at first registration, seven are legible in the source (2.7, 4.5, 3.9, 4.1, 2.2, 3.6, 3.2); their row assignment is not recoverable. The assignment of the first two column groups to ‘Pauses, ms’ and ‘Pauses/s’ follows the column order of tables 4 and 5.


Table 4. Significant differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters

Subgroup         n   Pauses  Pauses/s  Utterances  Energy  F0 variability
Retarded         6   NS      1%        NS          NS      NS
Early entry      15  NS      1%        NS          NS      NS
Late entry       5   NS      1%        NS          NS      NS
High scorer      5   NS      1%        NS          NS      NS
Low scorer       5   NS      NS        NS          NS      NS
Long-term        5   NS      1%        NS          NS      NS
First admission  5   NS      1%        NS          NS      NS
Total sample     20  NS      NS        NS          NS      NS

The experimental situation is reading out loud emotionally neutral text (Wilcoxon matched-pairs signed-rank test).

Table 5. Significant differences between first registration and last registration 14 days later for various subgroups of patients and selected speech parameters

Subgroup         n   Pauses  Pauses/s  Utterances  Energy  F0 variability
Retarded         6   NS      NS        NS          NS      NS
Early entry      15  NS      NS        NS          NS      NS
Late entry       5   NS      NS        1%          NS      NS
High scorer      5   NS      NS        1%          NS      NS
Low scorer       5   NS      1%        NS          NS      NS
Long-term        5   NS      1%        NS          NS      NS
First admission  5   NS      NS        NS          NS      NS
Total sample     20  NS      5%        NS          NS      NS

ior, an essentially homogeneous picture, except for the fact that low scorers had by far the longest registration time for each of the three different texts (counting from 1 to 30, reading standard text, and counting from 1 to 30 after reading). This increased registration time was obviously due to prolonged speech pause times because corresponding utterance durations were comparable to those of the other groups (table 3). With respect to the speech parameters under investigation, significant differences between first registration and last registration 14 days later showed up very inconsistently, and, moreover, not at all for the total sample. In particular, no relationship between psychopathological improvement and speech parameter changes could be found (tables 4, 5).


The experimental situation is counting out loud from 1 to 30 (Wilcoxon matched-pairs signed-rank test).

The psychopathology vectors computed separately for each patient from the 6 repeated measurements displayed a very individual pattern, indicating that the variation of respective scores over time is characterized by a pronounced individuality. Changes in both directions, towards a less severe as well as towards a more severe symptomatology, can be observed, and the degree to which changes manifest themselves varies from individual to individual. Accordingly, the efficiency of speech parameter models with regard to the course of psychopathology has been tested by means of individual correlation analyses.

Detailed correlation analyses yielded several remarkable results. First of all, 4 single-parameter models proved indeed to be suited for measuring relevant aspects of the course of affective disorders: the ‘pause model’, the ‘energy model’, the ‘dynamics model’ and the ‘pitch model’. All other speech parameters under question, in particular time duration of utterances, turned out to be too unspecific or too loosely related to the psychopathological processes.

As to the correlation between psychopathology and speech parameters, our results provide ample evidence for a close relationship between the apathic syndrome and speech dynamics: we found a significant correlation in 60% of cases, 7 cases at the 5% level and 5 cases at the 10% level. The energy model yielded almost identical results. However, there were 2 additional patients whose energy values displayed a significant correlation with the apathic syndrome but whose speech dynamics remained constant during the observation period. Hence, in more than two thirds of patients, the energy/dynamics model turned out to be well suited to monitor the apathic aspect of depression.


Positive as well as negative correlations showed up. For some patients, speech changed during improvement from low-voiced, monotonous forms towards standard values, whereas in others, changes from loud and abrupt speech towards standard values could be observed. No external criteria were found allowing for a classification of patients according to their speech behavior with regard to the energy/dynamics model.

A similar picture was found for the relationship between the retarded-depressive syndrome and speech behavior with respect to energy and dynamics. Analyses yielded a significant correlation in 50% of cases, 7 cases at the 5% level and 3 cases at the 10% level. Again, positive as well as negative correlations showed up. However, it must be emphasized that 3 different types of patients were found for whom the energy/dynamics model applied: (1) patients with correlations to the apathic syndrome only; (2) patients with correlations to both the apathic and the retarded-depressive syndrome, and (3) patients with correlations to the retarded-depressive syndrome only.

Another important result concerned the subgroup of low scorers. We found a significant, positive correlation between the speech parameter ‘time duration of pauses’ and the somatic-depressive syndrome in 71% of cases (3 cases at the 5% level and 2 cases at the 10% level). Contrary to our expectations and contrary to the results of several studies in the literature, there was no clear relationship between time duration of pauses and the retarded-depressive syndrome. Only one third of the sample showed a significant correlation, 2 cases at the 5% level and 3 cases at the 10% level. Moreover, only 1 patient of the retarded-depressive subgroup turned out to display this latter correlation.

As to the correlation between vocal pitch and psychopathology, our results were somewhat disappointing. Contrary to earlier findings in the literature, our study did not lead to a clear picture. Although mean vocal pitch was significantly correlated in 45% of cases with the apathic syndrome (3 cases at the 5% level and 6 cases at the 10% level), the expected close relationship between pitch variations and psychopathology could not be confirmed. Only a few patients showed a significant correlation between pitch variation and the somatic-depressive syndrome.

As to the Zung scale, detailed analysis revealed a significant correlation between observer rating and self-rating in 13 cases (Spearman coefficient r > 0.6, p < 10%, for at least 4 of 5 interview scales), a nonsignificant correlation in 1 case, and no correlation at all in 6 cases. In other words, for one third of the sample, interview rating and self-rating yielded completely independent results whereas, for the other two thirds, a high coincidence between both rating types showed up.

Another important issue of our study elucidated the old problem of how to get a patient to speak. A direct comparison of automatic speech (i.e. counting out loud) with reading out loud demonstrated the advantages and disadvantages of the 2 approaches. Automatic speech displays more variability and seems to follow changes in affect more closely. This advantage, however, is achieved at the expense of reduced reproducibility. Reading out loud enables speakers to produce more overtones and to develop speech patterns of a ‘richer’ quality. Moreover, speech parameters derived from the reading task are more reproducible than those derived from automatic speech. On the other hand, speech of this type seems to follow changes in affect more inertly, and sometimes correlations do not reach significance.

In spite of all that is now known about the characteristics of speech types, further understanding of the underlying speech production processes is still necessary in order to decide upon specific application fields. Accordingly, a decision in favor of one of the different speech types seems premature.

Discussion

Considering the results of our normative investigation into the speech characteristics of 192 healthy subjects and based on the experience of our pilot study with 6 depressive patients, it was not likely that this present study could lead to a single-parameter model which meets the psychiatric requirements and applies in general. Indeed, the distinct individuality of speech characteristics together with the large variety of individual courses of affective disturbances speaks against a fairly simple single-parameter model. Nevertheless, our approach to modelling the course of affective disturbances in terms of speech parameters has revealed several important clues. First, neither applicability nor efficiency of the new method depended on the clinically defined subgroups under investigation. Thus, from our current perspective, it seems likely that no principal restrictions concerning the applicability of the new method exist. Second, with regard to basic speech parameters, no significant difference between the clinically defined subgroups showed up. Thus, speech parameters seem to be less suited to differentiate between clinical categories and will certainly not allow for modelling traditional diagnostic schemes on this basis. Third, positive as well as negative correlations were found between psychopathology and speech parameters.

This latter point is of particular interest since the predictive value as well as the applicability of speech parameter models can be seriously affected by such a fact. Polarity of the speech parameters energy and dynamics, for example, implies that changes in psychopathology may be accompanied by speech parameter changes in both directions, towards increased or reduced values. In other words, for some patients speech changed during improvement from low-voiced forms without dynamics towards standard values, whereas in others, changes from loud and abrupt speech towards standard values can be observed (such standard values are available through our normative study with 192 healthy subjects stratified according to sex, age and education). As a consequence, a preclassification of patients is required when monitoring therapeutic effects, such as responses to treatment, with the help of the above speech parameters. Since the ICD diagnostic scheme failed to yield this classification at a sufficiently accurate level, we are developing, on the basis of our normative data, a fully automated procedure which makes it possible to classify patients through their first speech recording.

A solution of this classification problem, however, will not necessarily mean much progress with regard to our principal goal of developing a method of assessing affect in terms of speech parameters which is practicable in the sense that it is (1) applicable to most psychiatric patients, and (2) easy to carry out in standard form. According to the results of this present investigation, the energy/dynamics model applies only to two thirds of patients whereas, in the other patients, speech behavior does not change in this respect during improvement. Moreover, the energy/dynamics model assesses only the apathic aspect of depression sufficiently well (the relationship to the retarded-depressive syndrome is less clear). Similar findings hold for the other speech parameters under investigation: they only apply to relatively small subsets of affectively disturbed patients, thus requiring the application of complicated classification procedures prior to any analysis. A multivariate approach based on feature vectors into which scalar speech parameters are combined will possibly provide a broader basis for the method. The respective methodological development is under way.

Fig. 2. Spectral voice patterns reflect the overtone structure (intensity contours and variability) of a hospitalized patient throughout the course of affective disturbances. a Pattern derived from the first measurement. b Pattern derived from the fourth measurement (8 days after the first measurement). c Pattern derived from the third measurement 5 days later. d Pattern derived from the last measurement 14 days later. Spectral intensities are plotted along a log-proportional scale (y-axis) and frequency covers 7 octaves from 64 to 8,192 Hz (x-axis).
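The polarity problem discussed above (speech parameters approaching standard values from below in some patients and from above in others) can be made concrete with a small sketch. It assumes that a reference sample of healthy speakers is available for the parameter in question and expresses each measurement as a deviation from that reference in standard deviation units. The function names, the threshold of 1 SD and all numbers are hypothetical; this is not the automated classification procedure the authors mention, which the text does not describe.

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Signed deviation from the reference sample, in SD units."""
    return (value - mean(reference)) / stdev(reference)


def polarity(first_value, reference, threshold=1.0):
    """Label the first recording as below, above or within the reference range."""
    z = z_score(first_value, reference)
    if z <= -threshold:
        return "below"
    if z >= threshold:
        return "above"
    return "within"


def distance_from_norm(values, reference):
    """Unsigned deviation per measurement; shrinks as speech normalizes."""
    return [abs(z_score(v, reference)) for v in values]


# Hypothetical reference values of one speech parameter (healthy speakers).
reference = [58, 60, 62, 64, 66]

# Two hypothetical patients who improve from opposite directions.
low_voiced = [50, 56, 61]
loud_abrupt = [75, 68, 63]

for series in (low_voiced, loud_abrupt):
    print(polarity(series[0], reference), distance_from_norm(series, reference))
```

Correlating a syndrome score with the unsigned distance rather than with the raw parameter would give both hypothetical patients the same sign of correlation, which is one way a preclassification by first recording might be put to use.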


Alternatively, we pursue another method of approach which aims at measuring voice quality changes in the frequency domain by pattern recognition techniques used, for example, in the field of computerized speaker verification. As outlined before, our spectral analyses yield a quartertone resolution over the full frequency range of 7 octaves between 64 and 8,192 Hz. Accordingly, we are able to simultaneously analyze the contours of all formants relevant to the underlying speech process. This is of particular importance because the tonal richness or, in turn, the lack of tonal expressiveness is made up by the complete ensemble of overtones, rather than by the mere fundamental frequency.

Within the scope of our investigations into the genetic determination of human brain wave patterns [Stassen et al., 1988b], we have developed our concept of spectral patterns. Such spectral patterns reflect, on the one hand, the above-mentioned overtone distribution and, on the other, the corresponding spectral variabilities. We applied this approach to our sample of healthy subjects in order to determine the ‘natural’ dynamic properties of individual overtone distributions. For this purpose, we determined the optimum time interval length for spectral analysis by iteratively optimizing the criterion ‘computerized recognition of persons from a sufficiently large population of speakers’. The resulting time interval length turned out to be large enough for the characteristic frequency distribution to be revealed, yet small enough for the individual variability of each spectral component to show up. In particular, we found that 90% of persons could be uniquely identified by their spectral voice patterns (n = 97). A cross-validation by means of an independent sample (n = 90) yielded identical results [Stassen, 1990]. The frequency-related variability of the formants F0, F1, F2 ... as expressed by the respective ‘contours’ and the intensity-related variability of overtones explain most (90%) of the interindividual diversity of speech sounds.

In accordance with these conclusive findings, we expect that the same kind of adaptive strategies will lead to similarly powerful solutions when addressing the problem of intraindividual changes in voice timbres, provided a sufficiently large number of characteristic speech samples is available. Thus, we are now collecting data from affectively disturbed patients with clinically well-defined changes in psychopathology over time. These data will subsequently serve as learning samples for determining significant changes (qualitatively as well as quantitatively) in the overtone structure (contours and intensities) of patients during improvement. Of particular interest in this context is the relationship between the variabilities ‘within measurement’ and ‘between measurements’. The ratio of these latter quantities represents the criterion function during optimization. Examples of overtone structure changes during therapy are given in fig. 2.

In summary, the point we have arrived at is disappointing in the sense that there obviously exists no sufficiently powerful single-parameter model for monitoring changes in the affective state of patients during improvement which meets the requirements of psychiatric applications. Nevertheless we are optimistic since (1) the proposed method has proved to be practicable; (2) the applied adaptive procedures (‘learning to recognize’) have led to reasonable results with regard to the overtone structure of speakers; (3) we have available reference data from the general population (comprising 192 healthy subjects and repeated measurements under 3 different experimental conditions) which will allow us to distinguish between ‘natural’ and ‘significant’ fluctuations, and (4) the next steps in this research project are clearly laid out. Undoubtedly, speech characteristics in depression will be the subject of study and fascination throughout the coming years.

References

Alpert, M.: Encoding of feelings in voice; in Clayton, Barrett, Treatment of depression: old controversies and new approaches, pp. 217-228 (Raven Press, New York 1983).
Andreasen, N.C.; Alpert, M.; Martz, M.J.: Acoustic analysis, an objective measure of affective flattening. Archs gen. Psychiat. 38: 281-285 (1981).
Avery, D.; Silverman, J.: Psychomotor retardation and agitation in depression. Relationship to age, sex, and response to treatment. J. affect. Disorders 7: 67-76 (1984).
Blackburn, I.M.: Mental and psychomotor speed in depression and mania. Br. J. Psychiat. 126: 329-335 (1975).
Bouhuys, A.L.; Mulder-Hajonides, W.: Speech timing measures of severity, psychomotoric retardation, and agitation in severely depressed patients. J. commun. Disorders 17: 277-288 (1984).
Bouhuys, A.L.; Alberts, E.: An analysis of the organization of looking and speech-pause behaviour of depressive patients. Behaviour 80: 269-298 (1984).
Clemmer, E.J.: Psycholinguistic aspects of pauses and temporal patterns in schizophrenic speech. J. psycholinguist. Res. 9: 161-185 (1980).
Darby, J.K.: Speech evaluation in psychiatry (Grune & Stratton, New York 1981).
Darby, J.K.; Hollien, H.: Vocal and speech patterns of depressive patients. Folia phoniat. 29: 279-291 (1977).
Darby, J.K.; Simmons, N.; Berger, P.A.: Speech and voice parameters in depression: a pilot study. J. commun. Disorders 17: 75-85 (1984).
Godfrey, H.P.; Knight, R.G.: The validity of actometer and speech activity measures in the assessment of depressed patients. Br. J. Psychiat. 145: 159-163 (1984).
Greden, J.F.; Carroll, B.J.: Decrease in speech pause times with treatment of endogenous depression. Biol. Psychiat. 15: 575-587 (1980).
Greden, J.F.; Albala, A.A.; Smokler, I.A.; Gardner, R.; Carroll, B.J.: Speech pause time: a marker of psychomotoric retardation in endogenous depression. Biol. Psychiat. 16: 851-859 (1981).
Hardy, P.; Jouvent, R.; Widlöcher, D.: Speech pause time and the retardation rating scale for depression (ERD). J. affect. Disorders 6: 123-127 (1984).
Hargreaves, W.A.; Starkweather, J.A.: Recognition of speaker identity. Lang. Speech 6: 63-67 (1963).
Hargreaves, W.A.; Starkweather, J.A.: Voice quality changes in depression. Lang. Speech 7: 84-88/218-220 (1964).
Helfrich, H.; Standke, R.; Scherer, K.R.: Vocal indicators of psychoactive drug effects. Speech Commun. 3: 245-252 (1984).
Hinchcliffe, M.K.; Lancashire, M.; Roberts, F.J.: Depression: defence mechanisms in speech. Br. J. Psychiat. 118: 471-472 (1971).
Hoffmann, G.M.; Gonze, J.C.; Mendlewicz, J.: Speech pause time as a method for the evaluation of psychomotoric retardation in depressive illness. Br. J. Psychiat. 146: 535-538 (1985).
Hollien, H.; Darby, J.K.: Acoustic comparisons of psychotic and non-psychotic voices; in Hollien, Hollien, Current issues in the phonetic sciences, pp. 829-835 (Benjamins, Amsterdam 1979).
Johnson, W.F.; Emde, R.N.; Scherer, K.R.; Klinnert, M.D.: Recognition of emotion from vocal cues. Archs gen. Psychiat. 43: 280-283 (1986).
Klos, K.T.; Ellgring, H.: Sprechgeschwindigkeit und Sprechpausen von Depressiven; in Hautzinger, Straub, Psychologische Aspekte depressiver Störungen (Roderer, Regensburg 1984).
Kraepelin, E.: Manic-depressive insanity and paranoia (transl. M. Barclay) (Livingstone, Edinburgh 1921).
Leff, J.; Abberton, E.: Voice pitch measurements in schizophrenia and depression. Psychol. Med. 11: 849-852 (1981).
Nevlud, G.N.; Fann, W.E.; Falck, F.: Acoustic parameters of voice and neuroleptic medication. Biol. Psychiat. 18: 1081-1084 (1983).


Newman, S.; Mather, V.G.: Analysis of spoken language of patients with affective disorders. Am. J. Psychiat. 94: 912-942 (1938).
Nilsonne, A.: Acoustic analysis of speech variables during depression and after improvement. Acta psychiat. scand. 76: 235-245 (1987).
Nilsonne, A.: Speech characteristics as indicators of depressive illness. Acta psychiat. scand. 77: 253-263 (1988).
Nilsonne, A.; Sundberg, J.; Ternstrom, S.; Askenfelt, A.: Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression. J. Acoust. Soc. Am. 83: 716-728 (1988).
Pope, B.; Blass, T.; Siegman, A.W.; Raher, J.: Anxiety and depression in speech. J. consult. clin. Psychol. 35: 123-133 (1970).
Rice, D.G.; Abroms, G.M.; Saxman, J.H.: Speech and physiological correlates of ‘flat’ affect. Archs gen. Psychiat. 20: 566-572 (1969).
Roessler, R.; Lester, J.W.: Voice predicts affect during psychotherapy. J. nerv. ment. Dis. 163: 166-176 (1976).
Saxman, J.H.; Burk, K.W.: Speaking fundamental frequency and rate characteristics of adult female schizophrenics. J. Speech Hearing Res. 11: 194-203 (1968).
Stassen, H.H.: Modelling affect in terms of speech parameters. Psychopathology 21: 83-88 (1988).
Stassen, H.H.: Computerized recognition of persons by spectral voice patterns. Meth. Inform. Med. (submitted, 1990).
Stassen, H.H.; Bomben, G.: Affective state and voice: reproducibility and sensitivity of speech parameters. Meth. Inform. Med. 27: 87-96 (1988a).
Stassen, H.H.; Lykken, D.T.; Propping, P.; Bomben, G.: Genetic determination of the human EEG. Survey of recent results on twins reared together and apart. Hum. Genet. 80: 165-176 (1988b).
Stassen, H.H.; Kuny, S.; Woggon, B.; Angst, J.: Affective state and voice: results of a pilot study with 6 depressive patients. Pharmacopsychiatry 22: suppl., pp. 17-22 (1989).
Szabadi, E.; Bradshaw, C.M.: Speech in depressive states; in Simpson, Psycholinguistics in clinical practice, pp. 211-252 (Irvington, New York 1980).
Szabadi, E.; Bradshaw, C.M.: Speech pause time: behavioural correlate to mood. Am. J. Psychiat. 142: 265 (1983).


Szabadi, E.; Bradshaw, C.M.; Besson, J.A.: Elongation of pause-time in speech: a simple, objective measure of motor retardation in depression. Br. J. Psychiat. 129: 592-597 (1976).
Teasdale, J.D.; Fogarty, S.J.; Williams, M.G.: Speech rate as a measure of short-term variation in depression. Br. J. soc. Psychol. 19: 271-278 (1980).
Tolkmitt, F.; Helfrich, H.; Standke, R.; Scherer, K.R.: Vocal indicators of psychiatric treatment effects in depressives and schizophrenics. J. commun. Disorders 15: 209-222 (1982).
Weintraub, W.; Aronson, H.: The application of verbal behavior analysis to the study of psychological defense mechanisms: IV. Speech patterns associated with depressive behavior. J. nerv. ment. Dis. 144: 22-28 (1967).

H.H. Stassen
Psychiatric University Hospital Zurich
Research Department
PO Box 68
CH-8029 Zurich (Switzerland)
