J. psychiat. Res., Vol. 24, No. 4, pp. 335-350, 1990. Printed in Great Britain.
0022-3956/90 $3.00 + .00 © 1991 Pergamon Press plc

A STRUCTURED INTERVIEW VERSION OF THE HAMILTON DEPRESSION RATING SCALE: EVIDENCE OF RELIABILITY AND VERSATILITY OF ADMINISTRATION

MARILYN K. POTTS¹, MARCIA DANIELS², M. AUDREY BURNAM³, and KENNETH B. WELLS²,³

¹Department of Social Work, California State University, Long Beach, CA 90840-0902, U.S.A.; ²Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, California, U.S.A.; and ³The RAND Corporation, Santa Monica, California, U.S.A.

(Received 26 March 1990; revised 24 July 1990)

Summary-A structured interview version of the Hamilton Depression Rating Scale (SI-HDRS) is described. Data are presented in support of its inter-rater and internal consistency reliability. SI-HDRS scores were reproducible by trained interviewers who lacked psychiatric backgrounds. Test-retest scores of a subset of patients who were interviewed twice (once in person and once by telephone) were highly correlated. Scores on the SI-HDRS did not differ between face-to-face and telephone administration groups, controlling for demographic factors, depression-specific indicators, and social and physical functioning. Since the SI-HDRS does not require a face-to-face interview by an experienced clinician, this instrument can be used economically in large-scale, community-based research projects.

Address for correspondence: Marilyn K. Potts, PhD, Department of Social Work, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, CA 90840-0902, U.S.A.

Introduction

Since its initial publication in 1960, the Hamilton Depression Rating Scale (HDRS) has been used extensively as a measure of depression severity (Hamilton, 1960, 1967; Carroll, Fielding, & Blashki, 1973; Hedlund & Vieweg, 1979). Although the HDRS has been criticized for failing to encompass some depression-related symptoms (e.g., anhedonia, non-reactivity of mood, reduction of concentration), it provides extensive coverage of core features of depression (e.g., depressed mood, psychomotor symptoms, feelings of guilt, feelings of inadequacy, reduced energy, sleep disturbance). The HDRS treats depression as a unitary concept, and attempts to identify factors have yielded conflicting results (Hamilton, 1960, 1967; Mowbray, 1972; Paykel, Klerman, & Prusoff, 1970; Ramos-Brieva & Cordero-Villafafila, 1988; Weckowicz, Cropley, & Muir, 1971). The HDRS has been shown to have a high degree of scale reliability, and considerable evidence exists for its concurrent, discriminant, and construct validity (Carroll et al., 1973; Cicchetti & Prusoff, 1983; Fava, Kellner, Munari, & Pavan, 1982; Hedlund & Vieweg, 1979; Kearns et al., 1982; Knesevich, Biggs, Clayton, & Ziegler, 1977; Maier, Heuser, Philipp, Frommberger, & Demuth, 1988; Maier, Philipp et al., 1988; Miller, Bishop, Norman & Maddever, 1985; Rehm & O’Hara, 1985; Zimmerman, Coryell, Pfohl, & Stangl, 1986). The HDRS is consistently reported to be sensitive to clinically observed treatment changes; accordingly, it has been used in most clinical trials of antidepressant drugs conducted in the United States.

Despite these advantages, the HDRS, which was designed to quantify the severity of depression based on face-to-face clinical interview content, has several limitations as a research tool.

Firstly, the item reliability of the HDRS has been criticized (Cicchetti & Prusoff, 1983; Maier, Philipp et al., 1988; Rehm & O’Hara, 1985; Williams, 1988). Since the instrument contains several potential sources of information variance, this is not surprising. Although clinicians using the HDRS are instructed to consider both the intensity and frequency of a symptom when assigning it a value on an anchored rating scale, operationally defined response categories are absent. This may limit the ability of interviewers to agree on the relative “weights” to be given to the intensity and frequency of symptoms, and on the absolute value of each rating. Furthermore, the time frame at which questions are directed encompasses “the last few days or week” (Hamilton, 1967, p. 290). The presence or absence of fluctuating symptoms may thus be subject to varying interpretations. Similarly, the time frame for symptoms involving change (i.e., loss of interest in work and activities, decreased libido, and weight loss) is not specified. Individuals with symptoms of long duration may not be identified as having “changed” their status, thus limiting the accuracy of ratings for such symptoms.

Secondly, the HDRS relies heavily on the expertise of the interviewer, who is required to have an extensive psychiatric background. According to Hamilton (1960) himself, the value of the HDRS depends “entirely on the skill of the interviewer in eliciting the necessary information” (p. 56).
To increase scale reliability, Hamilton has recommended summing the scores of two raters who independently score the same interview.

Finally, the HDRS is not suitable for large-scale and/or community-based studies. Since it cannot be administered by interviewers who lack psychiatric training, and relies on a face-to-face interview, the HDRS would require an inordinate expenditure of time and financial resources if used for such purposes. Resource usage would be doubled if, as recommended by Hamilton, the scores of two raters were summed.

Thus, although the HDRS is a widely used and highly accepted indicator of depression severity, a need exists for similar instruments which are more structured, rely on standardized content, and can be administered economically. Two attempts to create such an instrument have been reported (Whisman et al., 1989; Williams, 1988).

The instrument developed by Whisman et al. (1989) is contained within the Diagnostic Interview Schedule (DIS). Since it follows the structured format of other DIS items, it requires a minimum of interpretation and thus could feasibly be administered by non-clinicians. However, inter-rater reliability estimates for pre- and post-treatment ratings were found to be low for individual items and only moderately high for the total scale. In addition, it is not known how well this instrument would perform if used apart from the longer DIS.

In contrast to the structured format of Whisman et al.’s (1989) instrument, the one developed by Williams (1988) relies on open-ended questions to encourage patients to describe their experiences in their own words. Follow-up questions are provided for use when further exploration or clarification is necessary, but are designed to be used at the interviewer’s discretion. Interviewers “may also have to add their own follow-up questions to obtain necessary information” (Williams, 1988, p. 743). These features indicate that the interview guide for this version of the HDRS is at best a semi-structured instrument which can be administered only by clinicians. Nevertheless, both item and scale inter-rater reliabilities were considerably higher than those noted by Whisman et al. (1989).

Given the previously discussed disadvantages of existing instruments, we have developed a fully structured interview version of the HDRS (the SI-HDRS) for use in the National Study of Medical Care Outcomes (MOS), a longitudinal study of the process and outcome of care for adult patients with a variety of conditions, including depression (Tarlov et al., 1989). This study was designed to contrast depression outcomes of patients receiving care from two types of providers (medical practitioners and mental health specialists) and three types of systems (health maintenance organizations; large multispecialty groups; and predominantly fee-for-service, single specialty solo or small group practices). For this purpose, the MOS required an observer measure of depression severity suitable for use with a very large patient sample in multiple geographic sites, and which could be administered by interviewers without specific psychiatric training. The SI-HDRS was designed to meet these requirements. In contrast to the original HDRS, the SI-HDRS relies on standardized questions and response categories, thus leaving little room for information variance.

In this paper, we describe the development of the SI-HDRS and address the following questions about its usefulness:

1. What is the reliability of the SI-HDRS? Two types of reliability are examined: inter-rater and internal consistency.
2. Can the SI-HDRS be administered by trained interviewers who lack psychiatric backgrounds?
3. Can the SI-HDRS be administered by telephone, or is face-to-face interviewing required? To answer this question, we (a) assess whether SI-HDRS scores differ between independent face-to-face and telephone administration groups; and (b) examine test-retest results from a subset of patients who were interviewed both in person and by telephone.

The answers to these questions not only provide information regarding the psychometric properties of the SI-HDRS; they also address the feasibility of using this instrument in large-scale research projects where per-subject personnel and data-gathering costs must be minimized.

Method

Development and Description of SI-HDRS

The SI-HDRS was developed by one of the authors (MD), and is based on the content of the 17-item version of the HDRS. (See Appendix for SI-HDRS items and range of response categories; the complete instrument and interviewer training materials are available from the authors.) Each item in the SI-HDRS corresponds to an item in the original HDRS, but was modified to be suitable for a structured interview. These principles were followed in the development of the SI-HDRS:


1. The severity of symptoms was operationalized using clearly defined response categories.
2. Through explicit wording and the use of a specific time frame (within the past month), questions were phrased so as to avoid confusion between the presence or absence of a symptom and change in that symptom.
3. Instructions were provided for the correct coding of symptoms possibly affected by concurrent medical problems, medication, or behaviors not considered primarily attributable to depression (e.g., weight loss, loss of appetite caused by dieting).

Two SI-HDRS items involved interviewer observation and thus required modification for telephone administration. These were item 16, which concerned motor restlessness, and item 17, which concerned speed of thoughts and speech, and impairment of concentration. The face-to-face version of item 16 required the interviewer to rate the respondent’s motor restlessness at the time of the interview; the telephone version of this item asked the respondent to rate directly his or her motor restlessness during the interview. The face-to-face version of item 17 required the interviewer to rate the respondent’s speed of thoughts and speech, impairment in concentration, and motor activity at the time of the interview; the telephone version of this item differed only in that no reference was made to motor activity.
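The mode-specific handling of items 16 and 17 described above can be sketched as a small lookup table. This is only an illustration: the names `ItemVariant`, `VARIANTS`, and `total_score` are hypothetical, the score ranges passed to `total_score` are placeholders rather than the SI-HDRS response categories (which appear in the Appendix), and the assumption that a total is the simple sum of item ratings is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVariant:
    rated_by: str   # "interviewer" or "respondent"
    covers: tuple   # aspects the rating takes into account


# Items 16 and 17 as described in the text, keyed by (item number, mode).
VARIANTS = {
    (16, "face-to-face"): ItemVariant("interviewer", ("motor restlessness",)),
    (16, "telephone"): ItemVariant("respondent", ("motor restlessness",)),
    (17, "face-to-face"): ItemVariant(
        "interviewer",
        ("speed of thoughts and speech", "concentration", "motor activity"),
    ),
    (17, "telephone"): ItemVariant(
        "interviewer",
        ("speed of thoughts and speech", "concentration"),
    ),
}


def total_score(responses, max_scores):
    """Sum item ratings after checking each one against its response range.

    `responses` and `max_scores` map item number -> integer. The ranges
    supplied by the caller are placeholders, not the published categories.
    """
    for item, score in responses.items():
        if not 0 <= score <= max_scores[item]:
            raise ValueError(f"item {item}: {score} outside 0..{max_scores[item]}")
    return sum(responses.values())
```

Keying the variants by administration mode keeps the difference between the two versions explicit: only the rater changes for item 16, and only the scope of the rating changes for item 17.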

Inter-rater Reliability

The inter-rater reliability of the SI-HDRS was assessed in a pilot study conducted prior to the initiation of the MOS (Table 1). Two psychiatrists’ ratings for 20 subjects were compared. All subjects were adults with acute major depression hospitalized at the Neuropsychiatric Institute, University of California, Los Angeles. One psychiatrist interviewed each subject; the other observed the interview. Independent ratings were made by each psychiatrist. Between-rater agreement/disagreement was assessed for each of the 17 items (across subjects) and for each of the 20 subjects (across items). Correlations between the psychiatrists’ total scores for each subject were measured using Pearson’s r and the intraclass correlation coefficient (Bartko, 1966).

Table 1
Sources of data for analyses of reliability, reproducibility, and versatility of structured interview version of Hamilton Depression Rating Scale

Analysis                                               Source of data
Reliability
  Inter-rater                                          Pilot test of instrument
  Internal consistency                                 MOS longitudinal panel(a)
Reproducibility                                        Interviewer training
Versatility of Administration
  Comparison of face-to-face and telephone versions    MOS longitudinal panel(a)
  Test-retest using alternate methods                  MOS longitudinal panel(a)

(a) National Study of Medical Care Outcomes (MOS).

Interviewer Training and Reproducibility of Instrument

All interviewers were nurses, none of whom had psychiatric training. These individuals were trained during a three-hour session.

Firstly, the SI-HDRS and its accompanying instruction booklet were reviewed. This booklet contained information regarding interviewer-interviewee interactions (e.g., interviewees should not be pressed and should be allowed sufficient time to respond to questions, should not be allowed to wander too far from the point, and should be helped and encouraged to acknowledge symptoms that otherwise may be too distressing to admit). In addition, specific instructions, clarifications, and examples were provided for each item. For example, for item 11, which concerned somatic symptoms of anxiety, interviewers were instructed to rate symptoms as present only if the interviewee reported that they occurred when he or she was also tense, nervous, or anxious.

Secondly, trainees used the instrument to rate a videotaped interview conducted by a psychiatrist. These ratings were then discussed and any discrepancies between trainee and psychiatrist ratings were resolved.

In addition to its use for interviewer training, the videotaped interview allowed us to assess the reproducibility of the SI-HDRS (Table 1). Following the work of Ziegler, Meyer, Rosen, and Biggs (1978), interviewers’ ratings were compared to those made independently by the psychiatrist who conducted the videotaped interview. This comparison was made prior to the discussion of ratings described above. Agreement/disagreement with this criterion was assessed for each of the 17 items (across interviewers) and for each of the 10 interviewers (across items).
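The comparisons described in the two preceding subsections (agreement tallied per item and per rated individual, plus Pearson’s r and an intraclass correlation on total scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ analysis code: the paper cites Bartko (1966) without saying which form of the intraclass correlation was used, so the one-way random-effects form is assumed here; agreement is taken to mean identical ratings; and the function names and demonstration scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_oneway(ratings):
    """One-way random-effects intraclass correlation (assumed form).

    `ratings` is a list of rows, one per subject, each row holding one
    total score per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def exact_agreement(items_a, items_b):
    """Proportion of identical item ratings between two sets of ratings.

    Each input is a list of rows (one per subject or interviewer), each row
    a list of item ratings. Returns (per_item, per_row) proportions.
    """
    matches = [
        [int(a == b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(items_a, items_b)
    ]
    per_row = [mean(row) for row in matches]
    per_item = [mean(col) for col in zip(*matches)]
    return per_item, per_row


# Hypothetical total scores from two raters for five subjects.
rater_1 = [22, 18, 30, 25, 15]
rater_2 = [21, 19, 29, 27, 14]
print(pearson_r(rater_1, rater_2))
print(icc_oneway([list(pair) for pair in zip(rater_1, rater_2)]))
```

For the criterion comparison, the psychiatrist’s item ratings would be repeated once per interviewer and passed as `items_b`, so that `per_item` gives agreement across interviewers and `per_row` gives agreement across items for each interviewer.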
Internal Consistency and Versatility

Baseline data from the MOS longitudinal panel were used for the remaining analyses (Table 1).

Sampling Frame

The sampling design used in the MOS has been described elsewhere (Tarlov et al., 1989). Briefly, at each of three study sites, a representative sample of providers was selected from three systems of care. The MOS provider group consisted of medical specialists (i.e., internists, family practitioners, cardiologists, endocrinologists, and diabetologists) and mental health specialists (i.e., psychiatrists, psychologists, psychiatric nurses, and social workers). A representative cross-section of patients was sampled and screened for the presence of hypertension, diabetes, advanced coronary artery disease, and depression. Since the original HDRS was designed for use among depressed patients only, the present study utilized the depressed patient sample. However, patients who also had another of the aforementioned conditions were not excluded.
