Journal of Psychiatric Research 51 (2014) 88–92


Magnitude of placebo response and response variance in antidepressant clinical trials using structured, taped and appraised rater interviews compared to traditional rating interviews

Arif Khan a, b, *, James Faucett a, Walter A. Brown c, d

a Northwest Clinical Research Center, 1951 – 152nd Place NE, Bellevue, WA 98007, USA
b Department of Psychiatry and Behavioral Science, Duke University School of Medicine, Durham, NC, USA
c Department of Psychiatry and Human Behavior, Brown University, Providence, RI, USA
d Department of Psychiatry, Tufts University, Boston, MA, USA


Abstract

Article history: Received 22 November 2013; received in revised form 7 January 2014; accepted 9 January 2014.

The high failure rate of antidepressant clinical trials is due in part to a high magnitude of placebo response and considerable variance in placebo response. In some recent trials, enhanced patient interview techniques consisting of Structured Interview Guide for the Montgomery–Asberg Depression Rating Scale (SIGMA) interviews, audiotaping of patient interviews and 'central' appraisal with Rater Applied Performance Scale (RAPS) criteria have been implemented in the hope of increasing reliability and thus reducing the placebo response. However, the data supporting this rationale for a change in patient interview technique are sparse. We analyzed data from depressed patients assigned to placebo in antidepressant clinical trials conducted at a single research site between 2008 and 2012. Three trials included 34 depressed patients undergoing SIGMA depression interviews with taping and RAPS appraisal, and 4 trials included 128 depressed patients using traditional interview methods. Using patient-level data, we assessed the mean decrease in total MADRS scores and the variability of the decrease in MADRS scores in trials using SIGMA interviews versus trials using traditional interviews. The mean decrease in total MADRS score was significantly higher in the 3 trials that used SIGMA interviews than in the 4 trials using traditional interviews (M = 13.0 versus 8.3, t(df = 160) = 2.04, p = 0.047). Furthermore, trials using SIGMA had a larger magnitude of response variance based on Levene's test for equality of variance (SD = 12.3 versus 9.4, F = 7.3, p = 0.008). The results of our study suggest that enhanced patient interview techniques such as SIGMA interviews, audiotaping and RAPS appraisal may not have the intended effect of reducing the magnitude of placebo response and placebo variance.

© 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

Keywords: Clinical trials; Research design; Psychiatric interview; Depression interview; Antidepressants; Placebo

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Corresponding author. Northwest Clinical Research Center, 1951 – 152nd Place NE, Bellevue, WA 98007, USA. Tel.: +1 425 453 0404. E-mail addresses: [email protected] (A. Khan), [email protected] (J. Faucett).

0022-3956/$ – see front matter © 2014 The Authors. Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jpsychires.2014.01.005

1. Introduction

The longstanding notion that antidepressant-placebo differences in antidepressant clinical trials were large, predictable and stable was refuted more than a decade ago (Walsh et al., 2002; Khan et al., 2002a). Walsh et al. showed that the magnitude of placebo response had been increasing for a few decades, and Khan et al. showed that more than 50% of antidepressant clinical trials with approved antidepressants failed to show statistical significance over placebo, mostly due to the high magnitude of placebo response.

Some researchers (Leon et al., 1995; Perkins et al., 2000; Lipsitz et al., 2004) suggested that much of this failure can be attributed to low inter-rater and intra-rater reliability. Their argument was based on the fact that the magnitude of symptom reduction was large and variable from one antidepressant trial to the next, anywhere from 10% to 50%. Furthermore, the variance in the group of depressed patients assigned to placebo is typically large as measured by the standard deviation (SD). The SD in the depressed patients assigned to placebo at the beginning of the trial is typically about two to four, and at the end of the trial it is likely to be around nine or ten.

The view that increasing the reliability of ratings may reduce the placebo response in antidepressant trials was supported by data from Kobak et al. (2005, 2007) and Engelhardt et al. (2006), who showed that inter-rater reliability was low among raters participating in antidepressant clinical trials. In addition, Cogger (2007) conducted a post-hoc analysis of two antidepressant trials and showed that raters who adhered to a structured format and asked specific and focused questions showed larger antidepressant-placebo differences than raters who did incomplete or cursory evaluations. These data lent credence to the idea that 'raters should be rated' to improve trial outcome (Engelhardt et al., 2006).

Based on the findings above, a few independent companies were set up to develop methods to improve the reliability of data in antidepressant clinical trials. Specifically, a structured interview format (the SIGMA interview) for the Montgomery–Asberg Depression Rating Scale (MADRS; Montgomery and Asberg, 1979) was created and proposed as the standard interview for antidepressant clinical trials (Williams and Kobak, 2008). Second, raters located away from the trial sites ('central raters') would conduct the depression interviews via video-conferencing, or by telephone with audiotaping, using the SIGMA format. Another alternative was to require raters at the trial sites ('local raters') to use the SIGMA format during the trials and to audiotape the interviews. The local raters would be scrutinized for their 'adherence' to SIGMA conventions throughout the trial by central raters who were independent of the site.
In addition, independent reviewers from the vendors reviewed the taped interviews of the local raters, assessing interview format against the criteria outlined in the Rater Applied Performance Scale (RAPS): adherence, follow-up, clarification, neutrality, rapport, and accuracy of ratings, as outlined by Lipsitz et al. (2004). We have termed this review an outside appraisal. Adherence was defined as asking specific closed-ended questions followed by specific probes to establish an 'anchor point' on the MADRS scale. Clarification and follow-up relate to obtaining specific data from the depressed patient about events over the previous week; for example, the patient is prompted to recall the time to bed on each of the previous seven days, the specific number of awakenings, and the total sleep for each night. Neutrality was defined as maintaining a neutral research rapport rather than an empathic one, as it was considered that such empathy would enhance placebo response. The introduction of these criteria into antidepressant clinical trials was based on earlier reports (Lipsitz et al., 2004; Kobak et al., 2005) showing increased reliability of RAPS-compliant interviews.

Kobak et al. (2010) reported that 'central raters' obtained a lower placebo response than 'local raters'. The authors attributed this lower placebo response in part to lower pre-randomization mean total MADRS scores recorded by the 'central raters'.

However, two reports have challenged the concept that SIGMA interviews with audiotaping and RAPS appraisal lead to better outcomes. Oren et al. (2008) reported results from a trial of an investigational medication versus escitalopram and placebo using centralized raters who determined patient eligibility and assessed the primary outcome measure. The central raters conducted evaluations by video-conferencing and were blinded to study design and the week of the patient visit.
The authors reported no significant difference between placebo and escitalopram, and the investigational medication performed significantly worse than placebo (p = 0.04). More recently, Targum et al. (2013) reported results from a study evaluating trial outcome differences based on the method of rating: 'central raters' using a structured format and taping versus local site raters. Outcome was evaluated using the Inventory of Depressive Symptomatology-30 Item Clinician Version (IDS-C30; Rush, 2000) and the Quick Inventory of Depressive Symptomatology (QIDS; Rush et al., 2003). A significant difference between combination buspirone treatment and placebo was reported on the IDS-C30 for the site-based ratings (p = 0.030), whereas the centralized ratings did not detect this difference (p = 0.124) (details reported in Table 3 of the Targum et al. publication).

Given this background we undertook the current study. Specifically, we evaluated the magnitude of placebo response and response variance among depressed patients assigned to placebo in seven recently conducted multi-center Phase II and Phase III antidepressant clinical trials, three of which used SIGMA interviews with audiotaping and central ratings, and four of which used traditional interview methods similar to those used over a decade ago. We hypothesized that the magnitude of placebo response and the variance of placebo response would be lower in antidepressant trials that used the SIGMA interview techniques with audiotaping and central RAPS rating appraisal than in trials using the traditional non-SIGMA MADRS interview techniques.

2. Methods

2.1. Background

Based on the publications by Cogger (2007) and Kobak et al. (2007, 2010), a few independent companies (vendors) have set up methods for evaluating depressed patients, such as SIGMA interviews with audiotaping of the rating interviews and outside appraisal using the RAPS criteria. Some pharmaceutical companies have chosen to implement such rater interview techniques but others have not, and instead continue to use traditional psychiatric interview methods. Thus, the Northwest Clinical Research Center (NWCRC) has used both interview methods, in effectively random order, based on the choice of the sponsoring pharmaceutical companies and without any say in which method was selected.
As several of the agents in our study either remain in the development phase or have recently been submitted to the FDA for regulatory review, the data from the groups assigned to antidepressants remain confidential and proprietary. Thus, the focus of our evaluation was on the response of patients assigned to placebo, classified by the interview method used to conduct the trial.

2.2. Study selection

Between 2008 and 2012, the NWCRC was an investigative site for twelve antidepressant clinical trials that used an acute, randomized, double-blind, parallel-group, placebo-controlled design evaluating mono-therapy for Major Depressive Disorder (MDD). Each of these trials excluded patients diagnosed with Treatment Resistant Depression, as specified by a failure to respond to two or more antidepressants in the past. Of the twelve trials, three used the newer interview rating format, consisting of SIGMA interviews and audiotaping of individual patient evaluations at each visit (usually weekly) for six to eight weeks. We were able to access the randomization codes for all three of these antidepressant trials. One of the three trials that used SIGMA rating methods required that an anonymous 'central rater' conduct and tape a telephone interview (Trial #1 in Table 1); in this trial the scores reported by the central rater were considered the primary outcome variable. The other two trials required the 'local raters' to follow the SIGMA interview schedule and audiotape the interviews (Trial #2 and Trial #3 in Table 1). Specific written feedback was provided throughout the trial to the raters as well as the Principal



Table 1
Years of conduct, antidepressant clinical trial characteristics, and key mean total MADRS rating scale scores for the placebo arm of seven antidepressant clinical trials, categorized by the method of depressed patient interviews.

Trial     Start  Duration  Risk of    N         Mean  %       Mean baseline    Mean decrease    %Symptom
          date   (weeks)   placebo    patients  age   Female  MADRS (SD)       in MADRS (SD)    reduction (SD)
                           exposure

Section 1: Trials using structured, video- or audio-taped and appraised depression interviews managed by non-NWCRC vendors
Trial 1   2011   8         50%        12        37    8       31.0 (7.0)       15.2 (12.4)      48.4 (35.3)
Trial 2   2009   8         25%        10        40    60      37.7 (5.5)       12.3 (14.2)      34.6 (41.9)
Trial 3   2009   8         50%        12        38    58      35.8 (3.8)       11.3 (11.1)      31.5 (31.7)

Section 2: Trials using traditional depression clinical interview methods practiced at NWCRC
Trial 4   2008   8         50%        73        42    44      31.0 (3.7)        9.7 (9.5)       31.8 (31.4)
Trial 5   2009   5         33%        18        39    50      31.3 (3.8)        9.6 (7.4)       31.1 (24.5)
Trial 6   2010   6         33%        10        49    30      32.3 (4.4)        5.5 (9.8)       17.7 (30.1)
Trial 7   2012   8         50%        27        41    52      31.2 (4.4)        4.9 (9.5)       15.3 (34.3)

Abbreviations: MADRS = Montgomery–Asberg Depression Rating Scale; SD = standard deviation.

Investigator about the quality of ratings, as described in the 'rating the rater' publication by Engelhardt et al. (2006).

Of the nine trials that used the traditional non-SIGMA MADRS interview method for evaluating patients, we were able to obtain randomization codes for only four [it is common to obtain only around 50% of randomization codes, as previously reported in Khan et al. (2002b)].

2.3. Inclusion/exclusion of the patients

Aside from the method of patient evaluation, the two groups of trials were similar in design and patient population. All of the trials recruited male and female adult patients between the ages of 18 and 70 with a diagnosis of MDD based on Diagnostic and Statistical Manual of Mental Disorders – Fourth Edition criteria (American Psychiatric Association, 2000). All patients gave informed consent detailing the study procedures and were specifically informed of the possibility of being recorded during the depression interview. Female patients were required not to be pregnant and to agree to use adequate contraception during the studies.

Depressed patients with significant physical or psychiatric comorbidities were excluded from the studies, as were patients with an established diagnosis of Treatment Resistant Depression. Patients with a history of substance abuse in the previous six months were excluded, as were patients who required medications prohibited by the study protocol. Patients who did not have at least one post-baseline follow-up measurement were excluded from the analysis.

There were no systematic differences between the patients recruited for trials that used SIGMA interview methods with taping and outside appraisal and the patients recruited for trials that used traditional interviews. The inclusion/exclusion criteria were homogeneous across the trials.
Patients were enrolled into a trial using one or the other interview method in an essentially random manner, based on factors including the clinical evaluation date and the enrollment start and stop dates of the respective trials.

2.4. Synthesis and analysis of data

After receiving randomization codes from the pharmaceutical company sponsors, we tabulated the clinical trial characteristics of the studies, recording the trial start date and the duration of the double-blind treatment phase of each trial. We then tabulated the patient-level data by creating a database using the statistical software SPSS (version 19.0). The database consisted of a patient identifier, the baseline MADRS score, the last observation carried forward MADRS score, and the age and sex of each patient enrolled. Only the patients assigned to placebo were included in the dataset. In other words, all of our comparisons were done using individual patient-level data, as each patient was rated by either the traditional method or the structured method depending on the trial in which they were enrolled.

In order to review and report the mean patient-level outcomes for each individual trial, a numerical code was created to represent each of the seven trials. A further numerical code was created to represent the rating style used in the trials as the independent variable: all patients enrolled in trials conducted using traditional patient rating methods were coded 1, and all patients enrolled in trials conducted using the structured rating method were coded 2.

We conducted three statistical tests in our analysis of the two groups of trials. First, we used an independent samples t test to compare the baseline scores. Second, to evaluate the outcome for the patients assigned to placebo, we conducted an independent samples t test comparing the mean decrease in MADRS scores by type of patient interview method (patients interviewed using the traditional rating method versus those interviewed using the SIGMA method with taping and RAPS appraisal). Third, Levene's test for equality of variance was used to evaluate the similarity of the change score distributions between the two sets of trials.

3. Results

A total of 162 patients were assigned to placebo in the seven clinical trials included in our study. The mean age of the patients was 41 years and 56% of the patients were male. There were no significant differences in age or gender distribution between the trials using SIGMA interviews and the trials using traditional non-SIGMA MADRS interviews.

As shown in Table 1, the trials were dichotomized into two groups for our evaluation: Trial #1 used central raters who followed the SIGMA interview format, which was taped and appraised using the RAPS method; Trials #2 and #3 used local raters who followed the SIGMA format, with the interviews taped and appraised by outside reviewers; and Trials #4 to #7 used the traditional psychiatric interview format to obtain MADRS scores.

The results of our statistical comparisons based on the rating style used in the trials are shown in Table 2. Our first comparison indicated that the patients enrolled in trials that used SIGMA interviews had significantly higher scores at the pre-randomization visit than the patients enrolled in trials using traditional interviews (M = 34.7 ± 6.1 versus M = 31.2 ± 3.9, t(df = 160) = 4.08, p < 0.001).
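Section 2.4 describes building the analysis dataset with each patient's endpoint taken as the last observation carried forward (LOCF). The endpoint rule can be sketched as follows; the helper name and the visit data are illustrative, not taken from the trial database:

```python
# Last-observation-carried-forward (LOCF) endpoint, as described in
# Section 2.4: a patient's endpoint MADRS total is the last available
# post-baseline measurement; patients with no post-baseline visit are
# excluded (per Section 2.3).
def locf_endpoint(visits):
    """visits: MADRS totals ordered by visit; index 0 is baseline,
    None marks a missed visit. Returns (baseline, endpoint, decrease),
    or None if the patient has no post-baseline measurement."""
    baseline = visits[0]
    post = [v for v in visits[1:] if v is not None]
    if not post:
        return None                 # no post-baseline visit -> excluded
    endpoint = post[-1]             # last observation carried forward
    return baseline, endpoint, baseline - endpoint

# Hypothetical patients:
print(locf_endpoint([32, 28, None, 21]))  # early dropout of visit 2: (32, 21, 11)
print(locf_endpoint([30, None, None]))    # no post-baseline rating: None
```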



Table 2
Comparison of outcome with placebo treatment, as measured by the mean decrease in total MADRS scores, and of the variance of that decrease, as measured by the standard deviation, among the patients enrolled in seven antidepressant trials.a

                                          Trials (N = 3) using structured,     Trials (N = 4) using traditional    Probability value b,c
                                          video- or audio-taped and appraised  depression clinical interview
                                          depression interviews managed by     methods practiced at NWCRC
                                          non-NWCRC vendors

N patients assigned to placebo            34                                   128                                 –
Mean (SD) baseline score                  34.7 (6.1)                           31.2 (3.9)                          t(df = 160) = 4.1, p < 0.001
Mean decrease in total MADRS score        13.0                                 8.4                                 t(df = 160) = 2.04, p = 0.047
SD of the decrease in total MADRS score   12.3                                 9.4                                 F = 7.3, p = 0.008

Abbreviation: MADRS = Montgomery–Asberg Depression Rating Scale.
a The mean MADRS score decrease and the variance of the decrease were calculated using patient-level data from the trials using traditional rating methods (N = 128) and the trials using structured rating methods (N = 34); they were not calculated from cumulative trial-level data.
b An independent samples t test was used to evaluate the significance of the difference in MADRS change scores.
c The variances of the two groups differed significantly by Levene's test for equality of variance; the reported probability value does not assume equal variances.
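As a cross-check, the group-level figures in Table 2 can be reconstructed from the per-trial Ns, means and SDs in Table 1, and the Welch t statistic (footnote c notes that equal variances were not assumed) recomputed from those summaries. The sketch below uses only the published summary data; Levene's F requires patient-level scores and is not reproduced. The closing power illustration assumes a hypothetical 5-point drug-placebo difference and 50 patients per arm, purely to illustrate how a larger SD erodes power (a point taken up in the Discussion):

```python
import math
from statistics import NormalDist

# Per-trial (n, mean decrease, SD of decrease) from Table 1.
sigma_trials = [(12, 15.2, 12.4), (10, 12.3, 14.2), (12, 11.3, 11.1)]                 # Trials 1-3
traditional  = [(73, 9.7, 9.5), (18, 9.6, 7.4), (10, 5.5, 9.8), (27, 4.9, 9.5)]       # Trials 4-7

def pool(trials):
    """Combine per-trial summaries into an overall N, mean and SD,
    summing within-trial and between-trial sums of squares."""
    n = sum(t[0] for t in trials)
    mean = sum(t[0] * t[1] for t in trials) / n
    ss = sum((t[0] - 1) * t[2] ** 2 + t[0] * (t[1] - mean) ** 2 for t in trials)
    return n, mean, math.sqrt(ss / (n - 1))

n1, m1, s1 = pool(sigma_trials)   # ~ (34, 13.0, 12.3), matching Table 2
n2, m2, s2 = pool(traditional)    # ~ (128, 8.3, 9.4); Table 2 prints the mean as 8.4

# Welch t statistic from the pooled summaries; reproduces t ~= 2.04.
t = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
print(f"SIGMA: n={n1}, mean={m1:.1f}, SD={s1:.1f}; "
      f"traditional: n={n2}, mean={m2:.1f}, SD={s2:.1f}; t={t:.2f}")

# Normal-approximation power for a two-arm comparison: a hypothetical
# 5-point drug-placebo difference, 50 patients per arm, at the two SDs.
def approx_power(delta, sd, n_per_arm, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = delta / (sd * math.sqrt(2 / n_per_arm))
    return 1 - NormalDist().cdf(z - ncp) + NormalDist().cdf(-z - ncp)

print(f"power at SD 9.4:  {approx_power(5, 9.4, 50):.2f}")   # ~0.76
print(f"power at SD 12.3: {approx_power(5, 12.3, 50):.2f}")  # ~0.53
```

With these assumed inputs, raising the SD from 9.4 to 12.3 (roughly 30% larger) drops power from about 0.76 to about 0.53, consistent with the Discussion's remark that such an increase can cost a third of the power.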

Our evaluation of outcome for the patients assigned to placebo indicated that the mean decrease in MADRS score was significantly larger among the patients interviewed using SIGMA with audiotaping and outside RAPS appraisal (M = 13.0) than among the patients interviewed using the traditional interview methods (M = 8.3), t(df = 160) = 2.04, p = 0.047. Lastly, we compared the variance of the MADRS decrease scores of the two groups of patients and found that the patients interviewed using SIGMA with audiotaping and outside RAPS appraisal had significantly greater variability (SD = 12.3) than the patients in the trials using traditional interview methods (SD = 9.4), F = 7.3, p = 0.008.

4. Discussion

The aim of the study was to evaluate whether structured SIGMA interviews that are audiotaped and appraised by central raters using RAPS criteria reduce the magnitude of placebo response in antidepressant clinical trials. Contrary to expectations, we found that the magnitude of placebo response, as well as the variance in placebo response, was significantly higher in the antidepressant clinical trials that used the newer techniques than in those using the traditional techniques.

The results of our study follow those reported by Oren et al. (2008), who found a placebo decrease of 34% in a trial conducted using these techniques, nearly identical to the response to the active comparator escitalopram. This sample consisted of a multi-site study including 160 depressed patients (escitalopram N = 52, placebo N = 108). As has been shown, more than a 30% mean decrease from total pre-randomization scores in the placebo group is a harbinger of antidepressant clinical trial failure (Khan et al., 2003). Our findings also parallel those of Targum et al. (2013), who compared the data from ratings by 'local raters' on the IDS-C30 and showed an almost five-fold greater difference between combination drug therapy for depression and placebo than was seen in the 'central raters'' evaluations (4.84 versus 1.07; Table 3, p. 949). The Targum et al. study consisted of a sample of 101 depressed patients (combination therapy N = 67, placebo N = 34).

On the other hand, our findings are very different from those reported by Kobak et al. (2010). We found a significantly higher baseline score in the trials that used SIGMA interviews with taping and outside appraisal by RAPS criteria, whereas Kobak et al. reported that site interviewers inflated the baseline score as compared to central raters of the same patients. In addition, Kobak et al. found that the decrease in the placebo group was larger for site raters than for central raters. Our study, on the other hand,

found a significantly larger and more variable placebo response in pivotal clinical trials using SIGMA interview methods than in the trials using traditional interview methods.

Thus, three of the four studies that have prospectively evaluated enhanced rating techniques, including SIGMA interviews, audiotaping and outside appraisal, do not support the contention that these techniques decrease the placebo response or response variance, or increase antidepressant-placebo differences. Even if these new techniques produced data similar to those derived from traditional interviews, the added burden in both cost and time of SIGMA ratings with central RAPS appraisal would be difficult to justify. A further implication of increased variance is that the power calculations for such trials need review: an increase in variance of 30% would decrease the power by a third or more. So far, to our knowledge, no antidepressants have been approved by the FDA on the basis of trials that used the SIGMA rating techniques with taping and outside appraisal of the patient interviews.

In trying to understand our unexpected finding we asked several raters about their rating experiences. In discussing her reaction to the structured and appraised interview, one rater said: "You are aware that your adherence to the exact language of the form as well as your assessment skills are being monitored. Of course, the interviewer begins to focus more on the details of the interview form and her own performance, and attention is less available to focus on the patient. What is lost is the level of information that comes from a more global perception of the patient's presentation, including non-verbal cues." Another rater had this to say about the SIGMA ratings with audiotaping and RAPS appraisal: "These are question-and-answer sessions and do not get into an in-depth exploration of the patient's world that is unique to the individual. Also, I feel that patients feel that they are being scrutinized for good behavior, have to present themselves in the best light, and are reluctant to criticize their experience. Simply put, the patients become actors and seem to perform for the ratings."

The results of our study, along with those of Oren et al. (2008), Kobak et al. (2010) and Targum et al. (2013), represent only the tip of the iceberg with regard to evaluating the newer depression interview methods. Many more antidepressant clinical trials have been conducted over the past five years using taped SIGMA interviews with outside appraisal, or other newly introduced interview techniques. It is important that the data from trials conducted using these interviews be scrutinized by their sponsors and by the vendors who practice these interview techniques. Until then, the question of whether structured and taped interview techniques may help in obtaining a lower magnitude of placebo response, lower placebo variance and potentially increased



antidepressant-placebo differences requires further research. Simply put, increased reliability of patient interviews may not translate into a decrease in the magnitude of placebo response and its variance. It is possible that these phenomena are related more to the clinical skills of the individual rater than to the reliability of ratings.

In summary, the results of our study suggest that enhanced patient interview techniques such as SIGMA interviews, audiotaping and RAPS appraisal may not have the intended effect of reducing the magnitude of placebo response and placebo variance, but may rather have the unintended consequence of increasing them.

Conflict of interest

Arif Khan, MD, Principal Investigator of over 340 clinical trials sponsored by over 65 pharmaceutical companies and 30 CROs, has done no compensated consulting or speaking on their behalf, nor does he own stock in any of these or other pharmaceutical companies. Dr. Khan is not compensated for his role as author. In 2009 Dr. Khan founded Columbia Northwest Pharmaceuticals LLC, and is Medical Director of the company. Columbia Northwest Pharmaceuticals owns intellectual property rights for potential therapies for Central Nervous System disorders and other medical conditions.

James Faucett is an employee of the Northwest Clinical Research Center.

Walter A. Brown has no competing interests to declare.

Author contributions

Dr. Khan oversaw collection of the data, drafted preliminary versions of the manuscript, designed the experiments and critically revised the manuscript. James Faucett drafted preliminary versions of the manuscript, created the database, conducted statistical evaluations and critically revised the manuscript. Dr. Brown designed the experiments and critically revised the manuscript. All authors have approved the final version of the manuscript.
Role of the funding source

All of the authors were salaried by their institutions at the time of manuscript preparation. None of the authors received compensation for authorship of the manuscript.

Acknowledgment

None.

References

American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed., text revision. Washington, DC: American Psychiatric Association; 2000.
Cogger KO. Rating rater improvement: a method for estimating increased effect size and reduction of clinical trial costs. J Clin Psychopharmacol 2007;27:418–9.
Engelhardt N, Feiger AD, Cogger KO, Sikich D, DeBrotta DJ, Lipsitz JD, et al. Rating the raters: assessing the quality of Hamilton rating scale for depression clinical interviews in two industry-sponsored clinical drug trials. J Clin Psychopharmacol 2006;26:71–4.
Khan A, Khan S, Brown WA. Are placebo controls necessary to test new antidepressants and anxiolytics? Int J Neuropsychopharmacol 2002a;5:193–7.
Khan A, Khan SR, Shankles EB, Polissar NL. Relative sensitivity of the Montgomery–Asberg depression rating scale, the Hamilton depression rating scale and the clinical global impressions rating scale in antidepressant clinical trials. Int Clin Psychopharmacol 2002b;17(6):281–5.
Khan A, Detke M, Khan SRF, Mallinckrodt C. Placebo response and antidepressant clinical trial outcome. J Nerv Ment Dis 2003;191:211–8.
Kobak KA, Feiger AD, Lipsitz JD. Interview quality and signal detection in clinical trials. Am J Psychiatry 2005;162:628.
Kobak KA, Kane JM, Thase ME, Nierenberg AA. Why do clinical trials fail? The problem of measurement error in clinical trials: time to test new paradigms? J Clin Psychopharmacol 2007;27:1–5.
Kobak KA, Leuchter A, DeBrota D, Engelhardt N, Williams JB, Cook IA, et al. Site versus centralized raters in a clinical depression trial: impact on patient selection and placebo response. J Clin Psychopharmacol 2010;30:193–7.
Leon AC, Marzuk PM, Portera L. More reliable outcome measures can reduce sample size requirements. Arch Gen Psychiatry 1995;52:867–71.
Lipsitz J, Kobak K, Feiger A, Sikich D, Moroz G, Engelhardt N. The rater applied performance scale: development and reliability. Psychiatry Res 2004;127:147–55.
Montgomery SA, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry 1979;134:382–9.
Oren D, Marcus R, Pultz J, Dockens R, Croop R, Coric V, et al. A clinical trial of a corticotropin releasing factor receptor-1 antagonist (CRF1) for the treatment of major depressive disorder. Poster presented at the 2008 annual meeting of the American College of Neuropsychopharmacology, 7–11 December 2008; Scottsdale, AZ.
Perkins DO, Wyatt RJ, Bartko JJ. Penny-wise and pound-foolish: the impact of measurement error on sample size requirements in clinical trials. Biol Psychiatry 2000;47:762–6.
Rush AJ. The inventory of depressive symptomatology (IDS): clinician (IDS-C) and self-report (IDS-SR). Int J Methods Psychiatr Res 2000;9(2):45–59.
Rush AJ, Trivedi MH, Ibrahim HM, Carmody TJ, Arnow B, Klein DN, et al. The 16-item quick inventory of depressive symptomatology (QIDS), clinician rating (QIDS-C) and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biol Psychiatry 2003;54:573–83.
Targum SD, Wedel PC, Robinson J, Daniel DG, Busner J, Bleicher LS, et al. A comparative analysis between site-based and centralized ratings and patient self-ratings in a clinical trial of major depressive disorder. J Psychiatr Res 2013;47:944–54.
Walsh BT, Seidman S, Sysko R, Gould M. Placebo response in studies of major depression: variable, substantial, and growing. JAMA 2002;287:1840–7.
Williams JW, Kobak K. Development and reliability of a structured interview guide for the Montgomery–Asberg Depression Rating Scale (SIGMA). Br J Psychiatry 2008;192:52–8.
