Clinical Psychology and Psychotherapy Clin. Psychol. Psychother. 23, 87–95 (2016) Published online 20 January 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpp.1942

Practitioner Report

Some Problems with Randomized Controlled Trials and Some Viable Alternatives

Timothy A. Carey1* and William B. Stiles2

1 Centre for Remote Health, A Joint Centre of Flinders University and Charles Darwin University, Alice Springs, Australia
2 Department of Psychology, Miami University, Oxford, OH, USA

Randomized controlled trials (RCTs) are currently the dominant methodology for evaluating psychological treatments. They are widely regarded as the gold standard, and in the current climate, it is unlikely that any particular psychotherapy would be considered evidence-based unless it had been subjected to at least one, and usually more, RCTs. Despite the esteem in which they are held, RCTs have serious shortcomings. They are the methodology of choice for answering some questions but are not well suited for answering others. In particular, they seem poorly suited for answering questions related to why therapies work in some situations and not in others and how therapies work in general. Ironically, the questions that RCTs cannot answer are the questions that are of most interest to clinicians and of most benefit to patients. In this paper, we review some of the shortcomings of RCTs and suggest a number of other approaches. With a more nuanced understanding of the strengths and weaknesses of RCTs and a greater awareness of other research strategies, we might begin to develop a more realistic and precise understanding of which treatment options would be most effective for particular clients with different problems and in different circumstances. Copyright © 2015 John Wiley & Sons, Ltd.

Key Practitioner Message:
• Practitioners can think more critically about evidence provided by RCTs and can contribute to progress in psychotherapy by conducting research using different methodologies.

Keywords: randomized controlled trial, gold standard, treatment outcomes

In 1967, Gordon Paul suggested that ‘the question towards which all outcome research should ultimately be directed is the following: What treatment, by whom, is most effective for this individual with that specific problem, and under which set of circumstances?’ (Paul, 1967, p. 111). This question remains relevant today as indicated by citations in peer-reviewed journals (e.g., Lewis, Simons, & Kim, 2012). However, in many ways, we are no closer to answering this question than we were over four decades ago. To begin to develop a coherent and accurate understanding of which treatment is needed by any particular individual in any given situation, it may be necessary to reconsider the privileged position afforded to the randomized controlled trial (RCT). It has become standard rhetoric that RCTs are the gold standard for evaluating psychological treatments. Such is the pervasiveness of RCTs that, currently, it would be almost incomprehensible to consider a psychological treatment as evidence-based unless its efficacy had been demonstrated in one or more RCTs.

*Correspondence to: Timothy Carey, Centre for Remote Health, Flinders University, PO Box 4066, 0871 Alice Springs, Australia. E-mail: tim.carey@flinders.edu.au

Table 1 lists the number of publications per decade for two high-impact psychology journals concerned with psychological treatments. The increasing popularity and use of RCTs over the last four decades are illustrated by the growing number of peer-reviewed publications. It is easy to understand the strong allegiance to the RCT design. The RCT has been described as ‘one of the simplest, most powerful, and revolutionary tools of research’ (Jadad & Enkin, 2007, p. 1). The RCT is a statistical adaptation of the experimental method, which is the closest science has come to a method for demonstrating causality (Haaga & Stiles, 2000). When they are conducted well and used to address appropriate questions, RCTs yield results that are compelling. In situations where assumptions of linear causality can reasonably be applied, RCTs can be an excellent method for demonstrating causal effects. In situations where linear causality does not apply, however, RCTs will be a poor choice of methodology. The RCT methodology has shortcomings that are particularly relevant for understanding psychological treatments as they are implemented in routine clinical settings. In this paper, we explore and explain some conceptual and statistical shortcomings of RCTs and


Table 1. Number of randomized controlled trial articles published per decade in Behaviour Research and Therapy (BRAT) and the Journal of Consulting and Clinical Psychology (JCCP)

Decade       BRAT   JCCP
1980–1989       0      1
1990–1999       2      7
2000–2009      28     64
2010–          50     54

review some alternative methodologies which, if applied with the same intensity and resources lately devoted to RCTs, may help us use psychological treatments more strategically and systematically for the benefit of the people who access them.

CONCEPTUAL AND STATISTICAL PROBLEMS WITH RANDOMIZED CONTROLLED TRIALS

Agency and Causation

Fundamentally, RCTs address causality (Bracken, 2013). They were designed to answer the question ‘Does this program work?’ (Christie & Fleischer, 2009). This is a question of causality: Did A cause B? Is it reasonable to attribute increases in crop yield to fertilizer F or the recovery of patients to surgical technique P? Implicit in these questions is a model of causality in which variations in A directly and unambiguously (under controlled conditions) lead to measurable changes in B. The locus of responsibility for creating the change is placed with the experimental treatment, for example, the fertilizer or the surgical procedure, because plants or patients in these designs are conceptualized as relatively passive recipients compared with the treatments they are exposed to.

Perhaps in an effort to make research on psychological treatments as rigorous as research on medical treatments, researchers borrowed the methodology that had been used to answer important questions in medicine (Budd & Hughes, 2009). Psychological treatments, however, are unlike medical treatments in crucial ways, and important assumptions that underpin RCTs do not necessarily apply in the context of psychological treatments. We can identify four main problems with the application of RCT assumptions to psychological treatments.

First, treatment techniques are a small part of what contributes to psychological change. RCTs focus on the techniques of treatment and emphasize the specificity of treatment (Hemmings, 2000); however, as Lambert (1992) points out, treatment techniques are one of the least important components of treatment in terms of the amount of outcome variance accounted for.

Second, RCTs ascribe improvements in the clients’ mental state to the treatment. From an RCT perspective, it is assumed that the treatment causes or produces the effect. This view is irreconcilable with the concept of the client as the agent of change (Bohart, 2000). Psychological treatments do not mechanistically cause individuals to get better, and clients are not passive recipients of the treatments (Bohart, Tallman, Byock, & Mackrill, 2011). Rather, the clients are the active agents. Clients use the resources offered by the treatment to create the effects they desire.

A third problem for demonstrating the causal influence of treatment involves defining what the treatment actually is. Psychological treatments were manualized to introduce standardization; however, this did not really standardize the treatments. In a manualized treatment of 12 sessions of cognitive behavioural therapy, for example, what should be considered the ‘treatment’? Is it the sequence of 12 sessions or the ordering of activities as specified in the manual? If cognitive activities are introduced after behavioural activities in the manual but a clinician uses them in the reverse order, does this constitute a different treatment?

Finally, treatment groups are not homogeneous. The probabilistic question ‘does this treatment work, on average?’ is a very blunt instrument for investigating the usefulness of psychological treatments. RCTs yield a quantification of how much one group differs from another group, and the probability that a difference of this magnitude or greater could have occurred by chance.
While RCTs in any field show some degree of variability in results, it is almost always the case with psychological treatments that some participants in the control group improve more than some participants in the treatment group, and similarly, some participants in the treatment group deteriorate more than some participants in the control group (Blampied, 2001). If the average change in pre and post scores for the treatment group, however, is more favourable (in a statistical sense) than the average change in pre-post scores for the comparison group, then the treatment will be deemed to have ‘worked’. This is a very unusual way to speak about causation.
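The overlap described above can be sketched with a small simulation. This is an illustration only: the normal distributions, sample size, and effect size are our assumptions, not data from any trial. Even when the treated group's mean change is clearly better, a substantial fraction of control participants improve more than the average treated participant.

```python
import random
import statistics

def overlap_demo(n: int = 200, effect: float = 0.5, seed: int = 3):
    """Group means can differ while individual outcomes overlap heavily.

    Simulates change scores for a control group ~ N(0, 1) and a treated
    group ~ N(effect, 1), then reports (a) the difference in group means
    and (b) the fraction of control participants whose improvement exceeds
    the average treated participant's improvement.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0, 1) for _ in range(n)]
    treated = [rng.gauss(effect, 1) for _ in range(n)]

    mean_diff = statistics.mean(treated) - statistics.mean(control)
    t_mean = statistics.mean(treated)
    # Control participants who out-improved the average treated participant
    frac_control_above = sum(c > t_mean for c in control) / n
    return mean_diff, frac_control_above
```

With these assumed parameters, roughly a third of the untreated group out-improves the treated group's average, even though the treatment "worked" in the usual statistical sense.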

Independent Variables, Dependent Variables and Independence

While problems of agency, causation, and treatment definition are serious enough, a more critical problem concerns the difficulty of demarcating independent variables (IVs) and dependent variables (DVs). It is essential for the conduct of RCTs that IVs and DVs are clearly defined and independent of each other. In other contexts, such as when testing pharmacological agents, it might be relatively straightforward to disentangle the predictor variables from the response variables. This is not the case for psychological treatments.


The underpinning model of RCTs is linear, and it is assumed that the IV affects the DV in a straightforward manner. Thus, in the context of RCTs, it would be assumed that both pharmacological treatments and psychological treatments are independent variables that affect the dependent variable, perhaps symptoms of depression, as quantified on standardized self-report questionnaires. A crucial difference, however, that is often overlooked is the direction of effects. For pharmacological treatments, we can expect a uni-directional effect as the pharmacological agent affects the individual’s symptoms via their biochemistry. That is, the agent is the same regardless of the biochemistry. In the psychological treatment, however, there is a bi-directional effect. Psychological treatments (the IV) are altered session by session and moment by moment depending on the way individual participants respond (the DV). This is the principle of responsiveness (Stiles, 2009b, 2013). Psychological treatments, therefore, are never standardized in the same sense that pharmacological treatments are.

Furthermore, when one considers more closely the way in which the IV has its effect on the DV, it can be appreciated that the pharmacological IV impacts on the depression symptoms DV via the biochemical systems of the individual participants. The psychological treatment, however, addresses the meaning and experiences of the symptoms for the participant. While changes in a participant’s meaning and experience of depression will no doubt also affect their biochemistry, changes in biochemistry are not the identified target for psychological treatments. Crucially, there is likely to be far more variability in how individuals understand and experience their depressive symptoms than in how their biochemical systems react to a particular drug. The non-independence of IVs and DVs raises serious conceptual problems, but it also has important statistical implications.
Many statistical techniques such as the analysis of variance (ANOVA) rely on the assumption of independence. If the independent variable is not independent of the dependent variable, then ANOVA is an inappropriate form of data analysis as are methods of hierarchical regression analysis (Krause & Lutz, 2009).

Random Assignment and Random Sampling

One of the important strengths of the RCT design is the random assignment of participants to either the treatment or control group. Random assignment is supposed to allocate participants to groups such that the two groups tend to be initially equivalent in all aspects relevant to the study (Bradford Hill, 1951). Random assignment works well when fields of wheat are being randomized or people are being randomly allocated to receive either coronary artery bypass surgery or angioplasty. For psychological treatments, the degree to which groups are equivalent is often reported with respect to assessed variables such as age, gender, ethnicity, and symptom severity. However, other variables that are likely to be more important to psychological treatment are rarely measured or reported. Examples of variables that could affect the delivery of psychological treatment might be intelligence, psychological mindedness, cognitive flexibility, commitment and motivation, openness, and tolerance of uncertainty.

Random assignment is conducted to ensure that the groups are equivalent with respect to extraneous variables that might otherwise influence the DV and thus confound the treatment and control group comparison and the causal conclusions that follow (Krause & Howard, 2003). It is important to recognize, however, that random assignment does not actually solve the problem of possible confounding by unknown or unmeasured variables; it only tends to reduce the extent of the problem (Krause & Howard, 2003). Unless researchers use extremely large samples, they cannot assume that the possibility of confounding variables has been eliminated simply because they are using a randomized design. Furthermore, the virtues of random assignment are particularly sensitive to sample size, and many research designs cannot achieve the large samples required (Bickman & Reich, 2009; Hsu, 2003).

The difficulty of random assignment has led to a bias in the disorders and treatments selected for study using RCTs. Some important disorders have had no RCT simply because it is too difficult to conduct a timely random assignment. Obtaining a randomized control group is particularly difficult for high-risk clinical difficulties (suicide, violence, etc.) and rare disorders, as well as for clinical problems in important settings where RCTs are not a priority (e.g., private practices and disaster response). It is unattractive to do RCTs on very long-term treatments, to examine changes that occur over many years, or to study conditions that require unique management strategies should something go wrong. As a result, the overwhelming bulk of RCTs concern shorter treatments of common disorders that entail relatively few risks.

A further problem is the difficulty (and hence rarity) of random sampling from the populations of interest for evaluating psychotherapy. Random assignment can remove bias between groups, but it cannot remove the bias from a sample that has not been randomly drawn from a population. Without a systematic sampling strategy, it is not possible even to estimate what the sampling bias might be. Random sampling affects the generalizability of the results: if it is not possible to estimate the sampling bias, there is no way of determining the extent to which the results obtained for the sample apply to the population. In fact, in the absence of explicit random sampling, it is very difficult even to know from what population the sample was drawn. While non-random samples are not a problem exclusive to RCTs, they do compromise randomization, which is purported to be one of the main advantages of the RCT design.
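How much imbalance can survive randomization? A minimal simulation makes the point; the standard-normal covariate, group sizes, and imbalance threshold are our assumptions for illustration, not figures from any study. With small groups, randomization alone frequently leaves the two arms noticeably unequal on an unmeasured variable such as psychological mindedness.

```python
import random
import statistics

def simulate_imbalance(n_per_group: int = 10, n_trials: int = 2000,
                       threshold: float = 0.5, seed: int = 1) -> float:
    """Estimate how often random assignment leaves a 'large' imbalance.

    An unmeasured covariate is drawn from N(0, 1) for every participant.
    A trial counts as imbalanced when the two group means differ by more
    than `threshold` standard deviations. Returns the imbalance rate.
    """
    rng = random.Random(seed)
    imbalanced = 0
    for _ in range(n_trials):
        treat = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        gap = abs(statistics.mean(treat) - statistics.mean(control))
        if gap > threshold:
            imbalanced += 1
    return imbalanced / n_trials

# Imbalance is common with small groups and vanishingly rare with large ones
small_groups = simulate_imbalance(n_per_group=10)
large_groups = simulate_imbalance(n_per_group=500)
```

Under these assumptions, groups of 10 show a half-standard-deviation imbalance in roughly a quarter of trials, while groups of 500 essentially never do, which is the sense in which the virtues of random assignment depend on large samples.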


Direction of Inference

The principles of statistical inference allow statements to be made about a population in general from data gathered from a sample of that population. For psychological treatments, however, what is required is knowledge about how the treatment might affect an individual, not the average effect the treatment could be expected to have at a population level. That is, even if random sampling allowed results to be generalized to a defined population, the direction of inference in statistical analysis is in the opposite direction from what is required to address Paul’s question (Blampied, 2001). At best, RCTs allow conclusions to be made about the impact of a treatment on members of a population, whereas what is needed is information about what treatment works for whom under which conditions.

To summarize, in theory, RCT methodology would be an appropriate research design for addressing questions of whether or not a treatment works. However, RCTs have a number of problems that compromise their ability to provide the sort of information about psychological treatments that will enable treatments to be developed that can be used more effectively and efficiently by individuals. They are based on the wrong model of causation, and their requirement for a clear and distinct separation between the IV and DV cannot be accomplished with psychotherapies. It is misleading to consider them a gold standard. Jadad and Enkin (2007, p. 106) consider ‘the very concept of a hierarchy of evidence to be misguided and superficial.’ Some important public health questions, such as the effects of environmental tobacco smoke (West et al., 2008), cannot be answered with RCT designs. There are compelling reasons, therefore, to consider other research strategies for investigating psychological treatments.

ADDITIONAL RESEARCH APPROACHES

Even if RCTs worked as intended, they would not yield the specific information required to understand what treatments are needed by which clients. Clearly, additional approaches and criteria are needed if progress is to be realized regarding how psychotherapy helps people resolve psychological distress. We focus on approaches that can yield valuable information even when scaled down to the level of individual practices. Readers may also wish to consult Elliott (2010) for additional alternatives to RCTs.

Serial Replication

In contrast to randomization, replication is a feasible principle to demonstrate and build confidence in results regarding which therapies under what conditions help which patients. Persons and Silberschatz (2009) recommended serial replication, in which findings are replicated across different patients, different therapists, and different contexts. Replication is important in all scientific research, of course, but it is relatively uncommon in RCTs of psychotherapy, presumably because of the high costs of conducting them. Serial replication can provide clinicians with useful information about the effects of therapy under a variety of different conditions. To illustrate, the Method of Levels is a transdiagnostic cognitive therapy that has been evaluated in routine clinical practice (e.g., Carey, 2005; Carey, Carey, Mullan, Spratt, & Spratt, 2009; Carey & Mullan, 2008; Carey, Tai, & Stiles, 2013), with the results being replicated across different settings (general practices and hospital outpatient clinics), different services (primary care and secondary care), different time periods (for example, 9, 12, and 24 months), different health systems (the National Health Service (NHS) in Scotland and the Australian public mental health service), different therapists (up to four different therapists), and different patients with different problems (over 500 patients have participated in the evaluations). Although these evaluations have not involved RCT designs, the fact that findings of effectiveness have been replicated in a variety of different ways yields confidence in the Method of Levels, while results of different studies provide useful indicators of what is therapeutically useful to different patients with different problems in different contexts. Moreover, most of these evaluations were conducted by clinicians in full-time clinical practice, which also demonstrates the practical utility of this approach.

Convergence of Evidence

Another alternative to the mono-methodological gold standard is convergence of evidence. Bohart et al. (2011) pointed out that because no study is perfect, decisions should be based on the extent to which a large body of evidence from a variety of sources converges on a common conclusion. The notion of convergence contrasts with the requirement that for psychological treatments to be regarded as evidence-based, they need to be subjected to RCTs alone. For example, in Australia, the National Health and Medical Research Council’s guide to the development of clinical practice guidelines describes the highest level of evidence for clinical practice as ‘evidence obtained from a systematic review of all relevant randomized controlled trials’ and the second level of evidence as ‘evidence obtained from at least one properly designed randomized controlled trial’ (NHMRC, 1999, p. 56). The principle of convergence suggests that decisions about what works for whom should be based on evidence from a variety of sources. If this treatment is replicated


with different therapists in different settings across different time periods, what are the results? When this treatment is evaluated in routine clinical practice and benchmarked against other published data, how does it compare? When a series of case studies are conducted, what do these studies reveal about both the underlying theory and important therapeutic mechanisms? In qualitative studies, what do patients report about their experience of the treatment? As evidence from a number of different methodologies is obtained, a clearer picture will emerge about under what conditions and for which people this particular treatment might work best. Knowledge accumulated in this way would be more clinically helpful than the evidence obtained from RCTs alone. Elliott (2002) has described a systematic procedure for bringing convergent evidence to bear on the question of the efficacy of treatment in single cases. Conclusions are based on a wide range of indicators (quantitative and qualitative, self-report and therapist perspective) and formal statements of both pro and con arguments regarding treatment effectiveness and possible alternative explanations.

Eliminating Alternative Explanations

When a person moves from a state of psychological distress to one of psychological contentment, or at least less distress, there are potentially many explanations to account for the change. Clinicians and researchers want to know how defensible it is to conclude that participating in a particular program of treatment was primarily responsible for enabling patients to make the changes they did. Kazdin (2011) argues that it is possible to collect data in such a way that competing explanations can be eliminated and valid inferences can be drawn. At any point in time, psychological treatment is only a small part of the ongoing activity of a person’s life. Table 2 provides a summary of the way in which internal validity can be threatened through competing explanations. It might be the case, for example, that the reduction in psychological distress could be attributed to the expected growth and development of the individual. Maturation, therefore, might be a more appropriate explanation of the change in the person’s psychological state. It is also known that many emotional states remit of their own accord over a period of time. With depression, for example, 40% of people will recover within a few months whether they receive treatment or not (Healy, 2012). There could also be characteristics of any testing that is conducted that relate more to the change in score than to the resources of the treatment. Kazdin (2011), for example, suggests that an extreme score might become less extreme at a second time point through regression to the mean.

An example from routine clinical practice demonstrates the way in which questionnaire scores and clinical observations can be used to rule out competing explanations. A young adult woman attended 12 weekly sessions of the Method of Levels. She had experienced difficulties at school and had trouble maintaining employment. For the past 8 months, she had isolated herself at home, frequently expressed suicidal thoughts, smoked, and used drugs. She stated that she only attended the first session because her parent was worried about her and wanted her to see someone. Table 3 provides unsolicited indicators offered by the client and observations recorded by the therapist that suggest an improvement in mental state during the time period in which treatment was provided. The client completed the K10, a widely used, 10-item, global measure of distress (Kessler et al., 2002), at pre-treatment, after six sessions, and at post-treatment, with scores of 39, 32, and 25, respectively. A previous study reported a reliable change index of 7.58 for the K10 (Murugesan et al., 2007). Using this score as a benchmark indicates that the client’s change from 39 to 25 could be regarded as reliable. The client also completed the Outcome Rating Scale (ORS), a visual analogue scale (Miller & Duncan, 2004) assessing individual, relational, social, and overall functioning, at every session. A conservative reliable change index for the ORS is 6.8 (Miller & Duncan, 2004). To achieve clinically significant improvement, a person’s score must begin at or below the clinical cutoff of 25 and increase by more than 6.8. The client’s scores indicated she achieved both reliable and clinically significant change. Finally, the client was accepted into university and, at both 3- and 12-month follow-ups, she was still enrolled in her course.

This clinical example illustrates the way in which clinicians can collect and organize data efficiently and systematically in order to eliminate many of the alternative explanations for change highlighted by Kazdin (2011). Given the longevity of the client’s problems, the rate of improvement once treatment started, and the sustained improvement after the end of treatment, it seems reasonable to conclude that treatment altered the course of the client’s troubles. Moreover, the changes occurred after the treatment commenced and not before, and improvements were recorded in a number of ways (ORS, K10, observationally, and self-report) across multiple time points. Clinicians, therefore, can use the data they collect during routine clinical practice to draw conclusions about the impact of their treatment with their clients and to weigh competing explanations.
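The reliable-change and clinically-significant-change checks used in such a case reduce to simple arithmetic. The sketch below uses the thresholds reported in the text (K10 RCI of 7.58; ORS RCI of 6.8 and clinical cutoff of 25); the function names are ours, not from any published instrument manual.

```python
def reliable_change(pre: float, post: float, rci: float) -> bool:
    """True when the pre-post difference exceeds the measure's
    reliable change index (RCI)."""
    return abs(pre - post) > rci

def clinically_significant_ors(pre: float, post: float,
                               cutoff: float = 25.0,
                               rci: float = 6.8) -> bool:
    """ORS convention as described in the text (Miller & Duncan, 2004):
    start at or below the clinical cutoff and improve by more than the RCI."""
    return pre <= cutoff and (post - pre) > rci

# K10 scores from the case example: 39 (pre), 32 (mid), 25 (post)
k10_reliable = reliable_change(39, 25, rci=7.58)  # 14-point drop exceeds 7.58
```

Note that the mid-treatment change alone (39 to 32, a 7-point drop) would not have cleared the K10's reliable change index; it is the full pre-post difference that does.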

Theory-Building Research

Theory-building research seeks to test, improve, and extend a particular theory (Stiles, 2009a, in press). In what


Table 2. Eliminating alternative explanations (adapted from Kazdin, 2011)

Threat to internal validity            Alternative explanation
Maturation                             The person’s condition changed because of their developmental life stage or because they grew tired or bored of the treatment.
Natural progression of the condition   The person’s condition could have reasonably been expected to remit of its own accord within the time period in which treatment was provided.
Change in circumstances                Situations in the person’s life changed concurrently with their participation in the treatment program.
Temporal sequencing                    The changes had started to occur prior to the commencement of treatment.
Conditions of testing                  The person only provided pre and post scores, and their extreme pre scores could have been expected to move closer to the mean at the post assessment.
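The conditions-of-testing threat in Table 2 (regression to the mean) can be made concrete with a small simulation. The normal score model, selection cutoff, and sample size below are illustrative assumptions: people selected for extreme intake scores drift back toward the mean on retest even with no treatment at all.

```python
import random
import statistics

def regression_to_mean(n: int = 5000, cutoff: float = 1.5, seed: int = 7):
    """Show that extreme intake scores fall on retest without treatment.

    Each score is a stable true level plus measurement noise. People whose
    first score exceeds `cutoff` are 'selected for treatment' and measured
    again; nothing is done to them between measurements. Returns the mean
    first score and mean retest score of the selected group.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n):
        trait = rng.gauss(0, 1)        # stable true level
        t1 = trait + rng.gauss(0, 1)   # intake score = trait + noise
        if t1 > cutoff:                # extreme at intake
            t2 = trait + rng.gauss(0, 1)  # untreated retest
            selected.append((t1, t2))
    mean_intake = statistics.mean(t1 for t1, _ in selected)
    mean_retest = statistics.mean(t2 for _, t2 in selected)
    return mean_intake, mean_retest
```

Under these assumptions the selected group's retest mean is markedly lower than its intake mean, which is exactly the pre-post "improvement" a naive comparison would credit to treatment.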

Kuhn (1970) called normal science, dominated by an accepted paradigm, theory-building is the main research purpose. In theory-building research, the domain of the theory, and hence its generality, is specified within the theory itself. Particular studies typically examine only small derived aspects of the theory. This contrasts with RCTs, in which generality is understood as external validity: the range of populations and settings in which a treatment can be expected to work. Theory-building research aims at building explanatory theory, which synthesizes observations, explaining how observations are related to each other. The research provides quality control on the theory by comparing the theoretical account with observations. Whereas RCTs seek to test whether an existing treatment is effective, theory-building research seeks to improve understanding and hence can contribute to systematic development of improved treatments.

Explanatory theory is distinct from treatment theory, which is meant to guide clinicians in conducting therapy. Some explanatory theories of psychotherapy also seek to be treatment theories (e.g., psychoanalytic theory, cognitive theory, and person-centred theory), and indeed, there is nothing so clinically practical as a good explanatory theory of psychopathology and psychotherapeutic process. However, treatment theories may be separate from and far simpler than explanatory theories. For example, a treatment theory may simply state that therapists should be as genuine, accurately empathic, and unconditionally accepting as they can be, or that watching an object move back and forth while talking about problems will be helpful. Treatment theories are judged not by their descriptive accuracy but by whether therapists who use them produce positive results (e.g., in RCTs). Explanatory theories, in contrast, seek to understand the details of the process and explain a wide variety of observations. They are judged by how well they correspond with the detailed observations.

Explanatory theories grow by accommodating new observations as follows. Researchers make theory-relevant observations. If the new observations are consistent with the theory, then confidence in the theory increases. If the observations are inconsistent with the theory (and the inconsistencies cannot be ascribed to faulty methods), then the theory may be abandoned or, more often, adjusted to accommodate the inconsistent observations. If the new observations are outside the domain of the theory, they can be ignored, but sometimes the theory can instead be expanded to encompass them. These adjustments and expansions are technically called abductions (Rennie, 2000; Stiles, 2009a, in press). They are constrained by the requirement that the adjusted or expanded theory must remain consistent with all previous theory-relevant observations. Through research and abductions, then, theories become more trustworthy and grow to encompass an ever-widening range of observations with ever-greater precision. From this perspective, theories are never finished; there is no point when the theory is ready for a final test. Instead, each relevant observation either confirms the theory or leads to some abductive modification. Of course, all researchers (and readers) must beware of confirmatory biases. There is always a danger that observers will see what they want to see. Actively seeking disconfirmation with the goal of improving the theory (through abductions) goes some way towards counteracting the bias favouring existing theory. Theory-building research can be performed with any sort of research, quantitative or qualitative.

Table 3. Client reports and therapist observations during delivery of psychological treatment (sessions 2–8 and 11)

Client report:
• Felt positive and hopeful
• Socializing more
• Wanted to get a job, get married, and have children
• Reconnected with ex-partner
• Going out with best friend
• Registered for a 1-day writing course
• Reduced cannabis use from every day to every second or third day
• Negotiated part-time work
• Had resumed playing guitar and was going out with friends
• No further thoughts of suicide. Reported that she began attending to her grooming and appearance once she decided not to kill herself anymore
• Waking 30–40 min every day
• Part-time work commenced
• Reduced smoking to one cigarette or less a day
• Applied to go to university

Therapist observation:
• Attended for the first time without her hood over her head
• Attended without her parent waiting in the reception area
• Not wearing the pullover with the hood
• Brought in a book she was reading as well as pieces of writing she had recently completed
• Was wearing lipstick for the first time
Theorybuilding case studies (Stiles, 2009a) may be of particular interest to clinicians, insofar as qualitative case observations may include details about the therapist, patient, setting, context, and process of therapy that could begin to address Paul’s (1967) question. By adjusting and expanding the theory to accommodate these details in successive cases, the theory can come to convey, in compact form, the accumulated observations of the clinicians and researchers who have contributed to it.

Task Analysis

Task analysis is an observational, inductive, and iterative strategy in which investigators use observations of individuals performing tasks to progressively improve descriptions of how the task can best be performed (Greenberg, 1984, 2007; Greenberg & Foerster, 1996; Pascual-Leone, Greenberg, & Pascual-Leone, 2014). Task analysis affords an intensive scrutiny of events in therapy. The procedure begins with selecting a specific type of therapeutic problem, for example, a problematic reaction point, in which a client reports an uncharacteristic or puzzling personal reaction, and identifying in-session markers of that problem (Rice & Saperia, 1984). Next, expert opinions about how the problem might be solved are gathered and synthesized, yielding a rational model of the task. Then, a corpus of instances of the problem—in-therapy events that illustrate the problem—is collected from therapy recordings, and these are compared with the rational model to assess whether the model works in practice. Next, the rational model is progressively corrected and refined in light of the empirical observations; that is, a rational-empirical model is constructed by successive rational and empirical analyses. Finally, the model is verified by comparing successful and unsuccessful instances of attempts to solve the problem.

As presented in the cited references, task analysis is a pragmatic strategy that begins by drawing on clinical experience and works towards a stand-alone account of how to address a particular sort of clinical problem. However, most of the task-analytic procedures could be straightforwardly adapted to theory building by beginning with an existing theory rather than newly gathered expert opinions and working towards either confirmation of the theory or towards abductions that would reconcile the theory with observations regarding the solution of this particular problem.

Benchmarking

Another approach to assessing treatment outcomes is to benchmark results against comparable results in the published literature (Minami & Wampold, 2008; Minami, Wampold, Serlin, Kircher, & Brown, 2007). One source of benchmarks is the results reported in clinical trials or even meta-analyses of clinical trials. These benchmarks provide statistics such as effect sizes for particular outcome measures used in various studies. Clinicians interested in benchmarking the results from their clinical practice can use outcome measures to compare the effect sizes they obtain in routine practice with the effect sizes published in the literature.

The logic of benchmarking is compelling. Although RCTs can involve comparisons of bona fide treatments, it is common to compare a preferred treatment with a nonstandard form of treatment such as a waiting-list control group, a Treatment as Usual group, a self-help group, or an educational group. Such comparisons tend to favour the researcher's preferred treatment. With benchmarking studies, however, researchers can compare the outcomes from their preferred treatment with the published outcomes obtained from the preferred treatments of other researchers. Carey et al. (2013), for example, conducted an evaluation in routine clinical practice, collected outcomes from published studies of routine clinical practice, and constructed tables benchmarking the effect sizes, efficiency ratios, and reliable and clinically significant change statistics obtained in their study against the same statistics from published studies.

When considering the benchmarking standard, it is important to ensure that the comparisons being made are defensible. Comparisons could be made, for example, between studies using the same outcome measure or between studies using different outcome measures but reporting a common statistic such as an effect size. Benchmark comparisons could also be chosen on the basis of research design, such as benchmarking the results of a study in routine clinical practice against other published studies of routine clinical practice.
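To make the comparison concrete, the arithmetic behind such benchmarks can be sketched in a few lines. The scores and the benchmark value below are hypothetical, and the two statistics shown (one common within-group pre-post variant of Cohen's d, and the Jacobson-Truax reliable change index) are only two of several statistics used in benchmarking studies.

```python
import math

def cohens_d_pre_post(pre, post):
    """Within-group pre-post effect size: mean change divided by the
    pooled standard deviation of the pre and post scores."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    var_pre = sum((x - mean_pre) ** 2 for x in pre) / (n - 1)
    var_post = sum((x - mean_post) ** 2 for x in post) / (n - 1)
    pooled_sd = math.sqrt((var_pre + var_post) / 2)
    return (mean_pre - mean_post) / pooled_sd

def reliable_change_index(x_pre, x_post, sd_norm, reliability):
    """Jacobson-Truax reliable change index: change divided by the
    standard error of the difference; |RCI| > 1.96 suggests change
    beyond what measurement error alone would produce."""
    se = sd_norm * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * se ** 2)
    return (x_pre - x_post) / s_diff

# Hypothetical routine-practice scores on a distress measure (higher = worse)
pre = [28, 31, 25, 34, 29, 27, 33, 30]
post = [18, 22, 15, 27, 19, 16, 25, 21]

d = cohens_d_pre_post(pre, post)
PUBLISHED_BENCHMARK = 0.8  # illustrative effect size from a hypothetical trial
print(f"routine-practice d = {d:.2f} vs published benchmark {PUBLISHED_BENCHMARK}")
```

A defensible comparison would also require that the benchmark came from studies using the same outcome measure, or at least the same effect-size definition, as the routine-practice data.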

Realist Evaluation

One of the conceptual difficulties mentioned earlier for the RCT design is that agency is presumed to lie within the treatment being offered; the responsiveness of both the client and the therapist to the therapy being delivered is ignored. Realist evaluation is a methodological framework concerned with understanding 'what works for whom in which circumstances, in what respect, and how?' (Pawson & Tilley, 1997, p. 2). Realist evaluation acknowledges first and foremost that no treatment works for all people all the time, primarily because different people require different resources. It uses both quantitative and qualitative methods to analyse important relationships between and amongst variables. Central to the realist approach is the delineation of context–mechanism–outcome pattern configurations (Pawson & Tilley, 1997): of interest is understanding which mechanisms, in which contexts, lead to which outcomes. This approach seems particularly suited to understanding psychotherapy and the way it is used by different people for the effects they intend. Through realist methods, it might be possible to begin to understand what different people experiencing different problems in different circumstances need in order to achieve a similar level of contentment and satisfaction. The realist evaluation framework, therefore, could provide researchers with a structure for beginning to systematically understand the differential impact of treatment across individuals.
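One way to picture a context–mechanism–outcome analysis is as a tabulation of outcomes by configuration. The sketch below groups hypothetical case records by context and mechanism and averages a simple outcome rating; every label and rating is invented for illustration, and a real realist evaluation would of course combine such tabulations with qualitative analysis.

```python
from collections import defaultdict

# Hypothetical case records, each coded for context, hypothesized
# mechanism, and a simple outcome rating (0-10). All values illustrative.
cases = [
    {"context": "rural clinic", "mechanism": "patient-led scheduling", "outcome": 8},
    {"context": "rural clinic", "mechanism": "fixed weekly sessions",  "outcome": 5},
    {"context": "urban clinic", "mechanism": "patient-led scheduling", "outcome": 6},
    {"context": "urban clinic", "mechanism": "fixed weekly sessions",  "outcome": 7},
]

# Group outcomes by context-mechanism configuration, mirroring the realist
# question: which mechanisms, in which contexts, lead to which outcomes?
cmo = defaultdict(list)
for case in cases:
    cmo[(case["context"], case["mechanism"])].append(case["outcome"])

for (context, mechanism), outcomes in sorted(cmo.items()):
    mean = sum(outcomes) / len(outcomes)
    print(f"{context} + {mechanism}: mean outcome {mean:.1f}")
```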

CONCLUDING COMMENTS

Paul's (1967) question will not be answered while we continue to consider RCTs the gold standard. RCTs may provide some, but not all, pieces of the puzzle. A variety of other methods can yield robust evidence for particular treatments or therapeutic principles, yet that evidence is likely to be ignored under the assumption that RCTs are the gold standard. Our understanding, and the development of increasingly effective psychotherapies, may well hinge on our willingness to accept the limitations of the RCT and to welcome other methodologies to the table as equal players. Only then will patients experience the greater benefits of our ability to provide a sophisticated and nuanced answer to the question of the most appropriate treatment for any particular individual.

REFERENCES

Bickman, L., & Reich, S. M. (2009). Randomized controlled trials: A gold standard with feet of clay? In S. I. Donaldson, C. A. Christie, & M. M. Mark (Eds.), What counts as credible evidence in applied research and evaluation practice? (pp. 51–77). Los Angeles: Sage.
Blampied, N. M. (2001). The third way: Single-case research, training, and practice in clinical psychology. Australian Psychologist, 36(2), 157–163.
Bohart, A. (2000). The client is the most important common factor: Clients' self-healing capacities and psychotherapy. Journal of Psychotherapy Integration, 10(2), 127–149.
Bohart, A. C., Tallman, K. L., Byock, G., & Mackrill, T. (2011). The "Research Jury Method": The application of the Jury Trial Model to evaluating the validity of descriptive and causal statements about psychotherapy process and outcome. Pragmatic Case Studies in Psychotherapy, 7, Module 1, Article 8, 101–144. Retrieved from http://pcsp.libraries.rutgers.edu
Bracken, M. B. (2013). Risk, chance, and causation. New Haven, CT: Yale University Press.
Bradford Hill, A. (1951). The clinical trial. British Medical Bulletin, 7(4), 278–83.
Budd, R., & Hughes, I. (2009). The Dodo Bird verdict – controversial, inevitable and important: A commentary on 30 years of meta-analyses. Clinical Psychology & Psychotherapy, 16, 510–522.
Carey, T. A. (2005). Can patients specify treatment parameters? Clinical Psychology and Psychotherapy: An International Journal of Theory and Practice, 12(4), 326–335.
Carey, T. A., Carey, M., Mullan, R. J., Spratt, C. G., & Spratt, M. B. (2009). Assessing the statistical and personal significance of the Method of Levels. Behavioural and Cognitive Psychotherapy, 37, 311–324.
Carey, T. A., & Mullan, R. J. (2008). Evaluating the Method of Levels. Counselling Psychology Quarterly, 21(3), 1–10.
Carey, T. A., Tai, S. J., & Stiles, W. B. (2013). Effective and efficient: Using patient-led appointment scheduling in routine mental health practice in remote Australia. Professional Psychology: Research and Practice, 44, 405–414.
Christie, C. A., & Fleischer, D. (2009). Social inquiry paradigms as a frame for the debate on credible evidence. In S. I. Donaldson, C. A. Christie, & M. M. Mark (Eds.), What counts as credible evidence in applied research and evaluation practice? (pp. 19–30). Los Angeles: Sage.
Elliott, R. (2002). Hermeneutic single case efficacy design. Psychotherapy Research, 12, 1–20.

Elliott, R. (2010). Psychotherapy change process research: Realizing the promise. Psychotherapy Research, 20, 123–135.
Greenberg, L. S. (1984). Task analysis: The general approach. In L. N. Rice & L. S. Greenberg (Eds.), Patterns of change: Intensive analysis of psychotherapy process (pp. 124–148). New York: Guilford Press.
Greenberg, L. S. (2007). A guide to conducting a task analysis of psychotherapeutic change. Psychotherapy Research, 17, 15–30.
Greenberg, L. S., & Foerster, F. S. (1996). Task analysis exemplified: The process of resolving unfinished business. Journal of Consulting and Clinical Psychology, 64, 439–446.
Haaga, D. A. F., & Stiles, W. B. (2000). Randomized clinical trials in psychotherapy research: Methodology, design, and evaluation. In C. R. Snyder & R. E. Ingram (Eds.), Handbook of psychological change: Psychotherapy processes and practices for the 21st century (pp. 14–39). New York: Wiley.
Healy, D. (2012). Pharmageddon. Los Angeles, CA: University of California Press.
Hemmings, A. (2000). A systematic review of the effectiveness of brief psychological therapies in primary health care. Families, Systems & Health, 18, 279–313.
Hsu, L. M. (2003). Random sampling, randomization, and equivalence of contrasted groups in psychotherapy outcome research. In A. E. Kazdin (Ed.), Methodological issues and strategies in clinical research (3rd ed., pp. 147–161). Washington, DC: American Psychological Association. [Reprinted from the Journal of Consulting and Clinical Psychology, 57, 131–137.]
Jadad, A., & Enkin, M. (2007). Randomized controlled trials: Questions, answers, and musings (2nd ed.). Malden, MA: Blackwell Publishing.
Kazdin, A. E. (2011). Single-case research designs (2nd ed.). New York: Oxford University Press.
Kessler, R. C., Andrews, G., Colpe, L. J., Hiripi, E., Mroczek, D. K., Normand, S.-L. T., Walters, E. E., & Zaslavsky, A. M. (2002). Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychological Medicine, 32, 959–976.
Krause, M. S., & Howard, K. I. (2003). What random assignment does and does not do. Journal of Clinical Psychology, 59(7), 751–766.
Krause, M. S., & Lutz, W. (2009). Process transforms inputs to determine outcomes: Therapists are responsible for managing process. Clinical Psychology: Science and Practice, 16, 73–81.
Kuhn, T. S. (1970). The structure of scientific revolutions. Chicago: University of Chicago Press.
Lambert, M. J. (1992). Psychotherapy outcome research: Implications for integrative and eclectic theories. In J. C. Norcross & M. R. Goldfried (Eds.), Handbook of psychotherapy integration (pp. 94–129). New York: Basic Books.
Lewis, C. C., Simons, A. D., & Kim, H. K. (2012). The role of early symptom trajectories and pretreatment variables in predicting treatment response to cognitive behavioral therapy. Journal of Consulting and Clinical Psychology, 80(4), 525–534.
Miller, S. D., & Duncan, B. L. (2004). The outcome and session rating scales: Administration and scoring manual. Chicago: Institute for the Study of Therapeutic Change.

Minami, T., Wampold, B. E., Serlin, R. C., Kircher, J. C., & Brown, G. S. (2007). Benchmarks for psychotherapy efficacy in adult major depression. Journal of Consulting and Clinical Psychology, 75(2), 232–243.
Minami, T., & Wampold, B. E. (2008). Adult psychotherapy in the real world. Biennial Review of Counseling Psychology, 1, 27–45.
Murugesan, G., Amey, C. G., Deane, F. P., Jeffrey, R., Kelly, B., & Stain, H. (2007). Inpatient psychosocial rehabilitation in rural NSW: Assessment of clinically significant change for people with severe mental illness. Australian and New Zealand Journal of Psychiatry, 41, 343–350.
NHMRC. (1999). A guide to the development, implementation and evaluation of clinical practice guidelines. Canberra: Commonwealth of Australia. Retrieved 10 March 2014 from http://www.nhmrc.gov.au/_files_nhmrc/publications/attachments/cp30.pdf
Pascual-Leone, A., Greenberg, L. S., & Pascual-Leone, J. (2014). Task analysis: New developments for programmatic research on the process of change. In W. Lutz & S. Knox (Eds.), Quantitative and qualitative methods in psychotherapy research (pp. 249–273). London: Taylor & Francis Books.
Paul, G. L. (1967). Strategy of outcome research in psychotherapy. Journal of Consulting Psychology, 31(2), 109–118.
Pawson, R., & Tilley, N. (1997). Realistic evaluation. Los Angeles: Sage.
Persons, J. B., & Silberschatz, G. (2009). Are results of randomized controlled trials useful to psychotherapists? In A. E. Kazdin (Ed.), Methodological issues and strategies in clinical research (3rd ed., pp. 547–568). Washington, DC: American Psychological Association. [Reprinted from the Journal of Consulting and Clinical Psychology, 66, 126–135.]
Rennie, D. L. (2000). Grounded theory methodology as methodical hermeneutics: Reconciling realism and relativism. Theory & Psychology, 10(4), 481–502.
Rice, L. N., & Saperia, E. P. (1984). Task analysis and the resolution of problematic reactions. In L. N. Rice & L. S. Greenberg (Eds.), Patterns of change (pp. 29–66). New York: Guilford.
Stiles, W. B. (2009a). Logical operations in theory-building case studies. Pragmatic Case Studies in Psychotherapy, 5(3), 9–22. Available: http://jrul.libraries.rutgers.edu/index.php/pcsp/article/view/973/2384
Stiles, W. B. (2009b). Responsiveness as an obstacle for psychotherapy outcome research: It's worse than you think. Clinical Psychology: Science and Practice, 16, 86–91.
Stiles, W. B. (2013). The variables problem and progress in psychotherapy research. Psychotherapy, 50, 33–41.
Stiles, W. B. (in press). Theory-building, enriching, and fact-gathering: Alternative purposes of psychotherapy research. In O. Gelo, A. Pritz, & B. Rieken (Eds.), Psychotherapy research: General issues, process, and outcome. New York: Springer-Verlag.
West, S. G., Duan, N., Pequegnat, W., Gaist, P., Des Jarlais, D. C., Holtgrave, D., Szapocznik, J., Fishbein, M., Rapkin, B., Clatts, M., & Dolan Mullen, P. (2008). Alternatives to the randomized controlled trial. American Journal of Public Health, 98, 1359–1366.
