CLINICAL TRIALS

PANEL DISCUSSION

Clinical Trials 2014; 11: 457–466

University of Pennsylvania 6th annual conference on statistical issues in clinical trials: Dynamic treatment regimes (afternoon session) Christy Chuang-Stein, Dean Follmann and Rick Chappell

Glossary

A dynamic treatment regime (DTR) individualizes treatment over time via decision rules that specify whether, how, or when to alter the intensity, type, or delivery of treatment at critical clinical decision points in individual patients [1]. A DTR is said to be optimal if it yields the best clinical outcome on average when applied to the entire patient population [2]. Sequential multiple assignment randomized trials (SMARTs), or equivalently, sequentially randomized trials, have been developed explicitly for the purpose of constructing proposals for high-quality DTRs [1].

Moderator Benjamin French, PhD, University of Pennsylvania: We have an excellent panel to lead us through the discussion of this afternoon's talks. Christy Chuang-Stein is the Head of the Statistical Research and Consulting Center at Pfizer. She has 28 years of pharmaceutical industry experience. Dean Follmann is the head of the Biostatistics Research Branch of the National Institute of Allergy and Infectious Diseases (NIAID). Estelle Russek-Cohen is the Director of the Division of Biostatistics in the Center for Biologics Evaluation and Research (CBER), and Rick Chappell is Professor in the Departments of Biostatistics and Medical Informatics and of Statistics at the University of Wisconsin-Madison.

Christy Chuang-Stein, PhD, Pfizer: At a conceptual level, I think the SMART idea is intuitive, interesting, and innovative. Since I am the pharmaceutical industry member on this panel, I will make my remarks from the industry perspective. I usually don't use disclaimers, but I am going to use one today, simply because we haven't had a robust conversation about SMARTs at Pfizer or within any working group I have been involved with. Therefore, the opinions expressed today are completely my own. I believe that some Pharmaceutical Research and Manufacturers of America companies have

employed certain aspects of the SMART concept. The idea of SMART is currently being explored by an Adaptive Program Working Group [3] as a way to optimize strategies at the development program level. The Adaptive Program Working Group is part of a bigger Adaptive Design Scientific Working Group, an industry and academic collaborative effort. Dr Kidwell mentioned the CATIE trial. The trial represented a major undertaking by the National Institute of Mental Health (NIMH). It was the largest, longest, and most comprehensive independent trial to examine the existing therapies for patients with chronic schizophrenia. The Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) study was conducted over a 4-year period with drugs donated and supplied by manufacturers in identical-looking capsules to enable blinding. The objective of the trial was to determine which medications provided the best treatment for patients. In addition, the study hoped to offer guidance on what medication to try next if the first medication was not satisfactory. The trial, as depicted in Figure 1, had two stages. About 1500 patients were initially randomized to one of five drugs. Patients were to receive the drugs assigned to them for 18 months (Stage I). Four of the drugs were from a class of drugs called the second generation of antipsychotics, or atypicals for short. They were olanzapine, quetiapine, risperidone, and ziprasidone. These atypicals became available in the late 1990s and early 2000s. The fifth drug was perphenazine, which was a first-generation antipsychotic. The aim of Stage I was to see whether the newer second-generation drugs could provide more benefit to patients than the older drug. Approximately three-quarters of the patients discontinued their initial treatment before the 18th month.
Participants who couldn't get a good response, either because they could not tolerate the side effects or simply did not respond adequately, were eligible to enter Stage II of the trial. Those who discontinued Stage I because of lack of efficacy would be randomized to either the first-generation

© The Author(s), 2014. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1740774514538552


C Chuang-Stein et al.

Figure 1. Schematic of Stage I and Stage II in the CATIE Trial. In Stage I, patients were randomized (R) to olanzapine, quetiapine, risperidone, or ziprasidone (atypical, or second-generation, antipsychotics) or to perphenazine. Responders continued their assigned drug. In Stage II, patients who discontinued for lack of efficacy were re-randomized to clozapine or a different atypical, and patients who discontinued because of side effects were re-randomized to ziprasidone or a different atypical.

antipsychotic clozapine or an atypical different from what they received during Stage I if they had been randomized to an atypical. On the other hand, if a participant discontinued because of side effects, he or she would be randomized to the second-generation antipsychotic ziprasidone or an atypical different from what they received in Stage I if the patient was randomized to an atypical. The design was admittedly a bit complicated, but it allowed multiple objectives to be investigated in one trial with randomization in Stage II dependent on the participant’s experience in Stage I. One-third of the patients who did not achieve an adequate response in Stage I chose to leave the study altogether without entering into Stage II. A total of 543 trial participants did enter into Stage II. Interestingly enough, some who discontinued Stage I due to lack of efficacy didn’t want to receive a first-generation antipsychotic, so they put themselves in the category of side-effect-related discontinuation to ensure that they would receive an atypical in Stage II. Among the 543 participants who entered into Stage II, about three-fourths discontinued before the scheduled treatment completion. I encourage you to go to the NIMH Web site to read about this ground-breaking trial [4]. This trial also highlighted the common challenge of high dropout rate in antipsychotic trials. This challenge is equally applicable to SMARTs. I have been thinking about how pharmaceutical companies can take advantage of the SMART concept. Pharmaceutical companies are pursuing more and more targeted therapies. In the cancer area, these are treatments that block the growth and the

spread of cancer by interfering with specific targets involved in tumor growth and progression. As a pharmaceutical product developer, if we have evidence to suggest that a therapy designed as a targeted therapy is hitting the target with high affinity, we will have established the proof of mechanism. If data from the proof-of-concept study suggest that proof of mechanism does translate to clinical effect, the targeted therapy under development is often speculated to be efficacious as a first-line, second-line, and third-line therapy. When this happens, one can consider designing a confirmatory trial that will enroll patients whose disease possesses the target molecular characteristics, randomizing eligible patients to receive either the new targeted therapy or the standard of care as the first line. Patients who fail on the first-line standard of care can be randomized to receive the new targeted therapy as a second-line option or a second-line standard of care, and we continue to follow them. Patients failing on the second-line standard of care can be further randomized to receive the new targeted therapy as a third-line option or a third-line standard of care. This is conceptually appealing. Of course, this conceptual appeal needs to be balanced against the time required to conduct this trial, and we may have limited control over the number of patients going into the three stages, as we have seen with the CATIE trial (Figure 2). Another potential application is enrichment at the proof-of-concept stage. Suppose we are developing a new treatment for a very hard-to-treat condition. At this early stage, we don't have a good prediction model for who might be responding to


Figure 2. Testing a targeted therapy as first-, second-, and third-line treatment. Among patients with the targeted characteristics, patients are randomized (R) to the new targeted therapy (TT) or the standard of care (SOC) as first line; responders continue on their assigned treatment. Patients who fail on SOC are re-randomized to the new TT or a second-line SOC, and patients who fail on the second-line SOC are further randomized to the new TT or a third-line SOC.

the new treatment. One option we can consider is to first treat patients with an approved product in the same class (or an approved product for the disorder, if there is no approved product in the same class). Patients who responded to the approved product can go through a washout period and then be randomized to the new treatment or a placebo. What we are doing here is enriching the study population by selecting those patients who are likely to respond to the particular pharmacologic class of drugs to which the new treatment belongs. We give the drug a good chance to demonstrate its efficacy if it is indeed efficacious. Patients who did not respond to the approved product can be randomized to the new treatment or a placebo. In these patients, we can investigate whether the new treatment could be a viable option for those who failed on the approved treatment. The latter is often an easier pathway to marketing authorization. This type of enrichment is mentioned in draft Food and Drug Administration (FDA) guidance on enrichment strategies issued in December 2012 [5]. I mentioned earlier that the idea of SMART is being explored by members of the Adaptive Program Working Group. When designing a development program to gain marketing authorization for a product, one should look at the development program as a continuum. We need to look at the FDA-defined Phase 2 and Phase 3 together because they are two contiguous phases of the entire development program. We know the information we want to achieve at the end of Phase 3. The question is, how can we back-engineer to find the best dose–response strategy for Phase 2 to obtain the desired information for Phase 3? This is similar to the backward induction that has been presented by some researchers of SMARTs. Not surprisingly, the Adaptive Program Working Group has found it challenging to obtain technical solutions to the optimization problem.

Part of the difficulty might be due to the fact that the optimization problem is being phrased too broadly and there are other factors that need to be taken into consideration. The pharmaceutical industry is always looking for better and more efficient ways to develop products. We want to keep the quality high under the constraints of cost and speed. CATIE was unique in its scope and objectives. It was also very expensive. It cost the NIMH approximately US$43 million 10 years ago (not including the cost of the drugs). Recently, there have been efforts to compare active treatments indirectly using data from randomized clinical trials. We need to start thinking about comparisons among active treatments using other data sources also. Finally, I like Dr Kidwell's idea of forming collaborations. I believe we can do much more through collaboration and partnership.

Dean Follmann, PhD, NIAID: I thought Drs Collins and Moodie were using DTRs as a way to craft an overall treatment package. In Dr Moodie's case, it was to come up with some optimal way to dose warfarin using patient history and also the understanding of the pharmacokinetics and pharmacodynamics of how warfarin behaves in the body and could be evaluated in a clinical trial. Dr Collins' overall objective was to find the best intervention to stop smoking with a price tag of US$500. Once she comes up with a specific strategy, it can be evaluated in subsequent clinical trials. These are engineering or learning/optimization roles for DTRs. I didn't really know a lot about DTRs when I got the speakers' slides. Coming into it, I was kind of intimidated because I thought it was a complicated literature. I also thought it often involved observational studies, which adds the need for a whole host of methods to correct for selection of treatment,


Figure 3. SMART design for two approaches to treatment of attention-deficit hyperactivity disorder.

Figure 4. Masterseed trial.

assignment bias, and confounding. I was pleasantly surprised by most of what I heard today, as they were randomized studies but with kind of a twist. I think DTRs, when described in their generality, are not simple. But, if you want to answer multiple questions, DTR designs may be more efficient than addressing separate questions in sequential clinical trials in which you would randomize people, answer the first-line question, randomize other people and answer the second-line question, and so on. Three of the talks described similar DTR designs in which people are randomized to initial therapy and then, on the basis of the observed response, are randomized or may be assigned to a second-line treatment. I will be using Dr Collins' slide (Figure 3) on children with attention-deficit hyperactivity disorder to dissect how I learned to understand what these kinds of studies were doing. Dr Collins' study involves randomization to behavioral modification (BMOD) or medication. After a period of time, you see whether patients respond or not. The non-responders are subsequently randomized to something else (either remain with the same strategy, enhance it, or augment with the other strategy). What can be extracted from this DTR that's reliable and understandable? First, you can answer the first-line question: Is medication better than BMOD? You compare the first three boxes to the bottom three boxes. That's just a two-arm clinical trial comparison: (SG1 + SG2 + SG3) versus (SG4 + SG5 + SG6). The second comparison addresses the best rescue therapy or second-line therapy for people who did not respond. Here, we have two randomizations, basically a stratified clinical trial where the two strata are non-response to medication, that's SG2 and SG3, and non-response to BMOD (SG5 and SG6). That's another clinical trial (with just two strata), and we know how to deal with those. What I

wasn't familiar with, and I first found curious and then kind of intriguing, was the best strategy question. Several of the speakers pointed out that, in fact, hidden within the sequentially randomized trial, you can uncover a strategy trial just as if you had randomized people to four different treatments from the start. One thing I was also struck by when I was reading about DTRs is the connection with factorial studies, which have a long history in agriculture, and so I came up with a fictional example of a seed company, say Masterseed, that is interested in two factors: first, whether in March you should use manure or anhydrous fertilizer; second, what you should do with weedy fields that you might discover in June. Should you use chemical or mechanical weeding for the weedy fields? You can imagine the seed company going around to the different farms and randomizing farmers at the start in March and saying, 'okay, you are going to be getting manure for the fertilizer and chemical weeding if the field is weedy in June'. This is just a standard two-by-two factorial trial. I show the design in Figure 4 below, modifying Dr Collins' slide. We have time: March, June, and October; we have a nice endpoint, which is the corn harvest; and it looks exactly the same as a factorial trial. You can construct these trials by randomization to the four different strategies at the outset. One thing that factorial studies emphasize is parsimony in the analysis and also in the design. If you have a situation where you disallow interactions and just have main effects, there is no increase in sample size and you can get two studies for the price of one. Also, you can do fractional factorials, where you might be interested in four factors but you don't have to explore all 2^4 = 16 possibilities. If you disallow certain higher-order interactions, you can come up with an efficient design that answers the questions that you are interested in.
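To make the parsimony point concrete, here is a minimal sketch of the standard main-effects analysis for the two-by-two Masterseed factorial; the yield numbers are invented purely for illustration and are not from any real trial:

```python
import numpy as np

# Hypothetical mean corn yields (bushels/acre) for the 2x2 Masterseed factorial.
# Rows: fertilizer (manure, anhydrous); columns: weed control (chemical, mechanical).
yields = np.array([[150.0, 160.0],
                   [170.0, 180.0]])

# Main effect of fertilizer: difference of row means, averaging over weed control.
fert_effect = yields[1].mean() - yields[0].mean()        # anhydrous minus manure

# Main effect of weed control: difference of column means, averaging over fertilizer.
weed_effect = yields[:, 1].mean() - yields[:, 0].mean()  # mechanical minus chemical

# Interaction contrast; when it is (near) zero, the two main-effects analyses
# answer two questions with no increase in sample size: two studies for the
# price of one.
interaction = (yields[1, 1] - yields[1, 0]) - (yields[0, 1] - yields[0, 0])

print(fert_effect, weed_effect, interaction)  # 20.0 10.0 0.0
```

A fractional factorial plays the same game with more factors: some of the 2^4 cells are dropped, but the main-effect contrasts remain estimable as long as the disallowed higher-order interactions really are negligible.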


Figure 5. Factorial design with two analyses: (1) compare anhydrous versus manure using two strata – herbicide and weeding and (2) compare herbicide to weeding in the weedy fields using two strata – anhydrous weedy fields and manure weedy fields.
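The two analyses in the Figure 5 caption can be sketched with simulated data (all numbers below are invented for illustration); the key point is that the weed-control comparison is restricted to the weedy fields, mirroring the second-line comparison among non-responders in a SMART:

```python
import numpy as np

rng = np.random.default_rng(42)
n_fields = 400

# Hypothetical Masterseed trial: randomize fertilizer in March and pre-specify
# the weed-control plan; weediness is only observed in June.
anhydrous = rng.integers(0, 2, n_fields).astype(bool)  # March randomization
herbicide = rng.integers(0, 2, n_fields).astype(bool)  # plan if the field is weedy
weedy = rng.random(n_fields) < 0.5                     # discovered in June

# Assumed yield model: anhydrous helps everywhere (+5); weed control matters
# only in weedy fields (+3 for herbicide there).
yield_bu = (100 + 5 * anhydrous
            + 3 * (weedy & herbicide)
            + rng.normal(0, 4, n_fields))

# Analysis 1: anhydrous versus manure, stratified by weed-control assignment.
fert_effect = np.mean([
    yield_bu[anhydrous & (herbicide == h)].mean()
    - yield_bu[~anhydrous & (herbicide == h)].mean()
    for h in (True, False)
])

# Analysis 2: herbicide versus mechanical weeding, restricted to weedy fields
# (averaging over non-weedy fields would dilute this contrast to nothing).
weedy_effect = np.mean([
    yield_bu[weedy & herbicide & (anhydrous == a)].mean()
    - yield_bu[weedy & ~herbicide & (anhydrous == a)].mean()
    for a in (True, False)
])

print(round(fert_effect, 1), round(weedy_effect, 1))
```

With these assumed effects, the stratified fertilizer comparison recovers roughly +5 and the restricted weedy-field comparison roughly +3, which is exactly the pair of questions the DTR answers in one design.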

I was trying to figure out how on earth you uncover a clinical trial when you do sequential randomization. Well, if you think about randomization in March, it is very clear in my mind how you could view it as four different strategies. This leads to the next slide, which is the standard factorial analysis, where we have the corn yield in each of the four cells (Figure 5). We could do a main-effects analysis for anhydrous versus manure, and we could also do a main-effects analysis of herbicide versus weeding as standard analyses of the factorial study. With DTRs and sequential randomization, it is a little more complicated. What happens is that halfway through the growing season, we identify which fields are weedy and which are not, and then this allows us to do an analysis where we restrict the comparison of weeding versus herbicide to those fields that are weedy. It makes no sense to do the analysis that I just mentioned of the main effects of herbicide versus weeding because that's averaging over, essentially, weedy and non-weedy fields or responders and non-responders. Dr Kidwell discussed the SMART design using several oncology examples [6] where its use is quite advantageous, since most cancer patients require sequential treatments. In her examples, she contrasts the use of stage-specific comparisons and complete follow-through using the same patients in a SMART. Another strategy is to just pick the best of the four means in a factorial design, but that is not invoking the parsimony that you might expect or hope for within a factorial study. If you have a very complicated or a very high-dimensional DTR, you are paying a big price if you allow all possible interactions. We know from the factorial study literature that often we can focus on main effects and lower-order interactions. The factorial construction I am talking about is only possible if you do it by design, and so if you have different rescue therapies for the


BMOD, you can't do a factorial study and hope for the parsimony that I am talking about. You basically have to look at the strategy-type studies and look at the four different strategies. In conclusion, I think this is an exciting new field. It is a rigorous approach that uses randomization to test the different strategies. When you think about these as factorial studies, it might be helpful to think about what we've learned from decades of work in factorial studies, and if you are on the fence about whether to have different interventions for the second-line therapy or to use a factorial design, it might make sense to use the factorial design because it would probably be more efficient. Finally, I think DTRs will be viewed as complicated by some collaborators, and so you will have to talk about how they are efficient and how you can answer multiple questions all in the context of a single design, and find simple ways to convey these messages.

Estelle Russek-Cohen, PhD, FDA: I am providing my personal views and not necessarily those of the FDA. There are some points raised that are related to my FDA position and some because I was a faculty member in the College of Agriculture at the University of Maryland. First, some general comments. There are several issues in the implementation of DTRs that have regulatory implications. Informed consent can be a big issue in some of these trials if, for example, they look fairly complex or use multiple drugs. FDA makes a big distinction between exploratory and confirmatory studies and generally doesn't regard a Phase 2 study as confirmatory. The I-SPY-2 trial [7] on Neoadjuvant and Personalized Adaptive Novel Agents to Treat Breast Cancer is an adaptive Phase 2 design. The goal is to use biomarkers in order to select the right combinations of therapies for each study participant.
The reason I think it is important to mention this study is that the investigators used a two-step informed consent process, which I thought was unusual. The first informed consent dealt with participating in the biomarker selection process; then, once the subset of treatments participants were going to be exposed to was determined, participants received yet another informed consent form to sign. I had wondered what the informed consent would look like initially when a study can have up to six or eight therapeutic agents and you can't tell a patient which one they are going to get before they go on to the trial. What also made I-SPY-2 (Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis-2) interesting was that the National Institutes of Health (NIH) was effectively the study sponsor. The companies came in under a National Cancer Institute (NCI) umbrella. They all had their own individual Investigational New Drug (IND)


protocols for Phase 1 to show that they had passed the minimum bar for safety, but the Phase 2 trial was actually sponsored by NCI; having a single sponsor would also be a possibility if multiple drugs for similar indications are being studied in a SMART trial. The ability to interpret efficacy endpoints in DTRs is a concern at FDA. When the shift from one treatment step to another is close in time and you make a lot of modifications in a short timeframe, you need to think about the interpretation of the outcomes. While efficacy is of course important for FDA, we also look at safety. If you start using more than one product, we are going to worry about the safety of multiple products being put together, whether because of simultaneous or sequential administration. Occasionally, there could be carryover effects that could potentially impact which products can be used in what order. In many of the talks today, I didn't get a sense of the timing of the start and end of each study period and possible carryover effects that you might try to minimize. In the papers presented today, I also thought there wasn't enough attention to power, and I wonder how difficult it might be to publish these results if effects are not significant. Companies certainly worry about those things when they come in to see us. In the context of Dr Collins' attention-deficit hyperactivity disorder example, I would say diagnosis is very hard; so when investigating some BMODs, the intervention may appear to work because the child didn't have attention-deficit hyperactivity disorder to begin with. In these studies, I worry about short-term and long-term effectiveness and whether BMOD is a long-term strategy. I think one of the reasons many health providers like pills is because it is easy. It doesn't mean it is the right strategy. I do think BMOD should be the first thing that you try.
But I wonder if the timeframes for the two treatment arms (behavioral and medical) are exactly the same and whether that might affect some of the dynamic approaches you might think about. The other concern I have with regard to all these complex interventions in adaptive studies is protocol adherence. There was a huge dropout rate in at least one or two of the studies presented today, and I do wonder if all these multiple layers add a problematic complexity; if more and more people drop out with each step, outcomes will become harder and harder to interpret. I think you might want to restrict how many different steps you have in your trial in order to have something interpretable when you are finally done. I am not sure if patients will drop out more because you keep switching interventions, or if they will stay with the study because they think something is really going well. So I can't tell whether these transitions encourage or discourage compliance; maybe it depends on the particular intervention.

Dr Moodie examined prothrombin time tests as reported by medical devices used with warfarin treatment. You will never get an International Normalized Ratio of exactly 2.5 because all lab tests have measurement error. In real life, you are going to discover there is measurement error in both International Normalized Ratio values and subsequent dosing of warfarin. There have been many efforts associated with modeling warfarin and International Normalized Ratio, and I wonder how these various approaches would compare, perhaps first by simulation and then maybe in a real trial, so you find what could be the most viable alternative candidate. I also wondered if you could potentially train the model using a big dataset, so when a new patient came in you use the database as a prior set of information (because patients stay on warfarin for years) and then try and tailor the dosing strategy more to the individual as you go along. Then you can see if the algorithm derived from the big database works or not, and if it doesn't, then figure out how you can update the coefficients at a more individual level. The biggest concern I have about these kinds of approaches is whether the recommendations could cause a real medical problem for the patient; for example, if the dose is changed too radically when good practice requires a more gradual approach. I don't know whether that is an issue in warfarin or not. Dosing isn't truly continuous. If you take warfarin orally, a model may need to consider the real-life inexactness of dose constraints and measurement errors. What we often see at the FDA is that patients are entered into a trial for a second- or third-line therapy, and you may have no clue as to precisely what first-line therapy they had, and now you are going to recruit those who didn't respond to the first-line treatment. With Dr Kidwell's approach, you would have more predictability with regard to what they get at each stage.
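The points above about measurement error and gradual dose changes can be illustrated with a toy titration loop; the linear dose–response, sensitivity, noise level, and step cap below are all invented for the sketch and are not a real warfarin model:

```python
import random

random.seed(1)

TARGET_INR = 2.5
SENSITIVITY = 0.5   # assumed INR change per unit of weekly dose (illustrative only)
MAX_STEP = 1.0      # cap on the dose change per visit, mimicking gradual titration
CV = 0.10           # assumed ~10% coefficient of variation in the INR measurement

def true_inr(dose):
    # Toy linear steady-state dose-response for one hypothetical patient.
    return 1.0 + SENSITIVITY * dose

dose = 2.0
for visit in range(20):
    measured = true_inr(dose) * random.gauss(1.0, CV)  # noisy lab reading
    step = (TARGET_INR - measured) / SENSITIVITY       # naive inversion of the model
    step = max(-MAX_STEP, min(MAX_STEP, step))         # enforce gradual dose changes
    dose += step

# The patient hovers near, but never exactly at, the 2.5 target.
print(round(true_inr(dose), 2))
```

Because every reading carries noise, the controller keeps adjusting even at steady state; capping the step is one simple way to keep those noise-driven adjustments from becoming the kind of radical dose change the discussion warns about.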
I wonder whether, if we randomized patients to rescue therapy at the time of progression, progression-free survival would be any more interpretable (FDA doesn't require demonstration of benefit on overall survival for every type of cancer in every stage of disease). The ideal is when progression-free survival is a very good surrogate for overall survival; in such cases, FDA would be more likely to accept the earlier endpoint if it was measured precisely. In melanoma trials, for example, we have seen endpoints besides overall survival (e.g., disease-free survival), but this can vary with the indication. Early stage may be very different from late stage. There has to be evidence of a clear clinical benefit. I don't mean to say that overall survival isn't the most common endpoint in oncology trials, just that it is not the only possibility. But a concern in this type of study is the issue of compliance and big dropout rates in some


of the oncology trials, which may create problems with interpretability of results.

Richard J. Chappell, PhD, University of Wisconsin-Madison: I note that Dr Chuang-Stein gave a disclaimer that she does not represent the views of Pfizer because there had been no consensus developed. We have not had a meeting at the University of Wisconsin, and I can guarantee you that if we did, there would certainly not have been a faculty consensus. I too, then, represent myself. I looked at the four talks in terms of practicality. After reviewing the presentations earlier, I came to some general (but not comprehensive) conclusions. The first was separability of intervention components, to use Dr Collins' terminology. I particularize the definition to constitute any aspect of an intervention that can be separated out for study. For that purpose, the components must at least be potentially sequential in order to separate them for a DTR. Traditional chemotherapies are cytotoxic, and thus they usually have efficacy immediately apparent in some way. For many newer chemotherapies, along with some of those in radiation oncology (cytostatic therapies), efficacy is determined only in the long term, which means that follow-up is a big issue. What happens if a subject's treatment initially appears to be a failure? We are taught never to say that the subject failed, rather only his or her treatment. So suppose there is stable tumor burden, which is not a response but is later a success, with stable tumor burden over several years perhaps. You can summarize by saying that the treatment was a failure but the patient lived, inverting the traditional joke; that kind of thing is happening more and more lately. My point is that long-term follow-up is crucial here. If your treatment intervention components are sequential, that can lead to various kinds of confusion.
Dr Collins, for example, mentioned that best medical therapy is more expensive than behavioral therapy, and I would agree with that in the short term. What about efficacy over the long term? Suppose the best medical therapy need only be temporary to affect long-term results, and behavioral therapy needs to be continuously administered? Also, what if one has a greater toxicity rate than the other? That also requires resources to evaluate. So many important aspects of treatment effect may not be clear without long-term follow-up. I know I am telling you things that the drug companies (or anybody who is concerned about a budget, which I guess is all of us) won't especially like. Dr Moodie used a quick and precise endpoint for warfarin, proximity to the midrange of a bioassay. That seems like a great endpoint if you are sure of it – if you know its relevance – which I think for warfarin


(since we have decades of experience with it) we do know pretty well. We have lengthy experience with certain other treatments as well; over a century, for example, with radiation therapy. If we can determine how much radiation is absorbed by a certain organ, we are pretty sure of determining how likely it is to be damaged in terms of both short- and long-term toxicity. But for most treatments that I have been involved with, we aren't exactly sure of theoretical relationships with the short-term endpoint, much less a long-term one. Again, I am pleading for follow-up here. That is not just an issue regarding DTRs – it is a very old issue for adaptive trials, which have been around as long as group sequential methods have existed. I know of a study that was stopped early for obvious superiority of one treatment with respect to a survival time endpoint, and after it was stopped early, the investigators had the thrill of seeing the curves come together again and then cross significantly in the opposite direction. Then what do you do in a case of nonproportional hazards? I suppose that means that even such so-called hard endpoints like survival can be considered surrogate outcomes in the short term, because survival differences over the first year don't necessarily predict long-term differences: 'past performance does not necessarily guarantee future returns'. That applies, of course, to survival and to any other time-to-event endpoint. Follow-up is key. It may not be essential in all such trials, but I urge you to consider it. The need for adaptive and dynamic strategies at most trial levels is real. Dr Moodie makes a good point that trials at many or all levels should be adapted to choices of treatment and administration. Why do I say that dynamic adaptive strategies need to be conducted at most trial levels?
It is because even without patient heterogeneity, Phase 1 trials in cancer, for example, won't give you the right dose unless you are very lucky. To claim that we have the optimal dose after any typical Phase 1 trial is a matter of great faith. We may well not – we probably don't for most Phase 1 designs – so we need to keep on doing adaptive trials of various types, including DTRs, when they are appropriate in Phase 2 and Phase 3 trials. Even in late-phase trials, I think individualization can be considered. Dr Kidwell provides an excellent summary of the types of chemotherapy trials in which this approach would be useful [6] and argues why they are needed. I would like to compare the SMART trial in Figure 3 presented by Dr Collins to the randomized discontinuation trial design in Figure 6 (see also Rosner et al. [8] for a statistical assessment of the design). I view the randomized discontinuation design as an enhancement of a non-randomized Phase 2 design, rather than a degradation of a Phase 3 design. You will note that there is no initial




Figure 6. Generalized schema for a randomized discontinuation trial design [9] (with permission).

randomization. The randomization occurs only in the middle group, the patients who achieved tumor stability (defined radiologically). Subjects in that middle group are randomized to continue or discontinue therapy and are then observed for either progression or response. My point is not to say that this is a great alternative to anything presented today; rather, I would like to emphasize that there are three outcomes here rather than two: good, bad, and indeterminate. You can base your further regimens on more than two different outcomes. Second, though I definitely endorse randomization, not every decision has to be randomized. It depends. That might help with clinical acceptance, which Dr Kidwell mentioned as a problem. DTRs in general are a type of factorial trial that may or may not have negative interactions. I've had experience in nasopharyngeal carcinoma, a throat cancer common in Southeast Asia, where chemotherapy very clearly helps prevent progression when added to radiation therapy. Does that mean it works? Well, there didn't seem to be much difference in survival yet, although we don't have enough follow-up – following my own warning, I should be careful there. It could well be that there is a negative interaction, because patients who got chemotherapy and then recurred are in worse shape than patients who didn't get chemotherapy and recurred: if you give the same chemotherapy twice, it probably won't work the second time. Many patients with nasopharyngeal carcinoma can get salvage chemotherapy and be cured after a recurrence. You have to be careful in this respect. Dr Kidwell ended her talk with some reasons why clinicians did not like, or might not like, DTRs. One

of them was their 'black box' nature, and it is true: clinicians don't like black boxes dictating treatment to their patients, and I don't blame them. A way out of that is to conduct lots of simulations so you can show them: 'if this happens, then we will do one strategy, and if that happens, then we will do the other thing'. Give them lots of examples. Another thing she said is that drug companies may not like inter-drug comparisons. That is traditionally true for superiority trials. Your stockholders won't be delighted with you for proving your drug to be worse than somebody else's. But non-inferiority trials are all the rage, in both senses of that word. It is getting popular to have inter-treatment comparisons with non-inferiority objectives, and those might be used for DTRs. Linda Collins, PhD, Pennsylvania State University: I really liked Dr Chappell's idea that components, in order to be separable, must be potentially sequential. People often ask me how to figure out what counts as a component, or whether things can be separated. Dr Susan Murphy and I talk all the time about how SMART trials are a variation on factorial experiments. I always see them that way. She, of course, knows they are variations on factorial experiments, but tends not to think about that when she is planning a SMART trial. Dr Follmann, I really liked your putting SMARTs in the general context of factorial designs. If, as you pointed out, as you go out across the branches of treatments, the treatments aren't the same (and sometimes it does not make sense for them to be the same), then that initial main effect is potentially confounded with some of the later treatments. It is still a



factorial design, but an unbalanced factorial design, making it more complex to design and analyze. Kelley M. Kidwell, PhD, University of Michigan: A lot of people have brought up a good point about dropout and how, specifically, we take it into account with DTRs. I would like to point you all to Dr Wang's 2012 paper [10], which really does address this in the context of DTRs. We do have to think about where dropout will occur and how we can account for it in these regimes in order to have something that is interpretable. Of course, if we are doing these in an earlier stage of development, we can figure out where dropout is likely to occur and try to have some sort of intervention to limit it, which could be very helpful not just for that particular trial, but as we go forward in treating the population with the disease. John Whyte, Moss Rehabilitation Research Institute: Comments have been made throughout the day that one of the advantages of SMART trials is that they approximate what we think medical practice should be like, that is, individualized decision making with patients. I have generally assumed that in traditional trials a lot of dropout is due to the rigidity of the trial and the fact that what participants are looking for, at a certain point, is something they can no longer get from the trial. When we see very high dropout rates from SMART trials, I am not quite sure how to think about that. Is it that there is still some aspect of research burden despite the tailoring and individualization, or are we looking at what happens in real practice? That is, after you try two things and you still are not doing well, you just say forget it, or you go to another doctor, or you drop out of clinical practice as well.
Has there been any research looking at who the dropouts are – whether they are going back into even more tailored care, that is, general clinical care, or whether attributes of the study design are pushing them out? Kelley M. Kidwell, PhD, University of Michigan School of Public Health: I am not sure whether there has been specific research on dropout, but I think what we need to do is endorse more of this pilot study work on SMART designs, to sort out and understand why people are dropping out and to try to minimize it for a larger study so that we can make better inferences. Perhaps we need to start smaller as we go into this world, because we do not have quite enough experience yet, and then move into larger studies with approaches to minimize dropout.



Christy Chuang-Stein, PhD, Pfizer: Pharmaceutical companies are very interested in finding out how patients adhere to their medications in the real-world setting. We have conducted extensive studies of how long patients stay on their prescriptions. For some chronic conditions, it is not unusual that more than 50% of patients are no longer taking their medications at the 1-year mark, even though we don't know the specific reasons why they discontinued. In addition, we were surprised to find out how many patients did not come back for refills after the pharmacy filled their prescription initially. This means that, as a society, we have a large number of patients who are not compliant. Just ask yourself – what motivates you to go back to the pharmacy to get your prescriptions filled regularly? A large portion of the patient population simply does not do that, despite the ease of getting refills for most chronic medications nowadays. Bibhas Chakraborty, PhD, Columbia University: I have a comment about both dropout and adherence. As far as I can tell, there has not yet been any systematic study of either of these two very important issues in the context of SMARTs. Nonetheless, I would argue that a SMART study protocol is probably better than a standard randomized clinical trial in terms of encouraging adherence and discouraging dropout. Think of a standard randomized trial: if someone is randomized to Treatment A, and Treatment A does not work (or at least does not seem to work) for that individual, then she or he has no choice but to either stop taking the treatment (non-adherence) or drop out altogether. On the other hand, someone in a SMART has an extra incentive to continue following the protocol, because she or he knows that, as a non-responder, there will be a second chance of being randomized to something different, something that might work.
This very knowledge should give people some motivation to adhere to the treatment rather than drop out, and to wait a little longer for the next randomization. This is simply a conjecture and needs verification, at least via some sort of simulation study. At any rate, I think it would be very important to study these things systematically in real studies. Erica E. M. Moodie, PhD, McGill University: The enrichment guidance that Dr Chuang-Stein mentioned [5] suggests that one way to do an enrichment trial is to use a placebo run-in to see who actually follows the protocol, and to keep those patients in the trial. I think that's a really good idea that might reduce the dropout rate. I don't




know how you would do that in a blinded oncology study, but in some other medication areas it might be very viable. Richard J. Chappell, PhD, University of Wisconsin-Madison: I agree with Dr Moodie. I worked on a trial of vitamin K in which there had to be a 2-month run-in with vitamin D and calcium for bone strength [11]. That served the double purpose of making sure participants had adequate levels of those nutrients and making sure they came back, and it worked very well. There was quite low dropout after randomization and a fair amount of dropout during the run-in, which was a better place for it. Susan Ellenberg, PhD, University of Pennsylvania: I would like to thank everyone here for your attention and your questions, with special thanks to our panelists and speakers. The need for this kind of design was something that was apparent to me back in the late 1980s when I started working on AIDS therapies and people were changing treatments all the time. We talked about it then, but did not have either the energy or the time (or maybe the 'SMARTs') to do anything about it. It is really a pleasure to see how this area has developed.

Participants
Benjamin French, PhD, University of Pennsylvania (Moderator)
Christy Chuang-Stein, PhD, Pfizer
Dean Follmann, PhD, NIAID
Estelle Russek-Cohen, PhD, FDA
Richard J. Chappell, PhD, University of Wisconsin-Madison
Linda Collins, PhD, Pennsylvania State University
Kevin Lynch, PhD, University of Pennsylvania
Kelley M. Kidwell, PhD, University of Michigan School of Public Health
John Whyte, Moss Rehabilitation Research Institute
Bibhas Chakraborty, PhD, Columbia University
Susan S. Ellenberg, PhD, University of Pennsylvania

Funding
Funding for this conference was made possible (in part) by 2 R13 CA132565-06 from the National Cancer Institute. The views expressed in written conference materials or publications and by speakers and moderators do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention by trade names, commercial practices, or organizations imply endorsement by the U.S. Government.

Conflicts of interest
The views expressed in written conference materials or publications and by speakers and moderators do not necessarily reflect the official policies of the Department of Health and Human Services, nor does mention by trade names, commercial practices, or organizations imply endorsement by the US Government.

References
1. Almirall D, Lizotte D, Murphy SA. SMART design issues and the consideration of opposing outcomes. Discussion of 'Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer' by Wang, Rotnitzky, Lin, Millikan, and Thall. J Am Stat Assoc 2012; 107: 509–12 (PMCID: PMC3607391).
2. Chakraborty B, Laber EB, Zhao Y. Inference for optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap scheme. Biometrics 2013; 69(3): 714–23 (PMCID: PMC3864701).
3. Gallo P, Chuang-Stein C, Dragalin V, et al. Adaptive designs in clinical drug development – An executive summary of the PhRMA working group. J Biopharm Stat 2006; 16: 275–83 (PMID: 16724485).
4. http://www.nimh.nih.gov/funding/clinical-trials-for-researchers/practical/catie/index.shtml (accessed 29 May 2014).
5. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM332181.pdf (accessed 20 April 2014).
6. Kidwell K. SMART designs in cancer research: Past, present and future. Clin Trials 2014; PMID: 24733671 (Epub ahead of print).
7. http://clinicaltrials.gov/show/NCT01042379 (accessed 20 April 2014).
8. Rosner GL, Stadler W, Ratain MJ. Randomized discontinuation design: Application to cytostatic antineoplastic agents. J Clin Oncol 2002; 20: 4478–84 (PMID: 12431972).
9. Stadler WM. The randomized discontinuation trial: A phase II design to assess growth-inhibitory agents. Mol Cancer Ther 2007; 6: 1180–85 (PMID: 17431101).
10. Wang L, Rotnitzky A, Lin X, Millikan RE, Thall PF. Evaluation of viable dynamic treatment regimes in a sequentially randomized trial of advanced prostate cancer. J Am Stat Assoc 2012; 107(498): 493–508 (PMCID: PMC3433243).
11. Binkley N, Harke J, Krueger D, et al. Vitamin K treatment reduces undercarboxylated osteocalcin but does not alter bone turnover, density, or geometry in healthy postmenopausal North American women. J Bone Miner Res 2009; 24(6): 983–91 (PMCID: PMC2684650).


