CROSSOVER DESIGN IN PHARMACY RESEARCH Rebecca L. Cody and Marion K. Slack

OBJECTIVE: Reports of pharmacy research using crossover designs were reviewed to determine if the studies adequately consider interaction effects and use appropriate statistical analyses.

DATA SOURCES: All crossover studies published in DICP, The Annals of Pharmacotherapy during 1988 and 1989 were analyzed.

STUDY SELECTION: Reports of crossover studies were included only if at least two treatments were applied in a different order to two or more groups of subjects.

DATA EXTRACTION: The principal characteristics of crossover studies and the critical design variables were listed and each study analyzed according to these variables. The critical design variables included consideration of period, sequence, and carryover effects as well as the presentation of data by groups and the use of multivariate statistical analysis. The analysis was conducted independently by each author and conflicts were discussed until consensus was obtained.

RESULTS: A total of 11 crossover studies were identified: 6 were bioavailability trials, 3 were treatment comparisons, and 2 had multiple objectives. The possibility of period, sequence, or carryover effects was less with bioavailability studies than with treatment comparisons. Only 1 study presented data by group and only 4 studies used multivariate analysis.

CONCLUSIONS: The crossover design appears more appropriate for bioavailability trials than for treatment trials in pharmacy research. Analysis of data from crossover designs could be improved by presenting the data for each treatment group and using multivariate statistical analysis.

Ann Pharmacother 1992;26:327-33.

REBECCA L. CODY, Pharm.D., is a Pharmacist, Indian Health Service, Fort Defiance Service Unit, Fort Defiance, AZ; and MARION K. SLACK, Ph.D., is an Adjunct Lecturer, Department of Pharmacy Practice, College of Pharmacy, University of Arizona, Tucson, AZ 85721. Reprints: Marion K. Slack, Ph.D.

CROSSOVER DESIGNS ARE POWERFUL but potentially disastrous research designs. The crossover study is a research design that uses the same subject for two or more treatment trials, allowing the researchers to compare the subject's response between trials. Because this design removes between-patient variation from the calculation of the effects, crossover designs are very powerful; some investigators have found statistically significant differences in crossover studies with as few as four subjects! However, if the design and statistical requirements of crossover studies are not met, the study results are potentially worthless.

The design requirements arise from the nature of the crossover design. As shown in Figure 1, one group of patients (group 1) receives the A treatment and then crosses over to the B treatment. The second group of patients (group 2) receives the B treatment and then crosses over to the A treatment. In this example, the crossover design uses two treatments and two time periods. Crossover designs may involve more than two treatments and more than two time periods; however, the more complex designs are not discussed here.

The treatments in a crossover design are evaluated by combining the effects of the A treatments (A1 + A2), combining the effects of the B treatments (B1 + B2), and then comparing A with B. The effect of the treatment usually is measured at the end of each treatment period; however, some studies use multiple measurements. For the comparison to be valid, the effect of each drug cannot be influenced by either the time period in which the treatment was administered or the order in which the treatment was given. That is, patients are assumed to be in the same state at the beginning of the second treatment period as they were at the beginning of the first treatment period.

Differences in the effectiveness of the drugs between the treatment periods resulting from the passage of time are known as period effects. Period effects occur because patients are observed at least twice and their condition may change between the first and second observations. For example, depressed patients may be less depressed during the second treatment period simply because depression tends to improve over time. Other examples of period effects include learning effects, the development of tolerance or resistance, and changes in the disease state. In general, psychological variables, including variables such as pain, may be vulnerable to period effects.
The patient may learn to expect a certain amount of pain relief during the first treatment period and then use those expectations to evaluate pain relief in the second treatment period so that the evaluations change from one time period to the next. No period effects are present when the sums of the treatments across time periods are equal; that is, in Figure 2 (A1 + B2) equals (A2 + B1). Period effects appear in graphs of time versus treatment means as an increase or a decrease in the treatment means from the first period to the second period. In the example shown in Figure 3, the treatment appears to be less effective secondary to the development of tolerance. Period effects that influence both treatment conditions equally do not affect the interpretation of a difference between treatments; that is, the difference between treatments remains

The Annals of Pharmacotherapy



1992 March, Volume 26



327

the same. However, period effects will increase within-person variability, which reduces the power of the design and decreases the advantage of a crossover design.

Changes in the effectiveness of the drug treatment produced by the order in which the drugs were administered are known as sequence effects. In the absence of sequence effects, treatments A1 + B1 are equivalent to treatments A2 + B2. If A is more effective relative to treatment B when it is given during the first treatment period than during the second treatment period, or vice versa, then a sequence effect is present. For example, a placebo treatment for weight reduction may be more effective when patients first start treatment and weight levels are high than later when weight levels are lower. The graph in Figure 4 shows a sequence effect: one drug (B) is more effective in period one than period two so that the difference between the treatments is less in period one than in period two. Sequence effects appear statistically as interactions. Interactions affect the interpretation of the results because the magnitude of the treatment differences is not consistent.

A third possible effect in crossover designs is a carryover effect. We consider carryover effects to occur when the drug given during the first period persists into the second period. Some authors consider any residual effect of the drug given in the first period that continues into the second period to be carryover effects. Because residual effects may include learning effects or physiologic changes that we considered as period effects, we have restricted the

Figure 1. Basic layout of a crossover design. Patients should be randomly assigned to groups. [Layout: Sequence AB (Group 1) receives A1 in period 1 and B1 in period 2; Sequence BA (Group 2) receives B2 in period 1 and A2 in period 2.]

Figure 2. Graph of no period effects. The graph shows a difference between treatment A and treatment B but no period effects. [Axes: dependent variable versus periods 1 and 2.]

Figure 3. Graph of equal period effects. The graph shows a difference between the treatments but the slope indicates a period effect such as would be expected if patients developed tolerance. [Axes: dependent variable versus periods 1 and 2.]

Figure 4. Graph of a sequence effect. Treatment B is more effective in period 1. The difference in drug effect is greater in period 2; hence, an interaction is present. [Axes: dependent variable versus periods 1 and 2.]


definition of carryover effects to persistence of the drug. Unlike sequence effects, which can appear in either the first or second treatment periods, carryover effects affect the treatment response only in the second time period. Carryover effects can appear in the statistical analysis as either a period or sequence effect. If the carryover is equivalent for drug A and B, so that the effect in the second period is increased or decreased equally over the effect in the first period, the carryover will appear as a period effect. If the carryover is different for each drug in the second period, the carryover will appear as a sequence effect, with sequence A-B having a different total effect than sequence B-A.

Because carryover effects are associated with the continued presence of the drug administered in the first period into the second period, carryover effects can be eliminated by using a washout period. Washout periods are time intervals without treatment that allow the patient to return to baseline levels before the second treatment is started. The ability to remove the influence of carryover effects through the use of a washout period differentiates carryover and period effects. Period effects represent long-term or permanent changes in the subject that are unlikely to be eliminated with a washout period; carryover effects represent temporary changes secondary to continued presence of the drug. Carryover effects are best exemplified by comparing drugs with different half-lives when the effects of the drug


Research/Practice

with a long half-life persist into the second treatment period but the effects of the drug with a short half-life do not. Hence, the drug with the short half-life may appear more effective during the second treatment period than the first. A graph of the means would show an interaction effect.

As the above discussion indicates, statistical interactions appear if a sequence effect is present or if unequal carryover effects are present. In addition, statistical interactions are produced by period effects if the period effect is different for each treatment condition. For example, tolerance that develops faster for one drug than for the other will produce unequal period effects that appear as an interaction in the statistical analysis. Therefore, a statistical interaction indicates only that the treatment effects are not consistent across time periods; the interaction does not indicate the cause of the inconsistency.

We have described period effects as being associated with changes in the patient that are usually long-term or permanent, carryover effects as the persistence of the drug from one treatment period to the next, and sequence effects as the differences in response secondary to the order of treatment administration. Other discussions in the literature may differentiate the effects on the basis of their appearance in the statistical analysis or refer to all effects as carryover effects. We found a strictly statistical presentation to be the least helpful when designing and evaluating clinical studies because it does not indicate how to modify the research design to avoid interaction effects.

The statistical requirements of a crossover study are more complex than for a parallel-groups design. In general, the statistical analysis should begin by graphing the treatment means versus the time period. Simple inspection of the graph will indicate if period or sequence effects are likely.
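The comparisons described earlier for the two-treatment, two-period design can be sketched numerically before any formal testing. The Python sketch below is illustrative only: the function name `crossover_contrasts`, the dictionary keys, and the example means are assumptions made for this illustration, not values from any study reviewed here. It assumes the labeling in which A1 and B1 are the period 1 and period 2 means of the group receiving sequence A-B, and B2 and A2 are the period 1 and period 2 means of the group receiving sequence B-A.

```python
def crossover_contrasts(a1, b1, b2, a2):
    """Descriptive contrasts for a 2 x 2 crossover from four cell means.

    Assumed labeling (hypothetical, for illustration):
      group 1 (sequence A-B): a1 in period 1, b1 in period 2
      group 2 (sequence B-A): b2 in period 1, a2 in period 2
    """
    return {
        # Treatment comparison: (A1 + A2) versus (B1 + B2)
        "treatment": (a1 + a2) - (b1 + b2),
        # Period check: period 1 total (A1 + B2) versus period 2 total (A2 + B1)
        "period": (a1 + b2) - (a2 + b1),
        # Sequence check: group 1 total (A1 + B1) versus group 2 total (A2 + B2)
        "sequence": (a1 + b1) - (a2 + b2),
    }


# Made-up means: A exceeds B by 4 in both periods; no period or sequence effect.
flat = crossover_contrasts(a1=10.0, b1=6.0, b2=6.0, a2=10.0)

# Made-up means: both treatments drop by 2 in period 2 (an equal period effect).
drift = crossover_contrasts(a1=10.0, b1=4.0, b2=6.0, a2=8.0)

print(flat)
print(drift)
```

A nonzero "sequence" contrast here is the same quantity as the difference between the period 1 and period 2 treatment differences, which is why it shows up as an interaction. These are descriptive contrasts of means only; they say nothing about statistical significance.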
The statistical analysis then should proceed with testing for period and sequence effects as well as testing for a treatment effect. Although the three different effects can be assessed using multiple t-tests, a more efficient and informative method is to use a multivariate analysis such as repeated measures ANOVA. With multivariate analysis, the three effects are assessed simultaneously. If the interaction effects are close to zero, the analysis and interpretation of the treatment effect can proceed disregarding the interaction. However, if the interaction is not close to zero then the interaction is evaluated for clinical importance or statistical significance. Some authors have recommended that the interaction be tested statistically with an independent groups t-test using a p-value of 0.10 (4); however, the crossover design may lack the power needed to detect a clinically important interaction. Therefore, the researcher must assess the clinical meaning of the interaction and provide assurance to the reader that the interaction is not clinically important. If the interaction is clinically important then only the data from the first treatment period are used, transforming the crossover study into a parallel-groups design.

Because the design and statistical requirements are more demanding for crossover studies, we reviewed the crossover studies published in a pharmacy journal to determine if the studies reported adequately considered period, sequence, and carryover effects, and used an appropriate statistical analysis. Our goal is to increase reader and researcher awareness of the problems unique to crossover designs and to facilitate improvement of future crossover studies.
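As a rough illustration of the t-test route mentioned above (not the repeated measures ANOVA the text prefers), the three effects in a two-treatment, two-period crossover can be examined with independent-groups t statistics computed on each subject's period total and period difference. This is a minimal sketch under stated assumptions: the function names `pooled_t` and `analyze_2x2` and all data values are hypothetical, a pooled-variance t statistic is assumed, and p-values are omitted because they would require a t distribution table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Independent-groups t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


def analyze_2x2(seq_ab, seq_ba):
    """seq_ab, seq_ba: lists of (period 1, period 2) responses per subject."""
    sums_ab = [p1 + p2 for p1, p2 in seq_ab]
    sums_ba = [p1 + p2 for p1, p2 in seq_ba]
    diffs_ab = [p1 - p2 for p1, p2 in seq_ab]  # A minus B within subject
    diffs_ba = [p1 - p2 for p1, p2 in seq_ba]  # B minus A within subject
    return {
        # Interaction (sequence or unequal carryover): compare subject totals.
        "interaction": pooled_t(sums_ab, sums_ba),
        # Treatment: period differences point in opposite directions by group.
        "treatment": pooled_t(diffs_ab, diffs_ba),
        # Period: flip the sign in one group so the treatment effect cancels.
        "period": pooled_t(diffs_ab, [-d for d in diffs_ba]),
    }


# Hypothetical responses for three subjects per sequence group.
seq_ab = [(10.0, 6.0), (12.0, 7.0), (11.0, 8.0)]
seq_ba = [(7.0, 11.0), (5.0, 10.0), (6.0, 9.0)]
results = analyze_2x2(seq_ab, seq_ba)
for effect, (t, df) in results.items():
    print(f"{effect}: t = {t:.2f}, df = {df}")
```

The interaction comparison is a between-subject test, which is one way to see why it has less power than the within-subject treatment comparison.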

Methods

To assess crossover designs as currently used in pharmacy research, we identified all such studies appearing in DICP, The Annals of Pharmacotherapy for the years 1988 and 1989. We defined a crossover study as one that applied at least two treatments in a different order to two or more groups of subjects. We excluded from our review any study in which a single group of subjects received a set sequence of treatments. The literature on the design and analysis of crossover designs was reviewed in order to identify those concepts we considered to be critical in the utilization of crossover studies. From the literature review, we identified and defined eight study design and analysis variables (Appendix I). The criteria were established to assess design problems such as period, sequence, and carryover effects and to assess the adequacy of the statistical analysis. Because the analysis of crossover designs is rarely discussed in standard statistical texts, we examined the reference list to determine what resources were being used. Each study was assigned a reference number and analyzed in light of these variables. Because conducting research using crossover designs is difficult, we realize that our criteria represent an ideal and that most "real world" crossover studies will not meet all criteria.

Results

A total of 11 crossover studies were identified in DICP from 1988 and 1989: six studies were bioavailability trials, three were treatment comparisons, and two had multiple objectives. Table 1 indicates that most of the studies randomized their subjects and included a washout period. All but one study used a crossover rule based on time; that is, the crossover point was dependent on the passage of time rather than on the patient's response to therapy. However, less than half the studies addressed the question of blinding.

In assessing the studies against the study design analysis variables, we found that, in general, the bioavailability studies met our criteria better than the treatment comparisons (Table 2). Most of the bioavailability studies were concerned with single doses of drug so that carryover effects were unlikely. However, two studies examined bioavailability in patients already receiving the drugs of interest, thus increasing the likelihood of carryover effects. Period effects were not likely for any of the bioavailability studies but were present in one of the treatment comparisons (11) and were possible in a second treatment comparison.

Overall the studies failed to meet our statistical criteria. Only one study provided data for the readers by groups and only four studies utilized multivariate analysis. In five studies the statistical analysis reported did not agree with the analysis described in the methods sections. Two studies did not discuss the statistical analysis in the methods sections. In addition, only two studies listed references to the statistical analysis of crossover designs.

Discussion

In the course of conducting the review, two cardinal points began to take form and are worthy of discussion. First and foremost is whether or not a crossover design is appropriate for one's investigation. The crossover design seems particularly well-suited to bioavailability studies because interactions are unlikely in a classic bioavailability study and because patients serve as their own controls, thus decreasing intersubject variability. This source of variance can be quite large, even among subjects matched on such background variables as age, sex, height, and weight


because of differences in metabolism or volumes of distribution. Interactions are avoided in a classic bioavailability study because these studies include: (1) collection of baseline data; (2) a washout period between trials of sufficient duration to allow serum concentrations to return to baseline values (i.e., at least four half-lives); and (3) a crossover point based on the elapse of time. All eight of the bioavailability studies we reviewed collected baseline data and seven based their crossover points on a predetermined time. Five of the bioavailability studies used washout periods in which no drug was given. However, in three of the studies (8,10,16) the drugs being studied were chronically used by the patients enrolled (levothyroxine, theophylline, digoxin) and the studies therefore could not ethically include a washout period. This problem was partially rectified in one study by collecting serum samples only at the end of each treatment period. Presumably, the drug administered first was eliminated and the drug administered second attained its steady-state concentration by the end of the two-week period. However, the validity of this assumption has not been investigated. The assumption of no carryover effects should be tested, even in crossover designs limited to single doses, if the drug involved is known to be an auto-inducer.

Avoiding interactions in a treatment trial is much more difficult. As in bioavailability trials, collecting baseline data, using adequate washout periods and basing the crossover point on elapsed time will minimize possible interactions. However, even with an adequate washout period, the patient may not be at baseline at the beginning of the second treatment period. Indeed, in some cases it may not be possible to reestablish this baseline state, thus making a crossover design an inappropriate choice. An example from the clinical trials reviewed in this study may help to illustrate this problem. In the potassium repletion study, patients were randomized to receive either potassium alone followed immediately by potassium with lidocaine or the reverse order of treatments. The main dependent variable was pain perception. Although both patients and care-

Table 1. Characteristics of 11 Crossover Studies in DICP,a 1988 and 1989

REF. | N | TYPE OF STUDY | DRUG | INDEPENDENT VARIABLE | DEPENDENT VARIABLE | CROSSOVER RULEb | METHOD OF ALLOCATION | TYPE OF BLINDING | WASHOUT PERIOD
6 | 6 | bioavailability | phenytoin | ± enteral feeding | serum concentrations | time | random | not possible | yes
7 | 10 | multiple | procainamide SR | 2 products | serum concentrations, PVCs | time | random | not described | no
8 | 19 | treatment | levothyroxine | 2 products | thyroid function | time | random | double | no
9 | 18 | treatment | KCl infusion | ± lidocaine | pain, ADRs | time | random | double | no
10 | 6 | multiple | theophylline | sprinkle vs. tablet | serum, saliva concentrations | not described | random | not described | no
11 | 17 | treatment | amphotericin B | infusion rate | fever, chills, dose of meperidine | time | random | single | no
12 | 16 | bioavailability | nitrofurantoin | 2 products to standard | serum, urine concentrations | time | not described | single | yes
13 | 8 | bioavailability | praziquantel | 3 products to standard | serum concentrations | time | random | not described | yes
14 | 36 | bioavailability | leucovorin | 2 products | serum concentrations | time | random | not described | yes
15 | 7 | bioavailability | nifedipine SR | ± food | serum concentrations | time | not described | not possible | yes
16 | 17 | bioavailability | digoxin | capsule vs. tablet | serum, urine concentrations | time | random | not possible | no

a Now The Annals of Pharmacotherapy. b Refers to how the point of crossover was determined. ADRs = adverse drug reactions; PVCs = premature ventricular contractions; SR = sustained release.

Table 2. Assessment of Study Design and Analysis Variables

REF. | DRUG | PROVIDED DATA BY GROUP | COLLECTED BASELINE DATA | PERIOD EFFECT LIKELY | CARRYOVER EFFECT LIKELY | USED MULTIVARIATE ANALYSIS | ANALYSIS REPORTED AGREES WITH METHODS | CROSSOVER REFERENCE LISTED | DROPOUTS DESCRIBED
6 | phenytoin | no | yes | no | no | no | yes | no | NA
7 | procainamide SR | yes | yes | yes | possible | no | no | yes | NA
8 | levothyroxine | no | yes | no | possible | yes | not stated | yes | no
9 | KCl infusion | no | yes | possible | possible | yes | no | no | NA
10 | theophylline | no | yes | no | possible | no | yes | no | NA
11 | amphotericin B | no | no | yes | possible | yes | yes | no | yes
12 | nitrofurantoin | no | yes | no | no | no | no | no | NA
13 | praziquantel | no | yes | no | no | yes | yes | no | NA
14 | leucovorin | no | yes | no | no | no | no | no | NA
15 | nifedipine SR | no | yes | no | no | no | not stated | no | NA
16 | digoxin | no | yes | no | no | no | no | no | yes

NA = not applicable; SR = sustained release.


givers were blind, learning or expectancy may have affected the results. Patients experiencing pain when receiving potassium alone during the first treatment period may have expected to experience pain during the second treatment period; patients experiencing little or no pain during period 1 may not have expected to feel much pain during period 2. Additionally, both treatment groups rated their perception of pain in period 1 without any explicit standard of reference; however, in period 2 patients could make a relative judgment by comparison to period 1. In the analysis, the authors failed to test their data for either period or sequence effects although their presence was a possibility.

Interaction effects between sequences of medication can also occur in parallel designs involving comparisons of pain medications. Indeed, a parallel-groups study may resemble a crossover study because most patients in pain studies have received previous medication that may influence their perception of pain before they are treated with (crossed over to) the study medications. Wallenstein advocates using a crossover design so that the interactions between different pain medications can be assessed (17). However, the use of a crossover design should be considered carefully when interaction effects are anticipated and the intent is to analyze them statistically. The statistical tests for interaction effects are for independent (parallel) groups so that the sample of patients needed must be large enough for parallel-groups comparisons. Therefore the economic advantage obtained from using the crossover design is no longer present. Also, the interaction effects are confounded in 2 x 2 crossover designs. If an interaction is detected statistically, one can only describe it as an interaction effect. The analysis does not differentiate between interactions produced by unequal period effects, sequence effects, or carryover effects.
In addition, to assess all interaction effects, the prestudy phase should be considered as the first treatment period followed by two treatment periods involving the study drugs. Then the study becomes a three-period crossover study. Because the treatment during the prestudy phase is not likely to be standardized, an analysis of covariance using the prestudy data or baseline data as covariates seems more practicable. With analysis of covariance, the treatment groups can remain independent and the effects of prestudy medications assessed.

A viable alternative to the crossover design is the pretest/posttest design in which subjects are randomly assigned to independent groups; pretest (baseline) measures are obtained at the beginning of treatment and posttest (posttreatment) measures at the end of treatment. The groups are compared by using the pretest scores as a covariate in an ANOVA or by comparing difference scores. A difference score is obtained by subtracting the pretest score from the posttest score. By using the pretest scores as a covariate or by using a difference score, between-subject variation is removed so that the power of the pretest/posttest design is nearly equivalent to that of a crossover design. Because the treatment groups are independent and do not receive a sequence of treatments, interaction effects between treatments and treatment periods are not possible and interpretation of the results is not confounded by possible period, sequence, or carryover effects. Of the crossover studies we reviewed, the study comparing levothyroxine products seems especially suited to a

pretest/posttest design using difference scores. In the original study, the effects of two different levothyroxine products (A and B) on thyroid function were evaluated by having patients take one product for four weeks and then cross over to the second product for four weeks. Thyroid function was compared at the end of two and four weeks. To use the pretest/posttest design, patients would be randomly assigned to one of two groups: one group receives product A and one group receives product B. Baseline measurements of thyroid function are obtained at the beginning of treatment and at the end of treatment for both groups. The baseline measurements are subtracted from posttreatment measurements and the mean differences compared. If patients are entered into the study when thyroid treatment is initiated, confounding of the effects of the two products is not possible.

Confounding of the effects of the two levothyroxine products seems likely in the crossover design used in the study by Curry et al. Most patients were taking product A on a chronic basis at the beginning of the study. Half of the patients then were crossed over to product B for four weeks, and returned to product A for the next four weeks; the other half of the patients continued on product A, and then were crossed over to product B (8). The authors appear to have assumed that one product was eliminated and the other product attained steady state at the time of the thyroid function measurements two and four weeks later. However, the authors offered no evidence to support their assumption and their results showed that thyroid stimulating hormone concentrations were not at baseline after four weeks of treatment.
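The difference-score comparison described for the pretest/posttest design amounts to simple arithmetic, sketched below in Python. The function name `difference_scores`, the group labels, and every measurement are hypothetical values chosen for illustration; they are not data from the levothyroxine study.

```python
from statistics import mean


def difference_scores(pretest, posttest):
    """Posttest minus pretest for each subject, in matching order."""
    return [post - pre for pre, post in zip(pretest, posttest)]


# Hypothetical baseline (pretest) and end-of-treatment (posttest) measurements
# for two independent, randomly assigned groups.
product_a_pre = [8.0, 6.0, 7.0]
product_a_post = [4.0, 3.0, 5.0]
product_b_pre = [7.0, 9.0, 8.0]
product_b_post = [6.0, 7.0, 8.0]

change_a = difference_scores(product_a_pre, product_a_post)
change_b = difference_scores(product_b_pre, product_b_post)

# The quantity compared between groups is the mean change from baseline.
mean_change_a = mean(change_a)
mean_change_b = mean(change_b)
print(mean_change_a, mean_change_b, mean_change_a - mean_change_b)
```

Because each subject contributes only one treatment, the two lists of difference scores are independent and can be compared with an ordinary independent-groups test.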
Thus, interpretation of the study results depends on the importance the reader attaches to the violation of the assumption that subjects should be at baseline at the beginning of the second treatment period and the assumption that there were no period or carryover effects from chronic treatment with product A. As described above, the same study could have been conducted using a pretest/posttest design with no possibility of confounding the effects of the two drug products.

Whether or not a crossover design is appropriate also depends on how the crossover point is determined. The point of crossover should be based on elapsed time, objectively determined beforehand, and should not be contingent upon other factors or outcomes. If the study protocol precludes this arrangement then, again, a crossover design is not an appropriate choice. An example of such a protocol might be a study of antihypertensives in which effectiveness is measured as a reduction in blood pressure and the patient is crossed over to another antihypertensive when a specified reduction in blood pressure has occurred. Because a patient's blood pressure will no longer be at baseline at the beginning of the second treatment period, any further change in blood pressure in period 2 will be confounded with any previous changes. Both this example and the pain example illustrate the possibility of interactions in treatment comparisons if baseline conditions are not reestablished at the beginning of the second treatment period.

While reviewing these studies in terms of conducting research and reporting results, several weaknesses became apparent. The pharmacist as a scientist should not only employ sound research methods but should report the research in such a way that the reader has a clear understanding of what was done. Sound clinical research includes randomization and blinding of subjects as well as describing the procedures used in the research report. Nine of the 11 studies reviewed mentioned the use of randomization, but only four studies explicitly discussed blinding. All dropouts should be described as well as the method of handling attrition; only one study failed to do this. The methods section should agree with the results and analysis sections; that is, if researchers describe ANOVA in the methods section then they should report F ratios in the results section. In nearly half of our studies the results reported did not agree with the methods described. Finally, in all studies the researcher should test a single hypothesis or, if multiple hypotheses are being tested, the results and discussion of each hypothesis should be clearly delineated for the readers. Studies in which multiple experimental objectives were examined in our review were considerably more difficult for the reader to interpret.

Crossover treatment trials have additional data reporting requirements. The treatment data should be reported by sequence group and time period so that the readers can assess the likelihood of interactions. Only one study in our review reported the data by groups. Although the authors did not assess the data for interaction effects in this study, a graph of the means for each sequence group indicates that an interaction may be present. In period 1, the treatment mean is higher for the A treatment; in period 2, the treatment mean is higher for the B treatment. Thus, if the researcher fails to report sequence or period effects, by using these data the readers can graph the results and discern whether interactions are possible. Even if the researcher reports the results of statistical tests for interaction effects, the readers may find the data reported by sequence group and time period of interest.
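Reporting by sequence group and time period can be as simple as tabulating one mean per cell. The Python sketch below shows one way to do that; the record layout, the function name `cell_means`, and all response values are assumptions made for illustration, not data from the studies reviewed.

```python
from collections import defaultdict
from statistics import mean


def cell_means(records):
    """Mean response for each (sequence, period, treatment) cell.

    records: iterable of (sequence, period, treatment, response) tuples.
    """
    cells = defaultdict(list)
    for sequence, period, treatment, response in records:
        cells[(sequence, period, treatment)].append(response)
    return {key: mean(values) for key, values in sorted(cells.items())}


# Hypothetical data: two subjects per sequence group, two periods each.
records = [
    ("AB", 1, "A", 10.0), ("AB", 2, "B", 6.0),
    ("AB", 1, "A", 12.0), ("AB", 2, "B", 8.0),
    ("BA", 1, "B", 7.0), ("BA", 2, "A", 9.0),
    ("BA", 1, "B", 5.0), ("BA", 2, "A", 11.0),
]

table = cell_means(records)
for (sequence, period, treatment), value in table.items():
    print(f"sequence {sequence}, period {period}, treatment {treatment}: {value}")
```

With the four cell means in hand, a reader can plot treatment means against period and judge whether the treatment difference is similar in both periods.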
For instance, in one study, the authors reported the presence of period effects but did not report treatment data across periods. Therefore, the readers cannot assess the extent of the period effect. Further, patients appeared to be developing tolerance to the adverse effects of amphotericin B infusions; readers may be interested in the rate that the tolerance was developing. The study also illustrates the necessity for a thorough understanding of period, sequence, and carryover effects by researchers using the crossover design. In this study, the statistical analysis revealed a significant period effect, so the researchers used only the data from the first treatment period. Significant period effects that are equal for each sequence group do not interfere with treatment comparisons. Thus, the data from later time periods may have been appropriately included in the analysis.

Likewise, the crossover studies that we reviewed could be improved by the use of multivariate statistics. Multivariate statistics allow the researcher to assess period and sequence effects as well as treatment effects while minimizing alpha slippage secondary to multiple t-tests. However, multivariate statistics can be complex and difficult to use because the references available on the statistical analysis of crossover designs are widely scattered and often written for the statistician rather than the practitioner-researcher. In order to effectively use crossover designs, researchers need to recognize the statistical requirements and consult with a statistician prior to data collection.

In conclusion, our review of crossover trials demonstrated that this design is used more appropriately in bioavail-






Appendix I. Criteria for Assessing Study Design and Analysis Variables

VARIABLE | CRITERIA
Provided data by group | data were presented so that groups receiving each treatment sequence could be identified
Collected baseline data | data were collected on dependent variable at beginning of study, e.g., serum concentration for drug
Period effect likely | drug affects organism (e.g., its metabolism or physiology) so that response on dependent variable is changed or no test for period effect
Sequence/carryover effect likely | washout period
Used multivariate analysis |
Analysis reported agreed with methods |
Crossover reference listed |
Dropouts described |
