Research Article

Received 26 November 2012; accepted 30 September 2013; published online 21 October 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sim.6017

Twice-weighted multiple interval estimation of a marginal structural model to analyze cost-effectiveness

K. S. Goldfeld*†

Department of Population Health, NYU School of Medicine, New York, NY, U.S.A.
*Correspondence to: Keith S. Goldfeld, Department of Population Health, NYU School of Medicine, New York, NY, U.S.A.
†E-mail: [email protected]

Cost-effectiveness analysis is an important tool that can be applied to the evaluation of a health treatment or policy. When the observed costs and outcomes result from a nonrandomized treatment, making causal inference about the effects of the treatment requires special care. The challenges are compounded when the observation period is truncated for some of the study subjects. This paper presents a method of unbiased estimation of cost-effectiveness using observational study data that are not fully observed. The method, twice-weighted multiple interval estimation of a marginal structural model, was developed in order to analyze the cost-effectiveness of treatment protocols for advanced dementia residents living in nursing homes when they become acutely ill. A key feature of this estimation approach is that it facilitates a sensitivity analysis that identifies the potential effects of unmeasured confounding on the conclusions concerning cost-effectiveness. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: marginal structural model; causal inference; cost-effectiveness; observational data; censoring; sensitivity analysis; advanced dementia

1. Introduction

Cost-effectiveness analysis (CEA) is an important part of the toolkit for evaluating the merits of a treatment or health policy. In the CEA framework, incremental gains in costs and health outcomes, typically survival or quality-adjusted survival (QAS), are combined in a single statistic. This summary statistic provides information about the incremental cost of an additional unit of health outcome under the treatment of interest. If that incremental cost is lower than what society or institutions are willing to pay, the treatment is considered cost-effective. Once the data are in hand, estimating cost-effectiveness can be fairly standard under the ideal research conditions of a well-executed, randomized study with no missing data. However, in less than ideal conditions, where the study is not randomized, data are missing, or both, there are many potential challenges to drawing conclusions about the cost-effectiveness of the treatment. This paper presents a method of unbiased estimation that can be applied under less than ideal conditions to facilitate conclusions about the causal relationship between a treatment and both cost and health outcomes. The method was motivated by a need to analyze the cost-effectiveness of treatment protocols for advanced dementia residents living in nursing homes when they become acutely ill. Data for the application are from the Choices, Attitudes, and Strategies for Care of Advanced Dementia at the End-of-Life (CASCADE) study, a prospective cohort study conducted between February 2003 and February 2009 that described the experience of 323 nursing home residents with advanced dementia and their families [1]. Following a brief overview of the CEA analytic framework, the challenges of, and a solution for, estimating costs and QAS with censored data are presented. Subsequently, the concepts and issues concerning analysis using observational data are discussed.


An estimation method that combines solutions to the joint problems of censoring and observational data, the twice-weighted multiple interval estimator of a marginal structural model, is then presented, tested with simulation, and applied to the advanced dementia study. Finally, a key feature of the method and its application is a sensitivity analysis that evaluates the robustness of the conclusions to relaxing the important assumption of no unmeasured confounders.

2. The cost-effectiveness analysis framework

Cost-effectiveness analysis evaluates the merits of a treatment or health policy by integrating measurements of costs and outcomes in a single statistic that relates costs to outcomes, either the incremental cost-effectiveness ratio (ICER) or the incremental net benefit (INB). The premise of the analysis is that society, institutions, or individuals are willing to pay up to a specific amount for an extra unit of outcome, such as an additional quality-adjusted life year (QALY) that may result from a treatment. The INB measures the difference between the value of the change in benefits and the incremental cost and is defined as

$$\text{INB} = \lambda \Delta E - \Delta C = \lambda\,(\text{outcome}_{\text{intervention}} - \text{outcome}_{\text{control}}) - (\text{cost}_{\text{intervention}} - \text{cost}_{\text{control}}),$$

where $\lambda$ is defined as the willingness-to-pay (WTP) for an incremental unit of benefit. The intervention will be selected if the INB is positive. If, in a hypothetical scenario, the average cost for treatment group A is $75,000 and quality-adjusted life expectancy is 4.5 years, whereas costs in treatment group B are $20,000 with an average survival of 4.0 QALYs, the value of the INB will be −$30,000 when the WTP is $50,000 per QALY gained ($50,000/year × 0.5 years increase in life expectancy − $55,000 in incremental costs). On the basis of the negative INB, the treatment would not be deemed cost-effective. The INB statistic was selected over the incremental cost-effectiveness ratio, which is commonly used, because it can be a more stable measure when changes in benefits are small and is easier to interpret when there are reductions in benefits following treatment [2]. The cost-effectiveness outcome plane simultaneously shows the incremental change in the outcome ($\Delta E$) due to a treatment and the incremental cost of that treatment ($\Delta C$) (Figure 1).
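To make the arithmetic concrete, the following is a minimal sketch of the INB calculation applied to the hypothetical scenario above. The function and variable names are illustrative and not part of the original analysis.

```python
# A minimal sketch of the INB calculation; the numbers come from the
# hypothetical example in the text.

def incremental_net_benefit(wtp, qaly_trt, qaly_ctl, cost_trt, cost_ctl):
    """INB = lambda * (incremental QALYs) - (incremental cost)."""
    return wtp * (qaly_trt - qaly_ctl) - (cost_trt - cost_ctl)

# Group A: $75,000 and 4.5 QALYs; group B: $20,000 and 4.0 QALYs; WTP = $50,000.
inb = incremental_net_benefit(50_000, 4.5, 4.0, 75_000, 20_000)
print(inb)  # -30000.0: the negative INB means A is not cost-effective versus B
```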


Figure 1. The diagram represents the cost-effectiveness analysis outcome plane. The cost-effective region of the plane is situated below the threshold defined by the line passing through the origin with slope $\lambda$. $\lambda$ represents the willingness-to-pay for an extra unit of benefit. All points that lie in the SE quadrant always fall below the threshold line and will lead to the conclusion that the treatment is preferred. Likewise, points in the NW quadrant will lead to the rejection of treatment. However, points in the NE and SW quadrants will lead to different conclusions, depending on their relationship to the threshold. In this example, points labeled INB1 and INB2 are both positive and represent cost-effective treatments. Points INB3 and INB4 are not cost-effective.


Figure 2. The cost-effectiveness acceptability curve shows the proportion of bootstrap samples for which the incremental net benefit (INB) is positive at each level of $\lambda$, the amount society is willing to pay for an additional year of life. For example, at a level of $25,000, the INB of 60% of the bootstrap samples exceeded $0. Given this willingness to pay, we would not conclude that the treatment under study is cost-effective. However, at a willingness to pay at or greater than $40,000 per year, 90% or more of the bootstrap samples result in a positive INB, suggesting that the treatment is more likely to be cost-effective.

The slope $\lambda$ of the line passing through the origin represents the WTP for an incremental unit of benefit. The plane comprises four quadrants. A point in the southeast quadrant results when a treatment reduces costs while increasing the desired outcome; in this case, treatment is always preferred regardless of the WTP. On the other hand, a point in the northwest quadrant occurs when the treatment increases costs but provides reduced benefit; here, treatment will always be rejected, again regardless of the WTP. In the northeast and southwest quadrants, a point below or to the right of the WTP line defined by $\lambda$ indicates that the treatment will be preferred over the control. In Figure 1, the cost-effective region of the plane is below the threshold defined by the line that has slope $\lambda$. All points that lie in the southeast quadrant always fall below the threshold line and will lead to the conclusion that the treatment is preferred. Likewise, points in the northwest quadrant will lead to the rejection of treatment. However, points in the northeast and southwest quadrants will lead to different conclusions depending on their relationship to the threshold. In the figure, hypothetical points below the threshold (INB1 and INB2) represent cost-effective treatments, whereas points above (INB3 and INB4) do not.

The cost-effectiveness acceptability curve (CEAC) is an approach to evaluating the uncertainty of the INB [3]. In a Bayesian framework, the CEAC represents the probability that the treatment is cost-effective as a function of the WTP: $Pr(\lambda \Delta E - \Delta C > 0)$. In terms of the cost-effectiveness plane, this is the probability that the point $(\Delta C, \Delta E)$ is below the WTP threshold defined by $\lambda$. The CEAC can be estimated with Bayesian methods that assume the parameters for the means ($\mu_E$ and $\mu_C$) are random variables and use assumptions about their multivariate distribution. Alternatively, the CEAC can be estimated nonparametrically using bootstrap methods. Although the bootstrap estimation method is frequentist by nature, the CEAC estimated from a bootstrap has been loosely interpreted in a Bayesian framework with a noninformative prior [4]. Figure 2 shows a hypothetical CEAC. In this example, the curve shows the proportion of bootstrap samples for which the INB is positive at varying levels of $\lambda$. For example, at a level of $25,000, the INB estimates of 60% of the bootstrap samples were positive. We would not conclude that the treatment under study is cost-effective when the WTP is $25,000. However, at a WTP at or greater than $40,000 per year, 90% or more of the bootstrap samples result in a positive INB, suggesting that the treatment is more likely to be cost-effective.
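A nonparametric bootstrap CEAC of the kind shown in Figure 2 can be sketched as follows. This is an illustrative implementation, assuming fully observed per-subject totals `cost` and `qaly` and a treatment indicator `a` (all hypothetical numpy arrays); censoring and confounding, addressed later in the paper, are ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

def ceac(cost, qaly, a, wtp_grid, n_boot=1000):
    """Proportion of bootstrap samples with a positive INB at each WTP."""
    n = len(a)
    prob_ce = np.zeros(len(wtp_grid))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                    # resample subjects
        cb, qb, ab = cost[idx], qaly[idx], a[idx]
        d_e = qb[ab == 1].mean() - qb[ab == 0].mean()  # incremental effect
        d_c = cb[ab == 1].mean() - cb[ab == 0].mean()  # incremental cost
        prob_ce += (wtp_grid * d_e - d_c > 0)          # INB > 0 at each WTP
    return prob_ce / n_boot
```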

3. Estimating costs and quality-adjusted survival under censoring


Estimating both average costs and QAS using censored data is particularly challenging because the censored outcome (either costs or QAS) might not be independent of the actual outcome, even in cases where censored survival times and actual survival times are independent.


Given survival time $T$ and censoring time $C$, we observe only time $T^{obs}$, where $T^{obs} = \min(T, C)$. The time-dependent outcome $M$ (e.g., cumulative lifetime cost or QAS) is determined by a nondecreasing, cumulative function $M(t)$, where $t$ represents time and $M = M(T)$. $M^{obs}$ is the observed accumulated outcome, where $M^{obs} = M(T^{obs})$. If $M(t)$ is a stochastic process (as opposed to a deterministic one), a reasonable assumption given the wide variation in health care costs, then $M(T)$ and $M(C)$ will be induced to be dependent.

3.1. Induced dependency

If the accumulation rate $R$ varies across individuals and $M(t) = Rt$, then $M(T)$ and $M(C)$ will be dependent. In this simplified case, accumulation until death is $RT$ and accumulation until censoring is $RC$. Even when $T$, $C$, and $R$ are all independent, $RT$ and $RC$ will not be independent, as long as $\operatorname{Var}(R) > 0$, $E(T) > 0$, and $E(C) > 0$, because

$$\operatorname{Cov}(RT, RC) = E(T)\,E(C)\operatorname{Var}(R) > 0.$$

The problem with dependent censoring is discussed in detail elsewhere [5]. In the case of traditional survival analysis using the Kaplan–Meier method, the probability of survival until time $t$, $S(t)$, is estimated as $\hat S(t) = \prod_{i: s_i \leq t} \left(1 - d_i/n_i\right)$, where the $s_i$ are the distinct event times, $d_i$ is the number of events at $s_i$, and $n_i$ is the number at risk at $s_i$.

3.2. Partitioned estimators

3.2.1. Simple partitioned weighted estimate. To avoid the dependency induced by the accumulation process, the study period can be partitioned into $K$ intervals, with $M_i^k$ denoting the outcome accumulated by subject $i$ in interval $k$. Let $\delta_i^k$ indicate that subject $i$ is fully observed in interval $k$, let $T_i^k$ denote the end of subject $i$'s follow-up in interval $k$, and let $\hat G(T_i^k)$ denote the estimated probability of remaining uncensored through $T_i^k$. The simple partitioned weighted estimator of $E[M]$ is

$$\hat M_{psw} = \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \frac{\delta_i^k}{\hat G(T_i^k)}\, M_i^k,$$

where $G(\cdot)$ can be estimated using Kaplan–Meier methods in which the event of interest is censoring (rather than death).


Bang and Tsiatis show that $\hat M_{psw}$ is a consistent estimator for $E[M]$ and that the estimator is asymptotically normal [9].

3.2.2. Multiple period regression estimate. The simple partitioned weighted estimator was extended to accommodate covariates [10]. The linear regression model is

$$M_i = \beta' Z_i + \varepsilon_i,$$

where $\beta$ is a $p \times 1$ vector of unknown regression parameters, $Z_i$ is a $p \times 1$ vector of covariates, and $\varepsilon_i$ is a zero-mean error term with no specified distribution. The first element of $Z_i$ is 1, and the second element represents the treatment indicator. In this case, $\hat\beta_0$ is the estimate of the intercept, and $\hat\beta_1$ is the estimate of the treatment effect on the outcome. With this approach, $\hat\beta$ is the sum of the $\hat\beta_k$'s, estimated for each of the $K$ intervals:

$$\hat\beta_k = \left\{ \sum_{i=1}^{n} \frac{\delta_i^k}{\hat G\left(T_i^k\right)}\, Z_i Z_i' \right\}^{-1} \sum_{i=1}^{n} \frac{\delta_i^k}{\hat G\left(T_i^k\right)}\, M_i^k Z_i,$$

where $\delta_i^k$ is the indicator function and $\hat G\left(T_i^k\right)$ is the probability of observation used in the simple partitioned weighted estimator. The unbiased parameter estimate, $\hat\beta$, is the sum of the parameter estimates across all periods, $\sum_{k=1}^{K} \hat\beta_k$.
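The following is a minimal numpy sketch of the partitioned weighting machinery under the stated independent-censoring assumption. The helper names (`km_censoring`, `m_psw`) and the data layout (an n x K matrix of period outcomes) are illustrative choices, not the paper's code.

```python
import numpy as np

def km_censoring(t_obs, death):
    """Kaplan-Meier curve for the censoring distribution: the 'event' here
    is censoring (death == 0). Returns a step function G(t). Ties are
    handled naively, which is adequate for a sketch."""
    order = np.argsort(t_obs)
    t, cens = t_obs[order], (death[order] == 0)
    n_at_risk = np.arange(len(t), 0, -1)
    surv = np.cumprod(1.0 - cens / n_at_risk)
    def G(x):
        idx = np.searchsorted(t, x, side="right") - 1
        return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, len(t) - 1)])
    return G

def m_psw(m, delta, t_end, G):
    """Simple partitioned weighted estimate of E[M]: m, delta, and t_end are
    n x K arrays of period outcomes, observation indicators, and follow-up
    times at the end of each period."""
    n = m.shape[0]
    w = delta / G(t_end)      # inverse-probability-of-observation weights
    return (w * m).sum() / n
```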

4. Estimating costs and quality-adjusted survival using observational data

The randomized trial is the preferred approach to evaluating treatment efficacy, effectiveness, and even cost-effectiveness, because the process of randomization naturally balances characteristics that might be directly associated with the outcome. This allows the researcher to isolate the differences in outcome due to the treatment. However, there are many situations where randomization is not possible, and an observational study design is necessary [14, 15]. A particular analytic challenge in an observational study is that one or more confounding factors might be related to both the treatment decision and the outcome. A number of analytic frameworks can facilitate the analysis of causal relationships, including structural equation models, graphical models, and potential outcome models [16]. The methods described here are drawn from the potential outcome framework.

4.1. Potential outcomes


Using the formal notation of the potential outcomes or counterfactual framework, the observed outcome for subject $i$ is $M_i$, and the potential outcome under treatment $a$ is $M_i^a$ (discussion from Hernán [17]). For a study comparing treatment versus control, $a \in \{0, 1\}$, the potential outcomes for a subject are $M_i^0$ and $M_i^1$, resulting from control and treatment, respectively. Although every subject has potential outcomes $M_i^0$ and $M_i^1$, only one potential outcome is observed; the particular outcome that is observed depends on the treatment level the subject is assigned to or chooses. The individual causal effect is defined as $M_i^1 - M_i^0$. We cannot observe both outcomes for any individual, which is the fundamental problem of causal inference. The population causal effect is defined as the average individual causal effect measured across an entire population: $E(M^1 - M^0)$. If there is a causal effect, then $E(M^1 - M^0) \neq 0$. Although the individual causal effect cannot be computed, it is possible under certain circumstances to estimate the population causal effect by estimating $E(M^1 - M^0) = E(M^1) - E(M^0)$. In the course of a study, we observe $M^1$ only for those who are treated and $M^0$ only for those who are not. The observed population average outcome is simply $E(M)$. We can say that treatment and outcome are associated if the average outcome given treatment does not equal the average outcome for those not treated: $E(M \mid A = 1) \neq E(M \mid A = 0)$, where $A$ is the indicator for treatment. The distinction between association and causation is that association is based on averages of outcomes from two different groups (treated versus untreated), whereas causation is based on averages of potential outcomes drawn from a single group or population. More important, association can be estimated directly from the observed data, but causation cannot be.


4.2. Randomization and causation

In order to make a connection between association and causation in a potential outcomes framework, we need to make three key assumptions. (i) Exchangeability or ignorability requires that the treatment decision be independent of the potential outcomes; put another way, the treatment exposure does not predict the potential outcome or vice versa: $M^a \perp\!\!\!\perp A$, that is, $M^a$ is independent of $A$. (ii) Consistency requires that the observed outcome for an individual equals the potential outcome under the treatment actually received: $(M \mid A = 1) = M^1$ and $(M \mid A = 0) = M^0$. (iii) Positivity requires that the probability of treatment is neither 0 nor 1: $0 < P(A = a) < 1$. Taken together, these assumptions satisfy the requirements of identifiability, whereby the estimated effect from the observed data can be compatible with only one value of the causal effect measure. Under these conditions, and without any other sources of bias, association is causation, or $E(M \mid A = 1) - E(M \mid A = 0) = E(M^1) - E(M^0) = E(M^1 - M^0)$.

Under randomization, treatment assignment is by definition conducted independently of all observed and unobserved subject characteristics, including the potential outcomes. So, exchangeability holds: $M^a \perp\!\!\!\perp A$. (It is important to note that the observed outcome $M = M^1 I(A = 1) + M^0 I(A = 0)$ is not independent of $A$.) Under randomization, $0 < P(A = a) < 1$, and positivity holds. And if we further assume consistency under randomization, association is causation.

4.3. Observational data and causality

In an observational study, the treatment decision may not be independent of baseline covariates, measured or unmeasured. If one or more of those characteristics are associated with the potential outcomes, treatment choice may also be associated with the potential outcome. In that case, where there is confounding, $E(M^a \mid A = a) \neq E(M^a)$. As a result, it is no longer the case that association is causation. There may in fact be an association between treatment and observed outcome, but that association may be partially due to common causes of treatment and outcome rather than to treatment alone. A straightforward comparison of observed means will not necessarily provide the population causal effect. However, for subjects sharing baseline characteristics, it might be impossible to predict who will choose the treatment. In other words, conditional on baseline characteristics, treatment assignment may be random and, most importantly, independent of the potential outcomes. If we are able to identify and control for all the relevant baseline covariates, then it is possible to proceed with the assumption of independence. If it is possible to make an assumption of no unmeasured confounders, it is also possible to say that at each level of measured covariate $L$, treatment assignment is essentially random and independent of the potential outcomes [18]:

$$\left(M^0, M^1\right) \perp\!\!\!\perp A \mid L.$$

Under consistency, $E(M \mid A = a, L) = E(M^a \mid A = a, L)$, and under the (possibly unrealistic) assumption of no unmeasured confounding, $E(M^a \mid A = a, L) = E(M^a \mid L)$. From these assumptions, it follows that $E(M \mid A = a, L) = E(M^a \mid L)$. Under these conditions, it is possible to estimate the average of the potential outcomes at a particular level of measured covariate $L$ using observed data. Furthermore, given the positivity assumption $0 < P(A = a \mid L) < 1$, it is possible to estimate the average by averaging across all levels of $L$: $E(M^a) = E\{E(M^a \mid L)\}$.
An important assumption for unbiased estimation of causal treatment effects is that all confounders have been measured and included in the model. This may be a difficult assumption to make, and it is impossible to evaluate empirically [19]. Given the difficulty of making this assumption, it is extremely important to identify the sensitivity of the conclusions to relaxing the assumption of no unmeasured confounders. Any method that is applied needs to include a compelling sensitivity analysis.

4.4. Adjustment for confounding


A number of approaches can be used to adjust for confounding with the aim of making causal inference, including regression, various propensity score (PS) methods, instrumental variable methods, and, more recently, marginal structural models (MSMs). The methods proposed here are based on the MSM, which was proposed by Robins in 1998 [20].


A simple MSM is

$$E(M^a) = \beta_0 + \beta_1 a,$$

where $a \in \{0, 1\}$, $M^1$ and $M^0$ are the potential outcomes with and without treatment, respectively, and $\beta_1$ can be interpreted as the causal effect of treatment on the outcome. The models are marginal models, as they model the marginal distribution of the potential outcome random variables $M^1$ and $M^0$. They are structural models because they model potential outcomes; in the literature, models for potential outcomes are often called structural. It is important to note that these models do not include any covariates, as they are models for 'causal effects on the entire source population; they are not models for observed associations' [20]. The two parameters of the model can be consistently estimated using inverse probability-of-treatment weighted (IPTW) estimators. IPTW works by creating pseudo-populations that have two important properties. First, the treatment effect is no longer confounded by the covariates used in the estimation of the IPTW; assuming that the IPTW model has been correctly specified, all measured confounding is removed from the analysis. Second, the theoretical mean of the potential outcomes is the same for the pseudo-population and the original population, which means that the causal effect is the same for the original population and the pseudo-population.

The MSM approach has advantages over other methods. Although consistent parameter estimation requires that the IPTW be correctly specified (i.e., include all confounders), there are no specific assumptions necessary about how the IPTW is to be estimated. Instrumental variable methods, which have been developed explicitly to address unmeasured confounding [21], require that the instrument affect the outcome only through its effect on the treatment decision, an assumption that cannot always be easily justified with the data at hand.

A potential limitation of the MSM is that it is strictly marginal. However, this may be a strength for CEA. Traditional approaches, such as regression models of observed outcomes (rather than potential outcomes) that control for confounding, produce conditional estimates. The conditional analysis is useful at a clinical and etiologic level, where it is important to understand the different effects of treatment, both beneficial and harmful, for patients with different characteristics. However, when the goal of the research is to inform regulatory or policy standards pertaining to an entire population, the marginal approach may be more relevant [22]. The aim of CEA is to understand the aggregate causal effect of treatment on cost and QAS and the overall population impact of a treatment. This research goal is a broad societal one, so the MSM is appropriate. Finally, the MSM enables sensitivity analysis to assess the possible influence of unmeasured confounding [23]. Sensitivity analysis of models based on ordinary least squares regression and of pair matching based on PS methods is possible [24–28]. However, ordinary least squares regression and matched-pair designs cannot easily be applied to CEA with censored observations and small sample sizes.

4.5. Weighting by inverse probability of treatment

In the MSM, we can estimate the counterfactual mean $E(M^a)$ under the assumption of no unmeasured confounders (i.e., $(M^0, M^1) \perp\!\!\!\perp A \mid L$) using inverse probability of treatment weights:

$$\hat M^a = \frac{1}{n} \sum_{i=1}^{n} I(A_i = a)\, M_i\, W_i,$$

where $W_i = \frac{1}{P(A_i = a \mid L_i)}$. Robins suggests a modified weight that results in a consistent but more stable estimator [20]:

$$W_i = \frac{P(A_i = a)}{P(A_i = a \mid L_i)}.$$

In particular, this stabilized weight is more stable when the conditional probability of treatment given confounders is close to zero [29].
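As a sketch, the stabilized IPTW estimate of the counterfactual mean can be computed as below, assuming fully observed outcomes and estimated propensity scores `ps` (e.g., fitted by logistic regression) already in hand. Normalizing by the sum of the weights, as done here, is a common stabilized variant; all names are illustrative.

```python
import numpy as np

def iptw_mean(m, a, ps, trt):
    """Stabilized-weight estimate of E[M^trt]; ps = P_hat(A = 1 | L)."""
    p_cond = ps if trt == 1 else 1.0 - ps   # P_hat(A = trt | L_i)
    p_marg = (a == trt).mean()              # P_hat(A = trt)
    w = p_marg / p_cond                     # stabilized weights
    keep = (a == trt)
    return np.sum(w[keep] * m[keep]) / np.sum(w[keep])

# Causal effect estimate under no unmeasured confounding:
# iptw_mean(m, a, ps, 1) - iptw_mean(m, a, ps, 0)
```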

5. Twice-weighted interval estimates for censored observational data


To analyze censored observational data, we can use a partitioned MSM framework with inverse probability of treatment and inverse probability of observation weights.


In this twice-weighted multiple interval estimation, we consider a potential outcome $M_i^a$ partitioned into $M_i^{ak}$ for each period $k$, where $k = 1, \ldots, K$, and each treatment $a \in \{0, 1\}$. Each individual $i$ has two potential outcomes for each interval $k$: $M_i^{1k}$ and $M_i^{0k}$. The observed outcome (either cost or QAS) for each individual in interval $k$ is $M_i^k$. The average potential outcome $E[M^a]$ is

$$E[M^a] = \sum_{k=1}^{K} E\left[M^{ak}\right].$$

The average potential outcome for treatment $a$ in period $k$, $E[M^{ak}]$, is estimated by $\hat M^{ak}$ from observed data as

$$\hat M^{ak} = \frac{\sum_{i=1}^{n} \frac{\delta_i^k\, I(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, M_i^k}{\sum_{i=1}^{n} \frac{\delta_i^k\, I(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}},$$

where $\delta_i^k$, $T_i^k$, and $\hat G(T_i^k)$ are as described earlier. Because $P(A_i \mid L)$ is unknown, it must be estimated, often using logistic regression. $\hat M^{ak}$ can be used to estimate both average costs and average QAS.

5.1. Consistent estimator

The following sketches a proof of the consistency of the estimator $\hat M^{ak}$. The required assumptions are (i) identifiability, (ii) correct specification of the probability of treatment given the confounders represented by $L$, and (iii) independent censoring (i.e., the probability of censoring is independent of treatment and potential outcomes). The three assumptions of identifiability were described earlier: exchangeability $\left((M_i^{0k}, M_i^{1k}) \perp\!\!\!\perp A_i \mid L_i\right)$, consistency $\left(M_i^{ak} = (M_i^k \mid A = a)\right)$, and positivity $\left(0 < P(A = a \mid L) < 1\right)$. Dividing the numerator and denominator of each $\hat M^{ak}$ by the sample size $n$ yields

$$\hat M^{ak} = \frac{n^{-1} \sum_{i=1}^{n} \frac{\delta_i^k\, I(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, M_i^k}{n^{-1} \sum_{i=1}^{n} \frac{\delta_i^k\, I(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}}.$$

Given the assumption of consistency, it is possible to replace the observed outcome $M_i^k$ with the potential outcome $M_i^{ak}$. On the basis of the law of large numbers, the modified numerator converges in probability:

$$n^{-1} \sum_{i=1}^{n} \frac{\delta_i^k\, I(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, M_i^{ak} \;\xrightarrow{P}\; E\!\left(\frac{\delta^k\, I(A = a)}{P(A = a \mid L)\, G(T^k)}\, M^{ak}\right).$$

When there is administrative censoring (e.g., censoring resulting from the end of a study in calendar time), the probability of censoring is independent of the cost or QAS outcome measures and of the treatment decision. In this case,

$$E\!\left(\frac{I(A = a)}{P(A = a \mid L)}\, M^{ak}\, \frac{\delta^k}{G(T^k)}\right) = E\!\left(\frac{I(A = a)}{P(A = a \mid L)}\, M^{ak}\right) E\!\left(\frac{\delta^k}{G(T^k)}\right).$$

But $E\!\left(\frac{\delta^k}{G(T^k)}\right) = 1$ when averaged over $T^k$:

$$E\!\left(\frac{\delta^k}{G(T^k)}\right) = E\!\left[E\!\left(\frac{\delta^k}{G(T^k)} \,\middle|\, T^k\right)\right] = E\!\left[\frac{P(C > T^k)}{G(T^k)}\right] = E[1] = 1.$$


As a result, the numerator reduces to

$$E\!\left(\frac{I(A = a)}{P(A = a \mid L)}\, \frac{\delta^k}{G(T^k)}\, M^{ak}\right) = E\!\left(\frac{I(A = a)}{P(A = a \mid L)}\, M^{ak}\right).$$

Taking the average of the reduced numerator across potential outcomes and confounders, and using the assumption that treatment is independent of potential outcomes (i.e., the assumption of no unmeasured confounders), the expectation is simplified:

$$E\!\left[\frac{I(A = a)}{P(A = a \mid L)}\, M^{ak}\right] = E\!\left\{E\!\left[\frac{I(A = a)}{P(A = a \mid L)}\, M^{ak} \,\middle|\, L, M^{ak}\right]\right\} = E\!\left\{E\!\left[M^{ak} \,\middle|\, L, M^{ak}\right] E\!\left[\frac{I(A = a)}{P(A = a \mid L)} \,\middle|\, L\right]\right\} = E\!\left\{M^{ak}\, E\!\left[\frac{I(A = a)}{P(A = a \mid L)} \,\middle|\, L\right]\right\}.$$

Because

$$E\!\left[\frac{I(A = a)}{P(A = a \mid L)} \,\middle|\, L\right] = \frac{P(A = a \mid L)}{P(A = a \mid L)} = 1,$$

then

$$E\!\left(\frac{I(A = a)}{P(A = a \mid L)}\, \frac{\delta^k}{G(T^k)}\, M^{ak}\right) = E\!\left(M^{ak}\right).$$

By a similar argument, the denominator converges in probability to 1, and as a result,

$$\hat M^{ak} \;\xrightarrow{P}\; E\!\left(\frac{\delta^k\, I(A = a)}{P(A = a \mid L)\, G(T^k)}\, M^{ak}\right) = E\!\left(M^{ak}\right),$$

confirming that the twice-weighted estimator is consistent for the period-specific potential outcome.

A more stable (less variable) estimator uses the stabilized weight suggested by Robins. This stabilized interval estimator $\hat M_s^{ak}$ is a slight modification of the $\hat M^{ak}$ presented earlier:

$$\hat M_s^{ak} = \frac{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, M_i^k}{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}}.$$
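Putting the two weights together, a minimal sketch of the stabilized twice-weighted interval estimator follows. All inputs are illustrative: `m`, `delta`, and `g` are n x K arrays of period outcomes, observation indicators, and values of $\hat G(T_i^k)$; `a_i` is the treatment indicator; and `ps` holds estimated propensity scores $\hat P(A_i = 1 \mid L_i)$.

```python
import numpy as np

def twice_weighted(m, delta, g, a_i, ps, a):
    """Stabilized twice-weighted estimate of E[M^a] = sum_k E[M^{ak}]."""
    p_cond = ps if a == 1 else 1.0 - ps         # P_hat(A_i = a | L_i)
    p_marg = (a_i == a).mean()                  # P_hat(A_i = a)
    in_arm = (a_i == a)[:, None]                # restrict to subjects with A_i = a
    w = in_arm * delta * p_marg / (p_cond[:, None] * g)
    m_ak = (w * m).sum(axis=0) / w.sum(axis=0)  # period estimates M_hat_s^{ak}
    return m_ak.sum()                           # M_hat_s^a
```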

5.2. Estimating the incremental net benefit

The true INB is defined in terms of the potential outcomes for cost and QAS as

$$\text{INB} = \lambda\, E\!\left(Q^1 - Q^0\right) - E\!\left(D^1 - D^0\right) = \lambda\left[E\!\left(Q^1\right) - E\!\left(Q^0\right)\right] - \left[E\!\left(D^1\right) - E\!\left(D^0\right)\right],$$

where $\lambda$ is the WTP, $\left(Q^1, Q^0\right)$ are the potential QAS times with and without treatment, respectively, and $\left(D^1, D^0\right)$ are the potential costs with and without treatment, respectively. The potential outcomes can be partitioned so that

$$E[Q^a] = \sum_{k=1}^{K} E\left[Q^{ak}\right] \quad \text{and} \quad E[D^a] = \sum_{k=1}^{K} E\left[D^{ak}\right].$$


The estimate of the INB is calculated using the estimates of cost and QAS:

$$\widehat{\text{INB}} = \lambda\left(\hat Q_s^1 - \hat Q_s^0\right) - \left(\hat D_s^1 - \hat D_s^0\right),$$


where the estimators of cost and QAS are based on the partitioned estimators

$$\hat Q_s^a = \sum_{k=1}^{K} \hat Q_s^{ak} \quad \text{and} \quad \hat D_s^a = \sum_{k=1}^{K} \hat D_s^{ak},$$

and the partitioned estimates are the twice-weighted interval estimates:

$$\hat Q_s^{ak} = \frac{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, Q_i^k}{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}},$$

$$\hat D_s^{ak} = \frac{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}\, D_i^k}{\sum_{i=1}^{n} I(A_i = a)\, \frac{\delta_i^k\, \hat P(A_i = a)}{\hat P(A_i = a \mid L_i)\, \hat G(T_i^k)}}.$$

The variance of the $\widehat{\text{INB}}$ estimator can be estimated using bootstrap estimation methods [30].
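A sketch of the INB estimate and its bootstrap variance, reusing `twice_weighted` from the sketch above, is shown below. For brevity, this version resamples subjects but reuses the original $\hat G$ and propensity estimates; a full bootstrap would re-estimate both within each resample.

```python
import numpy as np

rng = np.random.default_rng(1)

def inb_hat(wtp, q, d, delta, g, a_i, ps):
    """INB_hat = wtp * (Q_hat^1 - Q_hat^0) - (D_hat^1 - D_hat^0)."""
    dq = twice_weighted(q, delta, g, a_i, ps, 1) - twice_weighted(q, delta, g, a_i, ps, 0)
    dd = twice_weighted(d, delta, g, a_i, ps, 1) - twice_weighted(d, delta, g, a_i, ps, 0)
    return wtp * dq - dd

def inb_boot_var(wtp, q, d, delta, g, a_i, ps, n_boot=500):
    n = len(a_i)
    est = []
    for _ in range(n_boot):
        i = rng.integers(0, n, n)   # resample subjects with replacement
        est.append(inb_hat(wtp, q[i], d[i], delta[i], g[i], a_i[i], ps[i]))
    return np.var(est, ddof=1)
```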

6. Simulation results

6.1. Estimating difference in costs

Using simulated data, we estimated the difference in mean costs between two groups using two methods: partitioned regression adjusting for confounders (PReg-C) and twice-weighted multiple interval MSM (2WMI-MSM) estimation (Table I).

Table I. Simulation: estimation of difference in costs (sample size = 300).

                             Method 1: PReg-C                                   Method 2: 2WMI-MSM
                             40% censoring            30% censoring             40% censoring            30% censoring
                             Bias  Mean SE  Cov Prob  Bias   Mean SE  Cov Prob  Bias  Mean SE  Cov Prob  Bias   Mean SE  Cov Prob

Exponential survival
  Gamma cost
    More confounding         24.5    751     0.92     -15.5    711     0.93     37.0    842     0.93      -3.0    778     0.94
    Less confounding         18.7    737     0.94     -45.2    706     0.93     21.1    792     0.94     -43.8    744     0.93
  Uniform cost
    More confounding         13.2    735     0.93      41.4    697     0.94     23.0    827     0.92      43.3    765     0.94
    Less confounding         22.9    718     0.94       3.4    687     0.94     34.6    782     0.94      -0.4    730     0.93

Uniform survival
  Gamma cost
    More confounding         47.6    598     0.91       7.5    548     0.96     57.1    661     0.91       7.1    598     0.96
    Less confounding         10.1    591     0.94     -24.2    539     0.95     12.1    628     0.93     -29.1    568     0.95
  Uniform cost
    More confounding         19.0    576     0.94     -17.5    528     0.94     26.4    637     0.93      -9.0    576     0.94
    Less confounding         18.1    574     0.94      18.9    518     0.94     39.1    614     0.95      30.5    544     0.96

Notes: The true difference in costs between treatment and control was zero. The survival times did not differ between treatment and control. Total costs were a function of period costs and survival times. The simulation scenarios included more and less censoring, uniform and exponential survival times, uniform or Gamma distributed period-specific costs, and varying levels of confounding determined by the strength of the covariate relationship with treatment selection. Bias is the mean estimate of the difference between treatment and control. Cov Prob is the proportion of 95% confidence intervals that included 0. PReg-C, partitioned regression adjusting for confounders; 2WMI-MSM, twice-weighted multiple interval marginal structural model; SE, standard error.


The average difference in mean aggregate costs between the treatment and nontreatment groups, the average estimated standard error (SE) of the estimated difference, and the coverage rate are shown for different scenarios. Each scenario was replicated 1000 times. Variances were estimated for 25% of these replications, and the mean SE was calculated. The estimated SE of the regression estimator was based on an analytically derived estimator [11]. The estimated SE of the twice-weighted estimator was based on a bootstrap method.

In the simulated data, survival times were generated from two different distributions: exponential with a mean of 7 years, and uniform with a range of 0 to 12 years. Treatment had no effect on survival times. In the simulation with exponential survival times, the study period was 15 years, and in the simulation with uniform survival times, the study period was 12 years. Censoring times were simulated using two different uniform distributions to represent more and less censoring: more censoring (40%) was based on censoring times ranging from 0 to 15 years; less censoring (30%) was based on times ranging from 0 to 20 years.

For each individual, costs were generated for each year until death, but only costs incurred during the study period were included in the analysis. The costs were generated from four components: an initial diagnostic cost (Uniform(0, 1000)), an ongoing annual expenditure, a period cost associated with factors $X_1$ and $X_2$ (an additional $500 if $X_1 = 1$ and an additional $50 per unit of $X_2$), and an added cost associated with dying in the last period if the individual died (Uniform(0, 3000)). The annual expenditure was generated in two ways: a uniform distribution with a range of $0 to $1000, and a Gamma distribution with mean $500 and variance $500. For a particular individual, each year's expenditures were independent of other years. The total cost for each period was the sum of these components, and an individual's aggregate (fully observed) costs were the sum of the total costs from each observed period. All simulations were based on average aggregate costs of $8500 for each treatment group. As a result, in this simulation, treatment had no effect on costs. Although this may not be realistic for cases involving high-cost treatment, it is plausible in a case where the 'treatment' is a protocol that may or may not result in higher overall costs.

Treatment assignment was a function of two factors, $X_1$ and $X_2$. The first confounder ($X_1$) was binomial with probability 0.5. The second confounder ($X_2$) was uniform(0, 10). Treatment assignment was related to the two confounders in two scenarios, one under an assumption of 'more confounding' and the other under an assumption of 'less confounding'. The probability of assignment to treatment ($A$) depended on whether there was more or less confounding:

$$P(A_{more}) = \frac{e^{-0.6 + X_1 + 0.10 X_2}}{1 + e^{-0.6 + X_1 + 0.10 X_2}}, \qquad P(A_{less}) = \frac{e^{-0.2 + 0.5 X_1 + 0.05 X_2}}{1 + e^{-0.2 + 0.5 X_1 + 0.05 X_2}}.$$
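As an illustration, the assignment mechanism just described can be simulated as follows. This is a sketch of the data-generating step only; the coefficient values follow the formulas above, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

x1 = rng.binomial(1, 0.5, n)     # binary confounder X1
x2 = rng.uniform(0, 10, n)       # continuous confounder X2

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

p_more = expit(-0.6 + 1.0 * x1 + 0.10 * x2)   # 'more confounding' scenario
p_less = expit(-0.2 + 0.5 * x1 + 0.05 * x2)   # 'less confounding' scenario
a = rng.binomial(1, p_more)                   # simulated treatment assignment
```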

The estimation of the difference in costs was conducted using a logistic estimation of the probability of treatment given the two covariates. The results suggest that both estimation methods are unbiased, though the 2WMI-MSM estimator showed more variance than the partitioned regression adjusting for confounders (PReg-C). Across all simulation scenarios, the 2WMI-MSM estimator had higher average SEs. The following scenarios led to higher SEs in the estimates: more censoring, more skew in the distributions of cost and survival, and more confounding.

6.2. Incremental net benefit


A second simulation evaluated the twice-weighted MSM estimates of the INB under a variety of scenarios. The effects on the estimates of sample size, censoring, interval length, and modeling approach (2WMI-MSM, PReg-C, and PReg-PS) were explored. The simulation generated survival times, period-specific costs, and period-specific quality-of-life measures, all of which depended on three binary 'health-related' confounders that also affected the probability of treatment. The survival times were generated as a function of treatment and the presence of any of the three confounders. Survival time under treatment with all confounders absent was exponentially distributed with mean 24 months, and was exponentially distributed with mean 20 months in the presence of at least one confounder. Survival time for those not treated averaged 3 months less than for those treated. The base period costs were Gamma distributed with mean and variance $250 per month. In this simulation, where treatment was associated with longer survival, treatment was more costly on a monthly basis and resulted in lower quality of life. The confounders were associated with higher costs and lower quality of life.


The additional monthly cost due to treatment averaged $870 per month in the absence of all confounders and $1650 per month in the presence of all confounders. The base period quality-of-life outcomes were uniformly distributed between −0.01 and 0.10. The reduction in quality of life due to treatment averaged 0.06 units per month in the absence of all confounders and 0.09 units per month in the presence of all confounders. 'True' INBs over a 24-month observation period were calculated using 50,000 individuals with treatment and 50,000 without treatment, all without censoring, using WTP levels of $75,000 and $125,000. The results for $\lambda$ = $75,000 and $\lambda$ = $125,000 were −$3,142 and $5,937, respectively. The different scenarios varied by sample size ($n$ = 150, 300, and 600), censoring (none, 20%, and 40%), interval length (2 years, 1 year, and 1 month), and estimation method (2WMI-MSM, PReg-PS, and PReg-C). Defaults for sample size ($n$ = 300), censoring (20%), interval length (1 year), and estimation method (2WMI-MSM) were used. Five thousand replications were used for each scenario.

The 2WMI-MSM estimates yielded relatively unbiased results across a variety of scenarios (Table II). The average of the estimates from the 5000 replications was within 10% of the true INB. The standard deviation (SD) of the estimates increased as sample size declined, as the percentage of censoring increased, and as the number of intervals decreased. The PReg-PS estimates appeared to have slightly more bias than the twice-weighted MSM estimates, and the regression estimates appeared to have the most bias.

7. Sensitivity analysis

Treatment assignment in observational studies will almost certainly not be random, and the factors associated with treatment choice may also directly affect the outcome. As described, the MSM is one approach that provides an unbiased estimate of the causal effect of treatment, assuming that we can identify and measure all of the confounders. However, it is impossible to test the validity of this assumption of no unmeasured confounders, so it is essential to explore the sensitivity of the findings to various levels of confounding and to evaluate the robustness of the conclusions. Robins presented a formal approach to conducting these sensitivity tests [18, 19], which was described and applied by Brumback et al. [23]. We extend these methods so that we can explore the sensitivity of the cost-effectiveness results to unmeasured confounding.

7.1. Quantifying unmeasured confounding

Unmeasured confounding can be quantified in a number of ways. Working with nonnegative values (i.e., costs and QAS), we have chosen the following approach:

$$q(a, l) = \ln\left(\frac{E[M^a \mid A = a, L = l]}{E[M^a \mid A = 1 - a, L = l]}\right),$$

where $M^a$ is the potential outcome under treatment $a$, $a \in \{0, 1\}$, $L = l$ represents a level of the measured covariates, and $q(a, l)$ is a hypothetical quantification of unmeasured confounding. Under no unmeasured confounding, the expected potential outcome of treatment for those treated is the same as the potential outcome of treatment for those not treated, and the value of $q(a, l)$ is 0. However, when the potential outcome for those treated is greater on average than for those untreated (e.g., before the treatment decision, the treated are healthier), the value of $q(a, l)$ is greater than 0. By using algebraic manipulation, probability theory, and assumptions of consistency, it is possible to show that the mean potential outcome $M^a$ given the covariates is equal to a function of the observed outcomes, the probability of treatment, and the function $q(\cdot)$. The relationship is

$$E[M^a \mid L = l] = E[M \mid A = a, L = l]\left[P(A = a \mid L = l) + \exp(-q(a, l))\left(1 - P(A = a \mid L = l)\right)\right].$$

Although the function $q(a, l)$ is unknown, we can adjust the observed outcomes $M$ in a sensitivity analysis by estimating the probability of treatment $P(A = a \mid L = l)$ across a range of values for the function $q(a, l)$ using

$$M_R = M\left[P(A = a \mid L = l) + \exp(-q(a, l))\left(1 - P(A = a \mid L = l)\right)\right].$$


Table II. Simulation results: incremental net benefit.

                                     λ (WTP)     Bias     SD       % bias   % INB > 0

Sample size
  150                                $75,000     $109     $5430     3.5      27
                                     $125,000    $141     $9880     2.4      72
  300                                $75,000     $247     $3670     7.9      18
                                     $125,000    $385     $6694     6.5      80
  600                                $75,000     $283     $2563     9.0       9
                                     $125,000    $440     $4687     7.4      88

Censoring
  40%                                $75,000     $161     $4384     5.1      22
                                     $125,000    $223     $8011     3.7      76
  20%                                $75,000     $247     $3670     7.9      18
                                     $125,000    $385     $6694     6.5      80
  None                               $75,000     $223     $3382     7.1      16
                                     $125,000    $331     $6166     5.6      82

Interval length (24 months total)
  2 years                            $75,000     $252     $3745     8.0      18
                                     $125,000    $378     $6840     6.4      79
  1 year                             $75,000     $247     $3670     7.9      18
                                     $125,000    $385     $6694     6.5      80
  1 month                            $75,000     $281     $3552     8.9      17
                                     $125,000    $437     $6479     7.4      80

Estimation method
  2WMI-MSM                           $75,000     $247     $3670     7.9      18
                                     $125,000    $385     $6694     6.5      80
  PReg-PS                            $75,000     $288     $3414     9.2      16
                                     $125,000    $634     $6267    10.7      80
  PReg-C                             $75,000     $558     $3366    17.8      14
                                     $125,000    $1093    $6190    18.4      78

Notes: The true INBs at λ = $75,000 and λ = $125,000 were −$3,142 and $5,937, respectively. The estimates were based on 5000 replications for each scenario. Where not specified, defaults were used for sample size (n = 300), censoring (20%), intervals (two 1-year intervals), and estimation method (2WMI-MSM). SD represents the standard deviation of the INB estimates across replications. INB, incremental net benefit; WTP, willingness-to-pay; 2WMI-MSM, twice-weighted multiple interval marginal structural model; PReg-PS, partitioned regression adjusted for propensity score; PReg-C, partitioned regression adjusting for confounders.

In the case that $q(a, l) = 0$, which implies no unmeasured confounding, $M_R = M$. It can be shown that both

$$E[M_R \mid A = 1, L = l] = E[M^1 \mid L = l] \quad \text{and} \quad E[M_R \mid A = 0, L = l] = E[M^0 \mid L = l].$$


At a particular level of the measured covariates, the expected value of the adjusted observed outcomes for treatment is the expected value of the potential outcome under treatment for the entire study population, treated and untreated. The sensitivity analysis can be extended with little modification to the case where outcomes are estimated at intervals during the study:

$$M_R^k = M^k\left[P(A = a \mid L = l) + \exp(-q(a, l))\left(1 - P(A = a \mid L = l)\right)\right].$$


7.2. Functional forms of $q(a, l)$

The function $q(a, l)$ cannot be observed, so the sensitivity analysis is conducted by using a range of assumptions about $q$ and assessing how much bias is permitted before the conclusions of the study change. The function $q(a, l)$ can take a number of functional forms. At this point, we consider three that are independent of the covariates $l$:

$$q^{(1)}(a, l) = \alpha(2a - 1), \qquad q^{(2)}(a, l) = \alpha, \qquad q^{(3)}(a, l) = \alpha a,$$

where $a \in \{0, 1\}$. In each of these functions, $\alpha$ represents a scale parameter, which quantifies the direction and amount of unmeasured confounding. In the case of $q^{(1)}$, the causal effects (defined in terms of ratios) for those treated and not treated are the same; that is,

$$\frac{E[M^1 \mid A = 1, L = l]}{E[M^0 \mid A = 1, L = l]} = \frac{E[M^1 \mid A = 0, L = l]}{E[M^0 \mid A = 0, L = l]},$$

whereas in the cases of $q^{(2)}$ and $q^{(3)}$, the causal effects for those treated and not treated are not the same.

7.3. Sensitivity analysis of cost-effectiveness analysis

The sensitivity analysis can be applied in the context of CEA. The observed outcomes for both cost and QAS can be adjusted as follows:

$$D_R^k = D^k\left[P(A = a \mid L = l) + \exp(-q(a, l))\left(1 - P(A = a \mid L = l)\right)\right]$$
$$Q_R^k = Q^k\left[P(A = a \mid L = l) + \exp(-q(a, l))\left(1 - P(A = a \mid L = l)\right)\right],$$

where $D^k$ and $Q^k$ are observed period costs and QAS, respectively, and $L$ represents measured confounders. $D_R^k$ and $Q_R^k$ are calculated on the basis of the different functional forms of $q(a, l)$, as well as a range of $\alpha$'s for each functional form. A CEA is conducted for each set of $D_R^k$ and $Q_R^k$, and the CEACs associated with the different sets are provided in a single plot to show the potential sensitivity of the analysis to confounding.

Figure 3 shows sample plots based on two different simulated data sets. Panel A shows the plot of the cost and QAS effects generated by a bootstrap of a simulated data set. These bootstrap points provide the information required for the CEAC. The three plots on the right show the estimated CEAC with the sensitivity CEACs. Each plot represents a different functional form for $q(a, l)$, and each solid line represents a different value of $\alpha$. In this first example, the treatment would be viewed as cost-effective at all levels of WTP over $50,000, and the results would not be considered sensitive to unmeasured confounding. Panel B shows the results for a second simulated data set that would be considered cost-effective at a WTP of $70,000 or higher, because 90% of the bootstrap iterations resulted in a positive INB. However, under certain conditions the results are quite sensitive to unmeasured confounding, particularly when $q(a, l)$ takes the form of $q^{(1)}$. In this case, it may be inappropriate to conclude that there is sufficient evidence that treatment is cost-effective.
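The adjustment in Section 7.3 lends itself to a short sketch. Here `mk` is an n x K matrix of observed period outcomes (either costs $D^k$ or QAS $Q^k$), `ps_a` holds each subject's estimated probability of the treatment actually received, and $\alpha$ indexes the amount of unmeasured confounding; re-running the CEA on the adjusted outcomes over a grid of $\alpha$ values traces out the sensitivity CEACs of Figure 3. All names are illustrative.

```python
import numpy as np

def q1(alpha, a):
    """Functional form q^(1)(a, l) = alpha * (2a - 1)."""
    return alpha * (2 * a - 1)

def adjust_outcomes(mk, ps_a, q_val):
    """M_R^k = M^k [ P(A=a|L) + exp(-q(a,l)) (1 - P(A=a|L)) ]."""
    factor = ps_a + np.exp(-np.asarray(q_val)) * (1.0 - ps_a)  # shape (n,)
    return mk * factor[:, None]

# e.g., for alpha in np.linspace(-0.5, 0.5, 5):
#     d_adj = adjust_outcomes(d, ps_a, q1(alpha, a_i))
#     q_adj = adjust_outcomes(q, ps_a, q1(alpha, a_i))
#     ...recompute the CEAC with the adjusted outcomes...
```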

8. Choices, Attitudes, and Strategies for Care of Advanced Dementia at the End-of-Life (CASCADE) results


The CASCADE study was a prospective cohort study conducted between February 2003 and February 2009 that described the experience of 323 nursing home residents with advanced dementia and their families [1]. The detailed methods of this study are provided elsewhere [1,31]. Residents were recruited from 22 nursing homes with more than 60 beds and within 60 miles of Boston, Massachusetts. Assessments of each resident were conducted at baseline and quarterly for up to 18 months through chart reviews, nurse interviews, and clinical assessments. The two outcomes necessary for the CEA, expenditures and QAS, were estimated from the CASCADE data. Medicare health services used during the intervals between assessments were collected from residents’ charts. These services included hospital admissions, emergency department visits,


Figure 3. Sample sensitivity analyses of cost-effectiveness. Panel A shows the results for a simulated data set that would be considered cost-effective at a willingness-to-pay of $50,000 or higher, because 90% of the bootstrap iterations resulted in a positive incremental net benefit. Furthermore, at levels of willingness-to-pay over $50,000, the results are not sensitive to unmeasured confounding. Given this sensitivity analysis, it would be reasonable to conclude that treatment is cost-effective. Panel B shows the results for a simulated data set that would be considered cost-effective at a willingness-to-pay of $70,000 or higher, because 90% of the bootstrap iterations resulted in a positive incremental net benefit. However, at all levels of willingness-to-pay, the results are quite sensitive to unmeasured confounding. A conservative conclusion would be that there may not be enough evidence to suggest that treatment is cost-effective.


primary care provider visits in the nursing home, and hospice enrollment. The costs based on this utilization were estimated in a subsequent research study, as described elsewhere [32]. In addition, the CASCADE study did not include the collection of a preference-based quality-of-life measure. In order to facilitate the CEA, we developed a mapping from the two health status measures (Symptom Management at the End-of-Life in Dementia and Comfort Assessment in Dying with Dementia) to a preference-based measure, the Health Utility Index Mark 2. This mapping is described in further detail elsewhere [33].

Hospitalization of nursing home patients can result in substantial health care expenditures. A recent analysis using the CASCADE study indicated that approximately 30% of all Medicare expenditures can be attributed to hospitalizations [32]. In general, nursing home patient hospitalizations can occur frequently and are associated with a wide range of resident, physician, and facility characteristics. These hospitalizations are often avoidable because appropriate treatments can be provided in the nursing home [34]. At least one study has shown that the do-not-hospitalize (DNH) order, an advance directive that is intended to prevent a transfer from a nursing home to the hospital in the case of illness, may reduce the odds of hospitalization by more than half [35]. Because not having a DNH may result in a higher likelihood of being transferred to a hospital in the case of an acute event, it is important to ask whether the DNH policy is cost-effective: are the costs associated with gains in QAS equal to or less than what society is willing to pay?

The CEA of the aggressive treatment strategy (not having a DNH) was conducted using the CASCADE study by comparing costs and QAS for residents with and without a DNH order. The challenges of conducting this CEA were two-fold. First, the study was observational, so the selection of a DNH order was not randomized. Second, we were interested in conducting a CEA over a 15-month period, and because of the DNH election process, some individuals were followed for only 9 or 12 months, resulting in missing data or censoring. The methods described here facilitated this CEA under these data constraints.

A total of 268 participants in CASCADE were included in the DNH CEA, 144 (54%) without a DNH and 124 (46%) with a DNH. The unadjusted average cost over the 15-month period for complete observations was $10,576 for those without a DNH ($n$ = 130) and $5187 for those with a DNH ($n$ = 107).


The unadjusted average QAS for complete observations was 56.8 quality-adjusted days for those without a DNH and 54.9 quality-adjusted days for those with a DNH. Detailed results of this study are provided elsewhere [36]. The probability of treatment values used in the 2WMI-MSM and PReg-PS models were both estimated using parameters from a logistic model in which the log odds of treatment were modeled as a linear function of gender, race, and feeding tube status. These three factors were associated with cost, QAS, and DNH status.

8.1. Estimate of cost-effectiveness

Estimates from the 2WMI-MSM and PReg-PS models indicate that forgoing a DNH order in favor of potentially more aggressive care does not appear to be cost-effective (Figure 4). The 2WMI-MSM estimated incremental increases in average expenditures of $5972 (SD $1569) and incremental gains in QAS of 3.7 quality-adjusted days (SD 4.1) for those without a DNH. The PReg-PS model estimated an incremental increase in average expenditures of $5462 (SD $1722) and incremental gains in QAS of 2.3 quality-adjusted days (SD 3.9). The CEAC shows the proportion of bootstrap samples with a positive INB for WTP ranging from $25,000 to $300,000 per QALY (Figure 4).


Figure 4. Cost-effectiveness analysis of not having a do-not-hospitalize order. The top panel shows the bootstrap plot and the cost-effectiveness acceptability curve, based on the twice-weighted multiple interval marginal structural model estimate. At values of willingness-to-pay near $100,000 per year, the likelihood of being cost-effective is less than 5%. Even at extreme levels of willingness-to-pay, the likelihood of cost-effectiveness does not exceed 20%. The bottom panel, which shows the results using the partitioned regression adjusted for propensity score (PReg-PS) estimate, essentially mirrors the top panel, except for some slight differences at extreme levels of willingness-to-pay.


Figure 5. Sensitivity of the cost-effectiveness analysis (CEA) of not having a do-not-hospitalize (DNH) order to unmeasured confounding. Each plot shows the potential effect of confounding on the cost-effectiveness acceptability curves (CEACs). The estimated CEAC in each plot is the dashed curve. Plot A presents the hypothetical condition whereby average potential expenditures for the treatment group (residents forgoing a DNH) would have been 70% of those for the nontreatment group (residents with a DNH) under the condition that they both received the same treatment. Plot B assumes no unmeasured confounding related to expenditures. Plot C presents the hypothetical condition whereby average potential expenditures for the treatment group would have been 130% of those for the nontreatment group under the condition that they both received the same treatment. The plots indicate that the CEA results are quite sensitive to unmeasured confounding under certain conditions. Within each plot, the plotted curves are based on setting $q(a, l)$ associated with the quality-adjusted survival (QAS) outcome equal to $-0.10$, $-0.22$, $-0.36$, $-0.51$, and $-0.69$, respectively. These values of $q(a, l)$ are equivalent to setting the ratio of mean QAS for the non-DNH group to levels of 90%, 80%, 70%, 60%, and 50% of the mean QAS of the DNH group had they forgone the DNH. The curve closest to the estimated CEAC curve is based on $q(a, l) = -0.10$.

For both models, the proportion of positive INBs is below 20% for all levels of WTP up to $300,000. At more typical levels (less than $125,000 per year), fewer than 3% of the bootstrap samples show positive benefits. On the basis of these results, it appears that aggressive treatment (not having a DNH) is not cost-effective.

8.2. Sensitivity to unmeasured confounding


Sensitivity analysis of the MSM suggests that not having a DNH order is not cost-effective at lower levels of WTP, even assuming low to moderate levels of unmeasured confounding (Figure 5, plots A–C). Plot A presents the hypothetical condition whereby average potential expenditures for the treatment group (residents forgoing a DNH) would have been 70% of those for the nontreatment group (residents with a DNH) under the condition that they both received the same treatment. This could be interpreted as healthier, less costly patients forgoing the DNH. Plot B assumes no unmeasured confounding related to expenditures. Plot C presents the hypothetical condition whereby average potential expenditures for the treatment group would have been 130% of those for the nontreatment group under the condition that they both received the same treatment. This could be interpreted as healthier, less costly patients selecting a DNH. Each plot includes curves representing different levels of confounding related to QAS: we assumed that the average QAS for residents forgoing a DNH order ranged from 50% to 90% of the average QAS for residents with a DNH order. For example, 50% means that if the average QAS for the DNH group had been 10 quality-adjusted days had they forgone a DNH, the average QAS for the non-DNH group would have been five quality-adjusted days. At a WTP level of $75,000, the treatment approach was not cost-effective when the potential QAS for the non-DNH group was assumed to be between 50% and 90% of the potential QAS for the DNH group (fewer than 50% of the INBs were positive for all levels of confounding related to cost). At a WTP of $100,000, the treatment approach was not cost-effective when the potential QAS for the non-DNH group was assumed to be between 80% and 90% of the potential QAS for the DNH group. When assuming potential expenditures for those without DNH orders would have been 70% of those with DNH orders (plot A), or that expenditures between the two groups would have been the same (plot B), the plots show that forgoing DNH orders is not cost-effective at higher levels of WTP and assuming higher levels of unmeasured confounding with respect to QAS.


9. Discussion


Conducting a CEA using incomplete (censored) observational data can result in biased estimates if care is not taken, even if censoring is purely administrative. This article considers an approach using twice-weighted multiple interval estimation that can be used to maximize the use of the available data and addresses some concerns about confounding that often arise in the context of observational studies. The simulation studies show that twice-weighted multiple interval estimates of an MSM can provide relatively unbiased marginal estimates of cost-effectiveness when all confounders are measured. In addition, the MSM approach, unlike other approaches such as a PS model, facilitates a sensitivity analysis of the CEA.

Methods that ignore the censored observations completely will result in estimators that are less efficient, or have higher variance, particularly with high levels of censoring. Other methods, such as Kaplan–Meier estimation techniques, that treat accumulating costs and QAS as a simple counting process can result in biased estimates of the average, because censored costs are not independent of true costs. The partitioned methods overcome these two shortcomings to provide an unbiased estimator with relatively low variance. The principal requirement for using a partitioned history method is that the data be available for distinct periods; the more time periods, the more information that is available for censored observations.

Other methods can facilitate causal inference on results drawn from observational data. Both PS methods and MSM methods were applied in conjunction with partitioned estimation weighted by the probability of observation. In the case of the CASCADE CEA, the MSM approach is preferable for two primary reasons. First, cost-effectiveness can be viewed as a population-wide question, where we are interested in the impact of a broad policy (e.g., a Medicare recommendation) on expenditures and benefits. This research question is well suited to a marginal analysis, and the MSM explicitly provides a marginal estimate of cost-effectiveness. Second, the MSM provides a unique framework for a sensitivity analysis. In the context of PS models, sensitivity tests have been designed for matched samples [37] but not for the case where the PS is used as a covariate in regression. The sample size in this application was too small to apply matching methods. One researcher extended Robins' method to the use of PS weights, an approach similar to the methods used here [38]. The sensitivity analysis allows us to better understand the potential influence of unmeasured confounding on the shape of the CEAC. This is critical when we suspect or know that treatment decisions are based on factors that have not been captured in the data.

In order to identify the causal effect using the MSM, we needed to make three assumptions about identifiability: exchangeability, consistency, and positivity. The consistency assumption requires that the observed expenditures and QAS for an individual are the same as the potential outcomes for that individual under the treatment (or nontreatment) actually provided. Under different causal frameworks (e.g., a graphical framework), consistency may be explicitly embedded in the model (as opposed to an external assumption) [39], but in the potential-outcome framework, consistency is not explicit, and such an assumption is required [40, 41].
One possible violation of consistency may occur when an individual would respond differently to an intervention that they chose versus one that was randomly assigned. It is conceivable that the cost-effectiveness of a DNH order depends on how the order is assigned. Further exploration of this assumption might be warranted in a future study, although it may ultimately be impossible to assess given the ethical considerations associated with randomly assigning a DNH order.

The positivity assumption requires that all treatments be possible for all members of the study population. This is generally not an issue for large samples, but it is a real concern for small ones. In fact, even when the probability of treatment is minute (close to, but not, zero) or extremely large (close to, but not, one), the weights used in the MSM can be quite unstable, and the estimates can be biased with high variance. Fortunately, it is possible to check the data to assess whether this assumption has been met, as in the sketch below.
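One such check is to fit the treatment model and inspect the distribution of the estimated probabilities and the weights they imply. A minimal sketch, assuming scikit-learn is available (names are illustrative, not the paper's code):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def positivity_check(X, treated, bounds=(0.01, 0.99)):
    """Summarize estimated treatment probabilities and the stabilized
    inverse-probability weights they imply.

    X       : (n, p) array of measured confounders
    treated : length-n 0/1 treatment indicator
    """
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    # Stabilized weights: marginal treatment probability over the
    # conditional (propensity) probability.
    p_marg = treated.mean()
    sw = np.where(treated == 1, p_marg / ps, (1 - p_marg) / (1 - ps))
    print(pd.DataFrame({"propensity": ps, "stabilized_weight": sw})
            .describe(percentiles=[0.01, 0.05, 0.95, 0.99]))
    # Estimated probabilities near 0 or 1 signal near-violations of positivity.
    n_extreme = int(((ps < bounds[0]) | (ps > bounds[1])).sum())
    print(f"{n_extreme} subjects with estimated propensity outside {bounds}")
    return ps, sw
```

Extreme estimated probabilities, or a heavy upper tail in the weight distribution, flag subjects for whom positivity is questionable.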


It is impossible to verify empirically the absence of unmeasured confounders, and hence exchangeability, because by definition this concerns data that the study has not collected. One of the strengths of the MSM in analyzing cost-effectiveness with a CEAC is the ability to conduct sensitivity analyses. In the CASCADE analyses, we were able to assess the level of unmeasured confounding that would have altered our conclusions. All models are sensitive to some level of confounding; the key is to use levels (i.e., q(a, l) in the sensitivity test) that are realistic. The levels applied in the CASCADE analysis were based on the notion that hospitalization expenditures do not vary by more than 30%, given that reimbursement is a case rate based on a discharge diagnosis; the confounding levels for QAS were allowed to vary more. More work is needed to better understand the conditions that make conclusions more or less susceptible to unmeasured confounding.

In addition to the identifiability requirements, unbiased estimation of the causal effect requires that the probability of treatment be correctly modeled. For both MSM and PS methods, the correctness of the model assumptions is difficult to verify. Parameter estimation must be performed with care, and goodness-of-fit tests must be conducted to ensure that the model is appropriate. Ultimately, the key to any observational data analysis is to ensure that the comparison groups are balanced across all measured confounders, and this can be verified from the data (see the sketch at the end of this discussion).

The 2WMI-MSM shows great promise for estimating cost and QAS outcomes from observational data. The method provides unbiased estimates, and with adequate sample sizes, the variance of the estimates may be acceptable. Most importantly, the method provides a useful framework for evaluating the sensitivity of the results to potentially unmeasured confounders, a real concern in any observational study.
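As noted above, balance across measured confounders is the one requirement that can be verified directly from the data. A minimal sketch of a weighted balance check (illustrative names, not from the paper) follows:

```python
import numpy as np

def weighted_smd(x, treated, w):
    """Weighted standardized mean difference for a single covariate.

    x       : covariate values
    treated : 0/1 treatment indicator
    w       : analysis weights (e.g., stabilized MSM weights)
    """
    t, c = treated == 1, treated == 0
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[c], weights=w[c])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[c] - m0) ** 2, weights=w[c])
    # Difference in weighted means, scaled by the pooled standard deviation.
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)
```

A common rule of thumb treats an absolute standardized difference below 0.1 as acceptable balance; computing this quantity for each measured confounder, before and after weighting, shows whether the weights have done their job.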

Acknowledgements

I thank the three anonymous reviewers and the associate editor for their constructive comments and suggestions, and Dr. Bruce Levin, Dr. Michele Shaffer, and Dr. Susan Mitchell for their valuable advice.

References


1. Mitchell S, Kiely DK, Jones RN, Prigerson HG, Volicer L, Teno J. Advanced dementia research in the nursing home: the CASCADE study. Alzheimer Disease and Associated Disorders 2006; 20(3):166–175.
2. Stinnett A, Mullahy J. Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Medical Decision Making 1998; 18(2 Suppl):S68–S80.
3. Fenwick E, O'Brien BJ, Briggs AH. Cost-effectiveness acceptability curves – facts, fallacies and frequently asked questions. Health Economics 2004; 13(5):405–415. DOI: 10.1002/hec.903.
4. Briggs AH. A Bayesian approach to stochastic cost-effectiveness analysis. Health Economics 1999; 8(3):257–261.
5. Etzioni R, Feuer EJ, Sullivan SD, Lin D, Hu C, Ramsey SD. On the use of survival analysis techniques to estimate medical care costs. Journal of Health Economics 1999; 18(3):365–380.
6. Zhao H, Tsiatis A. A consistent estimator for the distribution of quality adjusted survival time. Biometrika 1997; 84(2):339–348.
7. Zhao H, Tsiatis A. Estimating mean quality adjusted lifetime with censored data. Sankhya: The Indian Journal of Statistics, Series B 2000; 62(1):175–188.
8. Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics 1997; 53(2):419–434.
9. Bang H, Tsiatis A. Estimating medical costs with censored data. Biometrika 2000; 87(2):329–343.
10. Lin DY. Linear regression analysis of censored medical costs. Biostatistics 2000; 1(1):35–47. DOI: 10.1093/biostatistics/1.1.35.
11. Willan AR, Lin DY, Cook RJ, Chen EB. Using inverse-weighting in cost-effectiveness analysis with censored data. Statistical Methods in Medical Research 2002; 11(6):539–551. DOI: 10.1191/0962280202sm308ra.
12. Jiang H, Zhou XH. Bootstrap confidence intervals for medical costs with censored observations. Statistics in Medicine 2004; 23(21):3365–3376. DOI: 10.1002/sim.1556.
13. Zhao H, Bang H, Wang H, Pfeifer P. On the equivalence of some medical cost estimators with censored data. Statistics in Medicine 2007; 26(24):4520–4530. DOI: 10.1002/sim.
14. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 1974; 66(5):688–701. DOI: 10.1037/h0037350.
15. Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ 1996; 312(7040):1215–1218.
16. Pearl J. Causal inference in statistics: an overview. Statistics Surveys 2009; 3(3):96–146. DOI: 10.1214/09-SS057.
17. Hernán M. A definition of causal effect for epidemiological research. Journal of Epidemiology and Community Health 2004; 58(4):265–271. DOI: 10.1136/jech.2002.006361.
18. Robins J. Association, causation, and marginal structural models. Synthese 1999; 121(1):151–179.
19. Robins J, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical Models in Epidemiology, the Environment and Clinical Trials, Halloran ME, Berry D (eds). Springer-Verlag: New York, 1999; 1–92.
20. Robins J, Hernán M, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11(5):550–560.
21. Angrist J. Identification of causal effects using instrumental variables. Journal of the American Statistical Association 1996; 91(434):444–455.


22. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. American Journal of Epidemiology 2011; 173(7):731–738. DOI: 10.1093/aje/kwq472.
23. Brumback B, Hernán M, Haneuse SJPA, Robins J. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine 2004; 23(5):749–767. DOI: 10.1002/sim.1657.
24. Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 1998; 54(3):948–963.
25. Rosenbaum PR. Sensitivity analysis for matched observational studies with many ordered treatments. Scandinavian Journal of Statistics 1989; 16(3):227–236.
26. Rosenbaum PR, Krieger AM. Sensitivity of two-sample permutation inferences in observational studies. Journal of the American Statistical Association 1990; 85(410):493–498.
27. Rosenbaum PR. Sensitivity analysis for matched case-control studies. Biometrics 1991; 47(1):87–100.
28. Mitra N, Indurkhya A. A propensity score approach to estimating the cost-effectiveness of medical therapies from observational data. Health Economics 2005; 14(8):805–815. DOI: 10.1002/hec.987.
29. Robins J, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: performance of double-robust estimators when "inverse probability" weights are highly variable. Statistical Science 2007; 22(4):544–559. DOI: 10.1214/07-STS227.
30. Efron B, Tibshirani R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1986; 1(1):54–75. DOI: 10.1214/ss/1177013815.
31. Mitchell S, Teno J, Kiely DK, Shaffer ML, Jones RN, Prigerson HG, Volicer L, Givens JL, Hamel MB. The clinical course of advanced dementia. New England Journal of Medicine 2009; 361(16):1529–1538.
32. Goldfeld KS, Stevenson DG, Hamel MB, Mitchell S. Medicare expenditures among nursing home residents with advanced dementia. Archives of Internal Medicine 2011; 171(9):824–830. DOI: 10.1001/archinternmed.2010.478.
33. Goldfeld KS, Hamel MB, Mitchell S. Mapping health status measures to a utility measure in a study of nursing home residents with advanced dementia. Medical Care 2012; 50(5):446–451.
34. Grabowski DC, Stewart KA, Broderick SM, Coots LA. Predictors of nursing home hospitalization: a review of the literature. Medical Care Research and Review 2008; 65(1):3–39.
35. Dobalian A. Nursing facility compliance with do-not-hospitalize orders. The Gerontologist 2004; 44(2):159–165.
36. Goldfeld KS, Hamel MB, Mitchell SL. The cost-effectiveness of the decision to hospitalize nursing home residents with advanced dementia. Journal of Pain and Symptom Management 2013. DOI: 10.1016/j.jpainsymman.2012.11.007.
37. Rosenbaum PR. Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika 1987; 74(1):13–26. DOI: 10.2307/2336017.
38. Li L, Shen C, Wu AC, Li X. Propensity score-based sensitivity analysis method for uncontrolled confounding. American Journal of Epidemiology 2011; 174(3):345–353. DOI: 10.1093/aje/kwr096.
39. Pearl J. On the consistency rule in causal inference: axiom, definition, assumption, or theorem? Epidemiology 2010; 21(6):872–875. DOI: 10.1097/EDE.0b013e3181f5d3fd.
40. VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology 2009; 20(6):880–883. DOI: 10.1097/EDE.0b013e3181bd5638.
41. Cole SR, Frangakis CE. The consistency statement in causal inference: a definition or an assumption? Epidemiology 2009; 20(1):3–5. DOI: 10.1097/EDE.0b013e31818ef366.

