Mediation Analysis With Matched Case-Control Study Designs.

American Journal of Epidemiology © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected].

Vol. 183, No. 9

Research Letter

The expressions are sometimes evaluated at the mean level of the covariates C. The assumption of normally distributed M is only necessary for the NDE and only if there is exposure-mediator interaction so that $\theta_3 \neq 0$, and even then this assumption can sometimes be relaxed (3).

VanderWeele and Vansteelandt (1) noted that the approach to mediation analysis described above is applicable in unmatched case-control designs if the mediator model is fitted among the control subjects, since the control subjects constitute either a sample of the underlying population (under incidence density sampling) or, under a rare outcome, a close approximation of the underlying population (if controls are sampled from the noncases).

In a matched case-control design, controls are matched to cases on a subset of the measured covariates C. The argument generally given for such matching is one of efficiency (7), and it is still possible to fit the outcome model for Y and to estimate $(\theta_1, \theta_2, \theta_3)$ with such matched case-control data. However, with the mediator model in this setting, the matched controls no longer constitute a sample of the underlying population. One way around this would be to fit the mediator model to the data from the underlying sample of controls from which the matched controls were drawn, if such data were already available prior to the matching.

Suppose, instead, that the mediator model is fitted to the matched controls. Under model 2 or model 3, it is still possible to obtain valid estimates of $\beta_2$ for both the components of C that are matched to the cases and those that are not. Provided that, for the underlying population, the error term in the regression model (model 3) is normally distributed with constant variance $\sigma^2$, conditional on the covariates, then it is also possible to use the error variance from the regression model (model 3), fitted among the matched controls, as an estimate of $\sigma^2$.
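The step of fitting the linear mediator model among the matched controls and taking its error variance as the estimate of the variance parameter can be sketched as follows. This is only an illustration: the function name `ols_with_error_variance` is hypothetical, the design matrix is assumed to hold an intercept, the exposure, and all covariates (matched and unmatched), and dividing the residual sum of squares by n - p is the usual convention rather than something the letter specifies.

```python
def ols_with_error_variance(X, y):
    """Least-squares fit of y on X; returns (coefficients, error variance).

    X: list of rows for the matched controls, each row starting with 1
       for the intercept, followed by exposure and covariate values
    y: list of mediator values for the same controls
    The error variance is the residual sum of squares divided by n - p
    (an assumed convention, not taken from the letter).
    """
    n, p = len(X), len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * z for x, z in zip(A[r], A[col])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[k] - sum(X[k][j] * b[j] for j in range(p)) for k in range(n)]
    sigma2_hat = sum(e * e for e in resid) / (n - p)
    return b, sigma2_hat


# Toy data (made up for illustration): intercept and one exposure column.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1.1, 2.9, 4.9, 7.1]
coef, sigma2_hat = ols_with_error_variance(X, y)
```

In practice a regression routine from a statistics package would replace the hand-rolled solver; the point is only that the coefficients and the residual variance both come from the same fit among the matched controls.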
It is thus possible to estimate all of the parameters in the expressions for the direct and indirect effects above and thus to estimate natural direct and indirect effects conditional on the covariates C = c. This allows for the use of standard causal mediation analysis software (2), even with matched case-control designs. With the approach to matched case-control designs described above, the conditional direct and indirect effects could be reported at any specific covariate level or at several covariate levels. However, the practice of setting the covariates C = c to their average value among the controls in the direct and indirect effect expressions, in a matched case-control study, does need to be interpreted with some caution. This is because the average value of the covariates among the matched controls will not equal their average value for the population, because some of the covariates have been matched to the cases. One way around this would again be to fit the mediator model to the data from the underlying sample of controls from which the matched controls were drawn. Alternatively, if the mean

Prior work on mediation analysis has considered the estimation of direct and indirect effects using parametric models with a binary outcome (1–4) and has considered case-control designs within either a causal (1) or path-diagram (5) framework. Here we discuss the use of approaches for causal mediation analysis when applied to a matched case-control study design. We assume throughout that the outcome is rare; this assumption is needed for closed-form analytical formulas for the direct and indirect effects described below, and the assumption cannot be relaxed under incidence density sampling for the formulas below to remain valid.

For cohort data, with a binary outcome Y and a binary mediator M, with exposure A, and with baseline covariates C, consider the following models:

$$\operatorname{logit}\{P(Y = 1 \mid a, m, c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c \qquad \text{(Model 1)}$$

$$\operatorname{logit}\{P(M = 1 \mid a, c)\} = \beta_0 + \beta_1 a + \beta_2' c \qquad \text{(Model 2)}$$

Under the assumptions that, on a causal diagram (6), conditional on the measured covariates C, there is 1) no exposure-outcome confounding, 2) no mediator-outcome confounding, 3) no exposure-mediator confounding, and 4) no mediator-outcome confounder affected by the exposure, the natural direct effect (NDE) and natural indirect effect (NIE) on an odds ratio (OR) scale, comparing exposure level $a$ with reference level $a^*$, are given approximately (2) by

$$\mathrm{OR}^{\mathrm{NDE}} \cong \frac{\exp(\theta_1 a)\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a^* + \beta_2' c)\}}{\exp(\theta_1 a^*)\{1 + \exp(\theta_2 + \theta_3 a^* + \beta_0 + \beta_1 a^* + \beta_2' c)\}}$$

$$\mathrm{OR}^{\mathrm{NIE}} \cong \frac{\{1 + \exp(\beta_0 + \beta_1 a^* + \beta_2' c)\}\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a + \beta_2' c)\}}{\{1 + \exp(\beta_0 + \beta_1 a + \beta_2' c)\}\{1 + \exp(\theta_2 + \theta_3 a + \beta_0 + \beta_1 a^* + \beta_2' c)\}}$$
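As a minimal sketch, the binary-mediator expressions can be evaluated directly once the coefficients of Models 1 and 2 have been estimated. The function name and argument layout below are hypothetical, and `a_star` denotes the reference exposure level against which `a` is compared, following the form of these expressions in reference 2; the coefficient values in the usage line are placeholders, not estimates from any study.

```python
import math


def binary_mediator_effects(theta, beta, a, a_star, c):
    """Approximate NDE and NIE odds ratios (binary mediator, rare outcome).

    theta: (theta1, theta2, theta3) from the outcome model (Model 1)
    beta:  (beta0, beta1, beta2) from the mediator model (Model 2),
           where beta2 is a sequence of covariate coefficients
    a, a_star: exposure level and reference exposure level
    c: covariate values at which the conditional effects are evaluated
    """
    theta1, theta2, theta3 = theta
    beta0, beta1, beta2 = beta
    bc = sum(b * x for b, x in zip(beta2, c))  # beta2' c

    lin_a = beta0 + beta1 * a + bc       # mediator linear predictor at a
    lin_s = beta0 + beta1 * a_star + bc  # mediator linear predictor at a_star

    nde = (math.exp(theta1 * a)
           * (1 + math.exp(theta2 + theta3 * a + lin_s))) / (
          math.exp(theta1 * a_star)
           * (1 + math.exp(theta2 + theta3 * a_star + lin_s)))
    nie = ((1 + math.exp(lin_s))
           * (1 + math.exp(theta2 + theta3 * a + lin_a))) / (
          (1 + math.exp(lin_a))
           * (1 + math.exp(theta2 + theta3 * a + lin_s)))
    return nde, nie


# Placeholder coefficients, evaluated at a single covariate level.
nde, nie = binary_mediator_effects((0.5, 0.4, 0.1), (-1.0, 0.8, [0.2]),
                                   a=1, a_star=0, c=[1.0])
```

Two quick sanity checks follow from the formulas: with no mediator effect and no interaction (theta2 = theta3 = 0) the NDE reduces to exp(theta1 (a - a_star)) and the NIE to 1, and with no exposure effect on the mediator (beta1 = 0) the NIE is 1.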

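For a binary outcome and a continuous mediator under models

$$\operatorname{logit}\{P(Y = 1 \mid a, m, c)\} = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c$$

$$E[M \mid a, c] = \beta_0 + \beta_1 a + \beta_2' c, \qquad \text{(Model 3)}$$

where M is normally distributed conditional on A and C, with constant variance $\sigma^2$, under assumptions 1–4, the natural direct and indirect effects, conditional on C = c, are given approximately (1) by

$$\log\{\mathrm{OR}^{\mathrm{NDE}}\} \cong \{\theta_1 + \theta_3(\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 \sigma^2)\}(a - a^*) + 0.5\,\theta_3^2 \sigma^2 (a^2 - a^{*2})$$

$$\log\{\mathrm{OR}^{\mathrm{NIE}}\} \cong (\theta_2 \beta_1 + \theta_3 \beta_1 a)(a - a^*).$$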
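The continuous-mediator expressions can be sketched in the same way. Again the function name is hypothetical, `a_star` stands for the reference exposure level as in reference 1, `sigma2` is the error variance of the linear mediator model, and the numeric inputs in the usage lines are placeholders chosen only to show the call.

```python
import math


def continuous_mediator_effects(theta, beta, sigma2, a, a_star, c):
    """Approximate NDE and NIE odds ratios (continuous mediator, rare outcome).

    theta:  (theta1, theta2, theta3) from the outcome model
    beta:   (beta0, beta1, beta2) from the linear mediator model (Model 3),
            where beta2 is a sequence of covariate coefficients
    sigma2: error variance of the mediator model
    a, a_star: exposure level and reference exposure level
    c: covariate values at which the conditional effects are evaluated
    """
    theta1, theta2, theta3 = theta
    beta0, beta1, beta2 = beta
    bc = sum(b * x for b, x in zip(beta2, c))  # beta2' c

    log_nde = ((theta1 + theta3 * (beta0 + beta1 * a_star + bc + theta2 * sigma2))
               * (a - a_star)
               + 0.5 * theta3 ** 2 * sigma2 * (a ** 2 - a_star ** 2))
    log_nie = (theta2 * beta1 + theta3 * beta1 * a) * (a - a_star)
    return math.exp(log_nde), math.exp(log_nie)


# Conditional effects reported at several covariate levels (placeholder values).
for c_level in ([0.0], [1.0], [2.0]):
    nde, nie = continuous_mediator_effects((0.5, 0.3, 0.1), (0.0, 2.0, [0.1]),
                                           sigma2=1.0, a=1, a_star=0, c=c_level)
```

Only the NDE depends on the covariate level and on the variance, and only through the interaction coefficient theta3; the NIE does not involve c at all, which is why the choice of covariate level matters mainly when exposure-mediator interaction is present.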


Am J Epidemiol. 2016;183(9):869–870


values of the covariates for the controls were otherwise known, these could be used in the direct and indirect effect expressions above. Finally, if the mean values of the covariates among the matched controls were used, then the direct and indirect effect estimates could still be interpreted as conditional direct and indirect effects, conditional on the covariates taking the value of the mean level of the matched controls. If this were done, it would be good to report those mean covariate values along with the direct and indirect effect estimates themselves so that a reader could appropriately interpret the effects.

ACKNOWLEDGMENTS

The research was supported by National Institutes of Health grants ES017876 and AI104459.

Conflict of interest: none declared.

REFERENCES

1. VanderWeele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. Am J Epidemiol. 2010;172(12):1339–1348.
2. Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013;18(2):137–150.
3. Tchetgen Tchetgen EJ. A note on formulae for causal mediation analysis in an odds ratio context. Epidemiol Methods. 2014;2(1):21–31.
4. VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. New York, NY: Oxford University Press; 2015.
5. Wang J, Spitz MR, Amos CI, et al. Method for evaluating multiple mediators: mediating effects of smoking and COPD on the association between the CHRNA5-A3 variant and lung cancer risk. PLoS One. 2012;7(10):e47705.
6. Pearl J. Causality. 2nd ed. New York, NY: Cambridge University Press; 2009.
7. Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.

Tyler J. VanderWeele and Eric J. Tchetgen Tchetgen (e-mail: [email protected]) Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA DOI: 10.1093/aje/kww038; Advance Access publication: April 13, 2016
