Special Issue Paper

Received 30 November 2012, Accepted 30 September 2013

Published online 21 October 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.6025

A unified model for estimating and testing familial aggregation

Myeongjee Lee,^a Paola Rebora,^b Maria Grazia Valsecchi,^b Kamila Czene^a and Marie Reilly^a*†

Investigations of familial aggregation of disease can provide important clues for genetic mechanisms, and many such studies have been published in the epidemiological literature using various statistical methods. We developed a unified model for familial risk by extending a Cox regression model to enable estimation of the detailed effects of kinship. By appropriate parameterisation of the model, we show how the risks to all specific first-degree kinships can be estimated and formally compared using simple interaction terms and how the model can be extended to accommodate higher-degree relatives. The correlation due to observations from family members and from the potential for repeated observations is accommodated by a robust sandwich variance estimator or a bootstrap estimate. Hazard ratios for different kinships are formally compared using a robust Wald test. We illustrate the method with applications to studies of adult leukemia and non-Hodgkin’s lymphoma in the Swedish population and display our results on a pedigree diagram. Our estimates are consistent with published work that used simpler stratified methods, and our model enabled the detection of a number of statistically significant effects of kinship. The recognition of such kindred-specific disease risk could be a first step in the design of more informative genetic biomarker studies. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: kindred-specific risk; survival analysis; robust sandwich variance; bootstrap; leukemia; non-Hodgkin’s lymphoma

1. Introduction

Studies of familial aggregation are used in a wide range of diseases [1–4] and make an important contribution to several aspects of medical and health research. An observed higher risk of disease in relatives of patients than in relatives of unaffected controls provides a clue to genetic etiology that can be followed up with targeted linkage studies [5] or closer examination of putative biomarkers [6]. From a health services perspective, information from family studies can be used to offer screening and counseling to high-risk families [7, 8] or, for a disease with a population screening program, more targeted screening [9] or genetic testing [10]. While many studies assume and estimate a common risk to all relatives of the same degree (e.g. all first-degree relatives), more informative studies investigate how the familial risk depends on sex of the patient and/or the relative [3] or the kinship [2, 4, 11], providing additional etiological insight and more refined information for counseling efforts or screening recommendations.

Analysis of familial aggregation of disease traditionally used simple standardised incidence rates to compare the risk in relatives of affected and unaffected individuals, with stratified analyses to estimate effects in specific kinships [11]. While this approach is simple to implement, estimates are generally not adjusted for covariates other than sex and age categories. Such adjusted estimates can be obtained from a case-control design, where adjustment is achieved through matching or inclusion of covariates in the logistic regression model of the family history of disease in cases and matched controls [12]. However, the odds ratios from such retrospective models only offer a measure of the strength of familial aggregation, while the focus of many family studies is the prospective risk to relatives of patients. Thus, a popular approach is to conduct a prospective study to compare the affected and unaffected

^a Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
^b Center of Biostatistics for Clinical Epidemiology, Department of Health Science, University of Milano-Bicocca, Monza, Italy
*Correspondence to: Marie Reilly, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels väg 12A, Stockholm, Sweden.
† E-mail: [email protected]

Statist. Med. 2013, 32 5353–5365


families in a well-defined cohort [13] or the relatives of case probands and relatives of a comparison group of (matched) unaffected probands [2]. Simple Cox regression is routinely used for analysis [14], although the standard error estimates are biased downward because of the correlation between family members and the possibility for an individual to appear in several data records, particularly for non-rare diseases that have high familial aggregation.

A number of approaches have been used to model this correlation, including frailty models [15, 16] and copula models [17, 18]. In frailty models, the exposure/familial effect is estimated as a conditional hazard ratio (HR) (depending on the variance estimate for the random effect), that is, the factor by which an individual’s hazard is increased if a member of his family is known to have failed rather than survived at a certain time. By assuming a gamma distribution for the frailty effect, this HR is constant and is thus compatible with the usual HR from a Cox model. Because family studies are often based on large national registers, case-cohort designs [19] and other sampling strategies [20] have been proposed to reduce the computational demands. However, these models do not readily permit the estimation of the risks for different kinships, except by simple stratified analyses.

In 1991, Liang [21] used Clayton’s copula to provide a population model for familial risk, which is consistent with a Cox proportional hazards model for matched case and control families. However, it is based on the assumption that the increased hazard in relatives is the same for all family members and regardless of which family member is diagnosed. It also considers independent survival times within the cluster, but if the cluster is a family, then this assumption may not be fulfilled, given the difference between relationships such as parent–child and sibling–sibling.
More recently, this method has been described in detail [22] and applied to studies of the relative risk of lymphoproliferative cancers [23–25] in parents, siblings and offspring of cases. In this paper, we extend this method to develop a unified model for familial risk, incorporating the detailed effects of all specific kinships and enabling formal comparisons of the risk to different relatives. We illustrate how careful construction of matched clusters and specification of a design matrix for relatives facilitates a standard survival analysis. Standard errors are estimated as in [22], adjusting for correlation between family members using a sandwich formula and further adjusting for matching using bootstrap estimates. We compare the risks for different kinships using robust Wald tests. We illustrate the method with applications to studies of familial aggregation of adult leukemia and non-Hodgkin’s lymphoma (NHL) in the Swedish population, displaying the risks to all relatives on a pedigree diagram.

2. Materials and methods

2.1. Sampling design


We estimate the familial aggregation of disease by comparison of the incidence in relatives of affected individuals (case relatives) and relatives of non-affected individuals (control relatives). For each case of the disease of interest, we select a number of control individuals from the population who are free from the disease at the time of diagnosis of the corresponding case. The nested design adjusts for changing incidence over calendar time, and controls may be further matched for age, gender and various other potential confounders. We will refer to the cases and their matched controls as probands. Because it is the relatives of these probands that are analysed, only probands with at least one relative contribute to the analysis. Figure 1 provides an illustration of the sampling strategy, where d is a diseased individual whose five matched controls are denoted by $C_{1d}, C_{2d}, \ldots, C_{5d}$. If we are to compare the risk of disease in first-degree relatives of case and control probands, then a, b, c, f, g are ‘exposed’ relatives and h, i, j, k, l, ... are ‘unexposed’ relatives. Such exposed and unexposed relatives are identified for each case in the population and their set of matched controls. Where there are two cases in the same family, each will have their own (matched) controls, and we include both sets of relatives in the data to be analysed. Clearly, this will result in some individuals appearing more than once, and possibly in different roles, in the analysis data set. For example, if individual f in Figure 1 is also affected by the disease of interest, he will not only contribute as an exposed relative of d, but when considered as a proband, he will provide his exposed relatives (d, e, g) and unexposed relatives of his matched controls (m, n, o, p, q, ...).
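The identification of exposed and unexposed relatives described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `first_degree` and `matched_controls` mappings, the function name `relatives_records` and the toy pedigree (two controls per case instead of five) are all hypothetical; only the labels d, f, g and the exposed sets follow the text.

```python
# Hypothetical first-degree relative lists. The labels follow Figure 1, but the
# control families and the number of controls per case are illustrative only.
first_degree = {
    "d": ["a", "b", "c", "f", "g"],   # case proband d
    "C1d": ["h", "i"],                # matched control probands of d
    "C2d": ["j", "k", "l"],
    "f": ["d", "e", "g"],             # f is a second case in the same family
    "C1f": ["m", "n"],                # matched control probands of f
    "C2f": ["o", "p", "q"],
}
matched_controls = {"d": ["C1d", "C2d"], "f": ["C1f", "C2f"]}


def relatives_records(cases, matched_controls, first_degree):
    """Return one record per (proband, relative) pair with exposure Z."""
    records = []
    for case in cases:
        # Relatives of a case proband are 'exposed' (Z = 1).
        for rel in first_degree.get(case, []):
            records.append({"proband": case, "relative": rel, "Z": 1})
        # Relatives of the matched control probands are 'unexposed' (Z = 0).
        for ctrl in matched_controls.get(case, []):
            for rel in first_degree.get(ctrl, []):
                records.append({"proband": ctrl, "relative": rel, "Z": 0})
    return records


records = relatives_records(["d", "f"], matched_controls, first_degree)
exposed = [r["relative"] for r in records if r["Z"] == 1]
```

In this toy data, individual g contributes two exposed records (as a relative of d and of f), and d appears as an exposed relative of f, which is the kind of repeated observation that the robust variance estimation is meant to accommodate.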
To accommodate the dependence in the data, a cluster is defined, which consists of the combined set of all first-degree relatives of case probands who are in a family and all first-degree relatives of their matched control probands. Thus, all the family members depicted in Figure 1 constitute a single cluster if f and d are both cases. Where there is only one case in a family, the identifier (ID) of the case can


Figure 1. An example of a matching cluster with two potential probands (d and f, shaded gray) in a case family. Squares and circles denote males and females, respectively.

be used to identify all members of a cluster, while for two or more cases in a family, the case IDs can be concatenated to assign a unique ID to all members of the same cluster.

2.2. Statistical methods

Although we sampled controls using a nested and matched design, it is the relatives of the probands and not the probands themselves who constitute the analysis data set. These relatives are essentially a ‘matched cohort’ [26], where the matching helps ensure comparable sets of relatives in the two groups analysed. The familial aggregation of the disease of interest is estimated by the HR from unstratified Cox regression analysis of incident cancers in the relatives of cases and controls, using age as the time scale. Individuals born before the start of cancer registration are at risk from their age at register start-up and are censored at their age at the end of follow-up in the register, death or emigration, whichever occurs first. For a simple model that assumes that risk in relatives depends on exposure (whether one is a relative of a case or control proband) and other covariates, the instantaneous hazard $\lambda_{ij}$ for the $j$th record in the $i$th cluster can be written as

$$\lambda_{ij}(t_{ij} \mid X_{ij}, Z_{ij}) = \lambda_0(t_{ij}) \exp(\beta X_{ij} + \gamma Z_{ij}) \tag{1}$$


where $t_{ij}$ is the age at disease onset or censoring, $\lambda_0(t_{ij})$ is the baseline hazard, $Z_{ij}$ is the dichotomous indicator of whether the individual is exposed or unexposed and $X_{ij}$ is the row of the design matrix describing the other covariates, including type of first-degree relationship (parent, sibling, child), sex of the person at risk and sex of the proband. The variables defining the kinship are categorical and so can be represented by dummy indicators such as $X_1$ and $X_2$ in Table I, where the reference group (in this case, siblings) depends on the chosen parameterisation.

If we include the two-way interactions $\sum_{k=1}^{2} \delta_k X_{ijk} Z_{ij}$, $\delta_3 X_{ij3} Z_{ij}$ or $\delta_4 X_{ij4} Z_{ij}$ in model (1), we have models that address common research objectives in studies of familial disease: estimation of risks to different kinships (parent, sibling, child) or the effects of sex of relative and sex of proband. In addition, tests of the $(\delta_1, \delta_2)$, $\delta_3$ and $\delta_4$ parameters enable these risks to be formally compared with the reference group. More generally, all kinships can be described in detail by two-way and three-way products of $(X_1, X_2, X_3, X_4)$, and inclusion in our model of appropriate interactions between these variables and exposure $(Z)$ is the key to the ability to formally test and compare the risk to different relatives.
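As a concrete illustration of how one analysis record might be prepared, the following Python sketch computes the at-risk interval on the age time scale and the dummy and interaction columns. It is a sketch under assumptions rather than the authors' code: the function names, the argument names and the use of whole calendar years to compute ages are all hypothetical simplifications.

```python
def at_risk_interval(birth_year, register_start, end_followup,
                     event_year=None, death_year=None, emigration_year=None):
    """Return (entry_age, exit_age, event) on the age time scale.

    Ages are approximated by differences of calendar years (an assumption).
    """
    # Delayed entry: those born before register start-up enter at their age then.
    entry_age = max(0, register_start - birth_year)
    # Follow-up ends at the earliest of diagnosis, death, emigration, register end.
    candidates = [y for y in (event_year, death_year, emigration_year, end_followup)
                  if y is not None]
    exit_year = min(candidates)
    event = event_year is not None and event_year == exit_year
    return entry_age, exit_year - birth_year, event


def code_record(relationship, relative_male, proband_male, exposed):
    """Code one relative's record with Z, X1..X4 and the two-way terms Xk*Z.

    `relationship` is the relative's relation to the proband:
    "parent", "sibling" (reference group) or "child".
    """
    row = {
        "Z": int(exposed),
        "X1": int(relationship == "parent"),
        "X2": int(relationship == "child"),
        "X3": int(relative_male),
        "X4": int(proband_male),
    }
    for k in (1, 2, 3, 4):
        row[f"X{k}Z"] = row[f"X{k}"] * row["Z"]
    return row


# Hypothetical example: father of a female case, born 1940, diagnosed in 2001.
interval = at_risk_interval(1940, 1958, 2010, event_year=2001)
row = code_record("parent", relative_male=True, proband_male=False, exposed=True)
```

Rows of this form, together with a cluster identifier, are what a standard Cox regression routine with delayed entry and robust variance would take as input.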


Table I. Coding system for exposure $Z$ and variables $X_k$ $(k = 1, 2, 3, 4)$ identifying specific familial relationships, with the corresponding parameters of model (2) denoting the contrasts between exposed and unexposed subjects.

Variable           Level      Code
Exposure           Case       Z = 1
                   Control    Z = 0
Relationship       Parents    X1 = 1, X2 = 0
                   Siblings   X1 = 0, X2 = 0
                   Children   X1 = 0, X2 = 1
Sex of relative    Male       X3 = 1
                   Female     X3 = 0
Sex of proband     Male       X4 = 1
                   Female     X4 = 0

Specific family relation
(Proband–relative)   X1  X2  X3  X4   Contrast of exposed versus unexposed relatives from model (2)
Sister–sister         0   0   0   0   $\gamma$
Daughter–mother       1   0   0   0   $\gamma + \delta_1$
Mother–daughter       0   1   0   0   $\gamma + \delta_2$
Sister–brother        0   0   1   0   $\gamma + \delta_3$
Brother–sister        0   0   0   1   $\gamma + \delta_4$
Daughter–father       1   0   1   0   $\gamma + \delta_1 + \delta_3 + \theta_{13}$
Son–mother            1   0   0   1   $\gamma + \delta_1 + \delta_4 + \theta_{14}$
Mother–son            0   1   1   0   $\gamma + \delta_2 + \delta_3 + \theta_{23}$
Father–daughter       0   1   0   1   $\gamma + \delta_2 + \delta_4 + \theta_{24}$
Brother–brother       0   0   1   1   $\gamma + \delta_3 + \delta_4 + \theta_{34}$
Son–father            1   0   1   1   $\gamma + \delta_1 + \delta_3 + \delta_4 + \theta_{13} + \theta_{14} + \theta_{34} + \phi_1$
Father–son            0   1   1   1   $\gamma + \delta_2 + \delta_3 + \delta_4 + \theta_{23} + \theta_{24} + \theta_{34} + \phi_2$
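The contrasts in Table I follow mechanically from the coding: a parameter enters the contrast for a kinship whenever all the $X_k$ it multiplies equal 1. The Python sketch below derives the list of parameters for each of the 12 kinships. It is illustrative only; the parameter names `gamma`, `delta`, `theta` and `phi` (for the exposure, two-way, pairwise and three-way terms, respectively) and the function name are labels assumed here for the example.

```python
from itertools import combinations

# (proband-relative) -> (X1, X2, X3, X4), as coded in Table I
KINSHIPS = {
    "sister-sister":   (0, 0, 0, 0),
    "daughter-mother": (1, 0, 0, 0),
    "mother-daughter": (0, 1, 0, 0),
    "sister-brother":  (0, 0, 1, 0),
    "brother-sister":  (0, 0, 0, 1),
    "daughter-father": (1, 0, 1, 0),
    "son-mother":      (1, 0, 0, 1),
    "mother-son":      (0, 1, 1, 0),
    "father-daughter": (0, 1, 0, 1),
    "brother-brother": (0, 0, 1, 1),
    "son-father":      (1, 0, 1, 1),
    "father-son":      (0, 1, 1, 1),
}


def contrast_terms(x1, x2, x3, x4):
    """List the parameters in the exposed-versus-unexposed log hazard ratio."""
    x = {1: x1, 2: x2, 3: x3, 4: x4}
    terms = ["gamma"]                                   # main effect of exposure Z
    terms += [f"delta{k}" for k in (1, 2, 3, 4) if x[k]]
    # Pairwise terms; X1 and X2 are never both 1, so theta12 never appears.
    terms += [f"theta{k}{m}" for k, m in combinations((1, 2, 3, 4), 2)
              if x[k] and x[m]]
    # Three-way terms exist only for male-male parent or child pairs.
    terms += [f"phi{k}" for k in (1, 2) if x[k] and x3 and x4]
    return terms


contrasts = {name: contrast_terms(*code) for name, code in KINSHIPS.items()}
```

Summing the fitted coefficients named in `contrasts[name]` and exponentiating gives the hazard ratio for exposed versus unexposed relatives of that kinship.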


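Thus, a unified model of disease risk for first-degree relatives can be constructed as follows:

$$\lambda(t \mid X, Z) = \lambda_0(t) \exp\left( \beta X + \gamma Z + \sum_{k=1}^{4} \delta_k X_k Z + \sum_{k \neq k',\, k < k'} \theta_{kk'} X_k X_{k'} Z + \sum_{k=1}^{2} \phi_k X_k X_3 X_4 Z \right) \tag{2}$$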
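Once such a model has been fitted, kinship-specific hazard ratios and formal comparisons between kinships reduce to linear contrasts of the coefficient vector. The Python sketch below shows a one-degree-of-freedom Wald test for a contrast, using a simplified kinship-only model with parameters named `gamma`, `delta1` and `delta2`. The coefficient values and the covariance matrix are invented for illustration (they are not results from the paper), and the covariance is assumed to be a robust (sandwich or bootstrap) estimate supplied by the fitting software.

```python
import math

names = ["gamma", "delta1", "delta2"]
estimates = [0.60, -0.20, 0.10]            # hypothetical log hazard ratios
robust_cov = [                             # hypothetical robust covariance matrix
    [0.010, -0.004, -0.004],
    [-0.004, 0.020, 0.005],
    [-0.004, 0.005, 0.018],
]


def wald_test(contrast, estimates, cov):
    """Return (estimate, Wald statistic, p-value) for a 1-df linear contrast."""
    n = len(estimates)
    est = sum(c * b for c, b in zip(contrast, estimates))
    var = sum(contrast[i] * cov[i][j] * contrast[j]
              for i in range(n) for j in range(n))
    stat = est ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return est, stat, p_value


# Hazard ratio for parents of cases versus parents of controls: exp(gamma + delta1).
log_hr_parents, _, _ = wald_test([1, 1, 0], estimates, robust_cov)
hr_parents = math.exp(log_hr_parents)

# Formal comparison of the familial risk to parents and to children: delta1 - delta2.
diff, stat, p_value = wald_test([0, 1, -1], estimates, robust_cov)
```

The same function applies to any row of Table I by placing a 1 in the position of each parameter appearing in the contrast, and to comparisons of two kinships by taking the difference of their contrast vectors.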
