HMG Advance Access published June 27, 2014 Human Molecular Genetics, 2014 doi:10.1093/hmg/ddu312


Most common ‘sporadic’ cancers have a significant germline genetic component

Yi Lu1,†, Weronica E. Ek1,†, David Whiteman2, Thomas L. Vaughan6, Amanda B. Spurdle3, Douglas F. Easton7,8, Paul D. Pharoah7,8, Deborah J. Thompson7, Alison M. Dunning8, Nicholas K. Hayward4, Georgia Chenevix-Trench5, Q-MEGA and AMFS Investigators‡, ANECS-SEARCH‡, UKOPS-SEARCH‡, BEACON Consortium‡ and Stuart Macgregor1,∗

Received March 16, 2014; Revised May 22, 2014; Accepted June 16, 2014

Common cancers have been demarcated into ‘hereditary’ or ‘sporadic’ (‘non-hereditary’) types historically. Such distinctions initially arose from work identifying rare, highly penetrant germline mutations causing ‘hereditary’ cancer. While rare mutations are important in particular families, most cases in the general population are ‘sporadic’. Twin studies have suggested that many ‘sporadic’ cancers show little or no heritability. To quantify the role of germline mutations in cancer susceptibility, we applied a method for estimating the importance of common genetic variants (array heritability, h²g) to twelve cancer types. The following cancers showed a significant (P < 0.05) array heritability: melanoma USA set h²g = 0.19 (95% CI = 0.01–0.37) and Australian set h²g = 0.30 (0.10–0.50); pancreatic h²g = 0.18 (0.06–0.30); prostate h²g = 0.81 (0.32–1); kidney h²g = 0.18 (0.04–0.32); ovarian h²g = 0.30 (0.18–0.42); esophageal adenocarcinoma h²g = 0.24 (0.14–0.34); esophageal squamous cell carcinoma h²g = 0.19 (0.07–0.31); endometrial UK set h²g = 0.23 (0.01–0.45) and Australian set h²g = 0.39 (0.02–0.76). Three cancers showed a positive but non-significant effect: breast h²g = 0.13 (0–0.56); gastric h²g = 0.11 (0–0.27); lung h²g = 0.10 (0–0.24). One cancer showed a small effect: bladder h²g = 0.01 (0–0.11). Among these cancers, previous twin studies were only able to show heritability for prostate and breast cancer, but we can now make much stronger statements for several common cancers which emphasize the important role of genetic variants in cancer susceptibility. We have demonstrated that several ‘sporadic’ cancers have a significant inherited component. Larger genome-wide association studies in these cancers will continue to find more loci, which explain part of the remaining polygenic component.

INTRODUCTION Common cancers are frequently demarcated into ‘hereditary’ (‘familial’) or ‘sporadic’ (‘non-hereditary’) types (1). Such distinctions initially arose from work identifying rare highly penetrant germline mutations causing ‘hereditary’ cancer (such as CDKN2A mutations in melanoma and BRCA1/2 mutations in

breast cancer). These rare mutations are important in particular families but do not explain many cases in the general population. The more common ‘sporadic’ cancers were thought to have little or no germline genetic component, as the large twin studies failed to detect significant heritability for many cancers, with the exceptions of breast, colorectal and prostate cancers (2,3). Studies



∗To whom correspondence should be addressed at: QIMR Berghofer Medical Research Institute, Locked Bag 2000, Herston, QLD 4029, Australia. Tel: +61 738453563; Fax: +61 733620101; Email: [email protected] † These authors contributed equally. ‡ Q-MEGA and AMFS Investigators, ANECS-SEARCH members, UKOPS-SEARCH members and BEACON Consortium members are listed in Supplementary Material.

© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Downloaded from http://hmg.oxfordjournals.org/ at Georgetown University on March 2, 2015

1Statistical Genetics, 2Cancer Control Group, 3Molecular Cancer Epidemiology, 4Oncogenomics and 5Cancer Genetics, QIMR Berghofer Medical Research Institute, Herston, QLD, Australia, 6Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, 7Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, and 8Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK


RESULTS For twelve cancer types, we were able to obtain genome-wide array data for >1000 cases and controls (Table 1). Eight of the twelve cancers studied demonstrated a significant polygenic component underlying cancer risk (Table 2). As with previous twin studies, we found a substantial genetic component for the most common cancer, prostate cancer, although our estimate (h²g = 0.81, 95% CI = 0.32–1), albeit with a wide confidence interval, was markedly higher than those reported in twin studies (0.42 in Lichtenstein et al. (2), updated to 0.58 using the Nordic Twin Registry of Cancer (NorTwinCan) (10); Table 2). For most other cancers, previous twin studies had insufficient sample size to draw any conclusions (2), although this was improved in the analysis of NorTwinCan (10). Using GWAS data, we showed that endometrial cancer, esophageal adenocarcinoma, esophageal squamous cell carcinoma (ESCC), melanoma, pancreatic cancer, renal cell carcinoma (RCC) and ovarian cancer all had significant array heritability (range from 0.10 to 0.40). Our data showed positive but non-significant estimates for breast, gastric and lung cancers; thus, we could not make a clear statement on these three cancers. Although there are published genome-wide significant loci for bladder cancer, these account for only <1% of the variance, and overall this cancer had only a small polygenic component (estimate 0.01, with an upper 95% confidence limit of 0.11). Significant X chromosome effects were seen in esophageal adenocarcinoma, ESCC and gastric cancer (GC), under all three dosage compensation models (Supplementary Material, Table S1). When the effects of genome-wide significant SNPs were removed, the variance explained for prostate cancer and

melanoma decreased substantially, although only in the smaller of the two melanoma datasets was the remaining polygenic variance not statistically significant (Table 2). In most other cases, the genome-wide significant SNPs accounted for very little of the estimated polygenic variance. Following the removal of genetically related individuals and adjustment for the first 20 principal components, we found that the remaining population structure contributed only a negligible proportion of the genetic variance in each dataset. In most cases, the parameters indicating cryptic relatedness (b0) and population stratification (b1) were not significantly different from zero (Supplementary Material, Table S2). Hence, we conclude that it is unlikely that our estimates of h²g were spuriously inflated by unaccounted-for population structure. Furthermore, there was no significant array heritability in the first two control–control contrast studies after applying the standard analysis protocol (Supplementary Material, Table S3). We did, however, find false array heritability in the comparison of two subsets of Wellcome Trust Case-Control Consortium (WTCCC) controls (Supplementary Material, Table S3). This was removed by applying stringent quality control (QC) in addition to the standard protocol, which underlines the necessity of applying stringent QC to analyses using WTCCC samples as controls. Overall, these results demonstrated that our method is robust.

DISCUSSION The goal of the current study was to quantify the genetic contribution to the lifetime risk of cancer. In eight of the twelve cancers studied, there was a significant polygenic component, that is, a large number of genes of weak effect are involved. Given a polygenic basis to disease, it has been shown that most people who develop cancer would be ‘sporadic cases’ (no affected first-, second- or third-degree relatives) (11). This may go some way toward explaining the misconception that most ‘sporadic’ cancers have little or no germline genetic component. The polygenic components estimated for these datasets were not largely attributable to the genome-wide significant variants identified by recent GWASs. Even in conditions such as prostate cancer, where the polygenic component decreased substantially when genome-wide significant loci were removed, a significant polygenic component remained. One corollary of this is that conducting larger GWASs in most cancers will continue to find ever larger numbers of loci, which will explain more of the polygenic component. It has been suggested that an enhanced predictive power, gained from incorporating a larger number of genetic loci, will help fulfill the promise of personalized risk prediction (12). In common with methods for estimating heritability from twin studies, we estimated the variance attributable to common SNPs (array heritability, h²g) on an assumed underlying quantitative liability scale where individuals are modeled as becoming affected once they exceed a particular threshold. However, there are two ways in which estimates of heritability derived from twin studies differ from array heritability. First, when using twin/family data, expected sharing of the genome among close relatives is used, meaning that all of the genome contributes to estimates of heritability.
In contrast, estimating the genetic relationship between unrelated individuals uses information on only the portion of the genome tagged by SNPs on the microarray


examining affected relative pairs have shown a familial component to some cancers (4,5), but unlike twin studies, these cannot reliably distinguish between the effects of genes and the effects of common environment on heritability estimates. Recently, genome-wide association studies (GWAS) have identified some novel cancer susceptibility loci (1–120, depending on the cancer type), but these typically explain only a small proportion of the heritability. Some studies have considered the contribution of all genetic variants simultaneously instead of just a few known loci, using approaches such as polygenic risk prediction (6), but showed only a small heritability for prostate cancer (7,8) and none for breast cancer (8). Thus, it remains unclear whether common germline genetic variants, on the whole, make a significant contribution to ‘sporadic’ cancers. In the current study, we apply a recently developed method to several cancer GWAS datasets. This method links case–control status to relatedness derived from dense genetic marker data (9), allowing the estimation of the magnitude of the germline genetic component. Since most of the genotyping arrays used to date include only single-nucleotide polymorphisms (SNPs) that represent common variation (frequency >1–5%), the resultant estimate of ‘array heritability’ represents a lower bound for the overall heritability. Importantly, unlike studies that determine the contribution of genes based on estimates of risk among family members (e.g. sibling relative risk), our method excludes closely related individuals, yielding estimates less likely to be biased by shared environment.
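As a rough illustration of how relatedness can be derived from dense marker data, the sketch below computes a SNP-based genetic relationship between every pair of individuals using the standardization commonly associated with GCTA, A_jk = (1/M) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), and then lists pairs above the relatedness threshold of 0.025 used in this study. The function names, the toy genotype layout (one row per individual, entries are 0/1/2 allele counts), the use of sample allele frequencies and the treatment of the diagonal are assumptions made for illustration; this is not the GCTA implementation.

```python
def grm(genotypes):
    """Return an individuals x individuals relationship matrix.

    genotypes: list of rows, one per individual, each a list of 0/1/2
    allele counts (hypothetical toy layout; missing data not handled).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    z_cols = []
    for i in range(m):
        col = [row[i] for row in genotypes]
        p = sum(col) / (2 * n)          # sample allele frequency
        if p <= 0 or p >= 1:
            continue                    # skip monomorphic SNPs
        sd = (2 * p * (1 - p)) ** 0.5   # expected genotype SD
        z_cols.append([(x - 2 * p) / sd for x in col])
    used = len(z_cols)
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    # Average the products of standardised genotypes over SNPs.
    return [[sum(z[j] * z[k] for z in z_cols) / used for k in range(n)]
            for j in range(n)]


def related_pairs(a, threshold=0.025):
    """List index pairs (j, k), j < k, whose relatedness exceeds threshold."""
    n = len(a)
    return [(j, k) for j in range(n) for k in range(j + 1, n)
            if a[j][k] > threshold]
```

In practice one individual from each flagged pair would be dropped before estimating h²g; how that choice is made is not specified here.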


Table 1. Sample size (post-QC), number of SNPs (post-QC) and genotyping arrays for each cancer

Cancer types | Cases | Controls | SNPs | Arrays
Bladder | 2848 | 3159 | 505729 | HumanHapmap550
Breast | 1081 | 1085 | 489247 | HumanHapmap550
Australian endometrial cancer | 564 | 574 | 540995 | Human610-Quad, Human1M-Duo
UK endometrial cancer | 670 | 2558 | 495905 | Human610-Quad, Human1M-Duo
Esophageal adenocarcinoma | 1397 | 1947 | 795160 | Omni1-Quad
ESCC | 1388 | 1614 | 442551 | Human660W-Quad
GC | 1342 | 1624 | 451186 | Human660W-Quad
Kidney | 1080 | 2489 | 498235 | HumanHapmap550, Human610-Quad, Human660W-Quad
Lung | 2557 | 2895 | 493060 | HumanHapmap550, HumanHapmap240, HumanHapmap300, Human610-Quad, Human1M-Duo
Melanoma QLD | 2009 | 1963 | 298477 | Omni1-Quad, HumanHapmap610
Melanoma USA | 1849 | 965 | 809287 | Omni1-Quad
Ovary | 1710 | 2562 | 471552 | Human610-Quad, Human1M-Duo
Pancreas | 1926 | 1971 | 499792 | HumanHapmap550, HumanHapmap610
Prostate | 1093 | 992 | 493308 | HumanHapmap300, HumanHapmap240

Table 2. Array heritability explained by all autosomes (h²g) compared with the total heritability estimates from the largest twin studies

Cancer types | Total heritability from twin studies, Lichtenstein (2000) % | Total heritability from twin studies, Mucci (2013) % | Array heritability h²g (95% CI) % | h²g (95% CI), after removing known loci %
Bladder | 31 (0–45) | n.a. | 1 (0–11) | 0 (0–10)
Breast | 27 (4–41) | 28 (12–52) | 13 (0–56) | 5 (0–46)
Australian endometrial cancer | 0 (0–42) | 24 (14–87) | 39 (2–76) | 39 (2–76)
UK endometrial cancer | 0 (0–42) | 24 (14–87) | 23 (1–45) | 23 (1–45)
Esophageal adenocarcinoma | n.a. | n.a. | 24 (14–34) | 24 (14–34)
ESCC | n.a. | n.a. | 19 (7–31) | 19 (7–31)
GC | n.a. | n.a. | 11 (0–27) | 8 (0–22)
Kidney | n.a. | 23 (11–42) | 18 (4–32) | 15 (1–31)
Lung | 26 (0–49) | 25 (12–44) | 10 (0–24) | 8 (0–22)
Melanoma QLD | n.a. | 39 (8–81) | 30 (10–50) | 21 (1–41)
Melanoma USA | n.a. | 39 (8–81) | 19 (1–37) | 8 (0–28)
Ovary | 22 (0–41) | 28 (15–47) | 30 (18–42) | 29 (17–41)
Pancreas | 36 (0–53) | n.a. | 18 (6–30) | 16 (4–28)
Prostate | 42 (29–50) | 58 (52–63) | 81 (32–100) | 59 (12–100)

Ones in bold are significantly different from zero (P < 0.05).

(i.e. common variants). Rare genetic variants (such as rare BRCA1/2 mutations in breast cancer) are generally not tagged on commercial microarrays, and hence, they do not contribute to our array heritability estimates. Second, twin studies sample from the general population and hence provide heritability estimates for the cancer up to the age at which the twins were ascertained (13) (twin studies typically ascertain individuals who are <85 years of age, as assumed for our h²g calculations), which means that some twin study-derived estimates of heritability are artificially lowered for late-onset cancers such as prostate cancer. For example, when adding 10 years of follow-up (plus expanding the original twin cohorts from the Nordic countries), the heritability of prostate cancer was found to be significantly higher than the previous estimate (an increase of approximately one-third), which may be due to different age ascertainment (Table 2). Results from our study should be interpreted in the context of its limitations. Despite ensuring >1000 cases and controls for all 12 cancer types, the statistical power for detecting a significant polygenic component varies dramatically with disease prevalence and true heritability. All but the bladder, breast, gastric and lung cancer sets had reasonable statistical power (≥0.7);

hence, the estimates of array heritability we obtained here should be relatively robust. However, for those four cancers, we had limited power (<0.5) to draw clear conclusions on their genetic variance. It is worth noting that, since the information for the estimation of array heritability derives from the differences between cases and controls, when cases form a more extreme sample from the population [i.e. cancers with lower lifetime risk (K)], the information content of a sample is increased. Mathematically, for K < 0.5, the variance of the array heritability reduces as K reduces [it is proportional to $(K(1-K))^4$] (9). Hence, for the two most common cancer types, prostate and breast cancer, a sample size of 1000 cases and controls is less informative than for rarer cancer types; thus, the estimates had wide confidence intervals. The prostate cancer dataset is from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. It contains a larger proportion of aggressive disease than observed in the population (aggressive disease was defined as cases with a Gleason Score of ≥7 or Stage of ≥III at the time of diagnosis; 60% in the current sample set as compared with 20–30% in prostate cancer patients from the general population). To our


making it unclear how to best categorize cancer cases so they form a homogeneous group. In breast cancer and ovarian cancer, there is evidence for different subtypes having distinct genetic bases (17,18). Ideally for the polygenic analysis, subtypes would be tested separately but we lacked sufficient sample size to, for example, differentiate between different ovarian cancer histologies. Similarly, there are multiple ways of dealing with the smoking and non-smoking groups in lung cancer, but further investigation is limited by the sample size. There are differences in prevalence by sex in some cancers. We modeled this by fitting sex as a covariate. In addition to modeling an overall difference in prevalence between sexes, array heritability can be estimated in each sex separately, although for most of the cancers presented here, we had insufficient data to make a clear statement on sex-specific polygenic effects. We kept Australian and United States melanoma samples as separate groupings because the lifetime risk, a key part of the array heritability estimate, varies markedly between countries. The UK and Australian endometrial cancer sets were also analyzed separately because different QC was applied. In summary, we show that common genetic variants contribute significantly to susceptibility to most of the cancer types that we examined. For most cancers examined here, the descriptor ‘sporadic’ or ‘non-hereditary’ should be replaced by ‘polygenic’.

MATERIALS AND METHODS Individual study descriptions are listed as follows.

Melanoma (Australian) Data consist of a subset of samples from an Australian melanoma study (19). Melanoma cases in individuals of European descent (n = 2168) were selected from the Queensland study of Melanoma: Environment and Genetic Associations (Q-MEGA) (20) and the Australian Melanoma Family Study (21). Samples from Australian individuals of European descent from three different sources were used as controls (n = 2146) (20–22).

Melanoma (United States) Data consist of an extensive resource of melanoma cases and hospital-based controls collected over several years at the U.T. M.D. Anderson Cancer Center. This dbGaP study (dbGaP accession number phs000187.v1.p1) contains samples from 1965 European ancestry cases and 1038 European ancestry controls genotyped using the Illumina OMNI1-Quad SNP chip (23,24).

Breast cancer Data consist of a subset from the Nurses’ Health Study (25,26). Controls were matched to cases based on age, blood collection variables (time, date and year of blood collection, as well as recent (<3 months) use of postmenopausal hormones), ethnicity (all cases and controls are self-reported Caucasians) and menopausal status (all cases and controls were menopausal at blood draw). The dbGaP accession number is phs000147.v1.p1.


knowledge, there is no existing heritability estimate for prostate cancer stratified by aggressive and nonaggressive disease. We estimated the array heritability separately for these two disease subtypes and found that aggressive disease (h²g = 0.80, 95% CI = 0.15–1, P = 0.009, based on 626 aggressive cases and 992 controls) had a higher heritability than nonaggressive disease (h²g = 0.55, 95% CI = 0–1, P = 0.1, based on 467 nonaggressive cases and the same controls). Due to the limited sample size, we cannot determine whether the heritabilities of aggressive and nonaggressive disease were significantly different; this warrants further investigation in larger prostate cancer datasets. If the heritability of aggressive disease is confirmed to be higher than that of nonaggressive disease, the large proportion of aggressive disease in our dataset might partly explain why our estimate is higher than those in the literature. A further consideration with prostate cancer is that very few familial genes have been identified. This is in contrast with, for example, breast and ovarian cancers, where high- to moderate-penetrance alleles, including BRCA1/2 mutations, explain >20% of the familial relative risk (14). Since the contribution of rare high-penetrance genes to prostate cancer appears to be small, our estimate of the array heritability explained by common variants might be high and very close to the total heritability. The heritability of breast cancer from the twin studies was consistently 0.3 (2,10). The estimate was higher in family studies (0.4–0.6) (15), but this may be due to shared environmental factors. Our estimate of array heritability accounts for approximately half of the estimates from twin studies, which is not surprising given that high-penetrance familial genes such as BRCA1/2 account for a sizable proportion of the total heritability.
For these common cancers, the Genome-wide Complex Trait Analysis (GCTA) approach would ideally be applied to large datasets from the same or very similar populations, through consortium efforts such as the Collaborative Oncological Gene-environment Study (COGS) (16). At present, the genotyping array (iCOGS) does not have sufficient coverage, as it contains mainly fine-mapping loci without tagging the whole genome. If GCTA were applied to these data, the estimate of array heritability would be much smaller due to insufficient array coverage. However, it might be feasible when the second OncoArray with reasonable tagging of the whole genome becomes available. Our estimate of array heritability for bladder cancer was 0.01 (95% CI = 0–0.11). Although there are known risk variants, these explain only a very small proportion of heritability. The current sample set was underpowered to detect a low heritability: with the sample size of 2848 cases and 3159 controls, we had >80% power to detect a true heritability of 0.1, but only 30% power if the true heritability is 0.05. Although we did not show a significant heritability, the result suggests that the heritability of bladder cancer is likely to be much lower than that of other cancer types. We have used publicly available data for a number of cancer types, including bladder, breast, esophageal squamous cell carcinoma, gastric, kidney, lung, melanoma (USA), pancreas and prostate. Some of these datasets consisted of multiple small cohorts, which were genotyped on different arrays. Using different arrays for cases and controls, respectively, may inflate estimates of array heritability. However, our robustness checks indicated that such inflation was unlikely to be large. For some cancers, ongoing research is providing a molecular taxonomy for cancer, which is currently in a state of evolution,


Prostate cancer Data consist of a subset from the Cancer Genetic Markers of Susceptibility prostate cancer GWAS (27) with patients of European ancestry drawn from the PLCO cancer screening trial. The dbGaP accession number is phs000207.v1.p1. The aggressive subtype was defined as cases with a Gleason Score of ≥7 or Stage of ≥III at the time of diagnosis, whereas the nonaggressive subtype was defined as cases with a Gleason Score of <7 and Stage of <III.


datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000346.v1.p1.

Ovarian cancer All cases were from the UK and were confirmed as invasive epithelial ovarian cancer (33). Genotyping for the cases was conducted using the Illumina Infinium 610 K array at Illumina Corporation. Existing data from the WTCCC, genotyped using the Illumina Human1M-Duo array, were used as controls.

Pancreatic cancer

Esophageal adenocarcinoma The data used were collected by investigators in the Barrett’s and Esophageal Adenocarcinoma Consortium (BEACON). A subset of individuals with European ancestry from 17 epidemiologic studies from three cohorts (Australia, Europe and North America) was selected for this study (1509 cases and 3202 controls). Cases were clinically diagnosed with esophageal adenocarcinoma and histologically confirmed.

Lung cancer Data consist of a subset from the National Cancer Institute (NCI) GWAS, specifically the consent group ‘Cancer in all age groups, other diseases in adults only, and methods’, containing a total of 7622 subjects (30). Subjects were drawn from one population-based case–control study and three cohort studies. Lung cancer cases were included based on a lung cancer diagnosis established on clinical criteria and confirmed by pathology reports from surgery, biopsy or cytology samples in 95% of cases and on clinical history and imaging for the remaining 5%. The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000336.v1.p1.

Gastric adenocarcinoma and esophageal squamous cell carcinoma Data used for these two cancers were collected from the Asian UGI GWAS (34,35). Esophageal squamous cell carcinomas include incident, primary esophageal cancers with squamous histology. Gastric cancers include incident, primary gastric adenocarcinomas. Controls were all cancer-free at the time of enrolment (and blood draw). The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000361.v1.p1.

Endometrial cancer

Kidney cancer (renal cell carcinoma) Data consist of a subset of the NCI GWAS of RCC (31). The GWAS included 1453 RCC cases and 3531 controls of European background from 4 studies (3 prospective cohorts and 1 case–control study). All cases were diagnosed with pathologically confirmed renal cancer (International Classification of Diseases for Oncology, Second Edition, Topography C64). The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000351.v1.p1.

Bladder cancer Data consist of a subset of the GWAS (32). Cases and controls used in this study were all of European ancestry (3532 cases and 5119 controls) (32). Cases were defined as individuals having histologically confirmed primary carcinoma of the urinary bladder. Scan data were obtained from two case–control studies carried out in Spain and the United States and three prospective cohort studies in Finland and the United States. Samples were genotyped on an Illumina SNP array (Human_1M, HumanHap550, HumanHap250, HumanHap300 or Human610-Quad). In this study, we only used individuals typed on the 550 chip. The

GWAS was conducted using cases with endometrioid histology endometrial cancer from Australia and the UK (36). Genotyping was conducted using the Human 610 K array on the Illumina Infinium platform. We extracted control data for SNPs included on the 610-K platform from existing Illumina 1.2 M genome-wide scan data for controls of European ancestry from two UK population-based studies genotyped by the WTCCC (37). We also matched Australian cases on the basis of ancestry with 1244 individuals from the Hunter Community Study (38).

Methods We applied a standardized QC pipeline to all datasets: individuals with >5% missing genotypes were excluded, as were SNPs with a minor allele frequency of <0.01, a call rate of <0.99 or a Hardy–Weinberg equilibrium test P-value of <0.0001. For two datasets, UK endometrial cancer and ovarian cancer, both of which used the WTCCC as controls, we applied more stringent QC regarding differential missingness between cases and controls (P < 0.001) and two-locus QC (P < 0.02) as described previously (39). The numbers of SNPs that passed QC are summarized in Table 1. We used GCTA (9,40) to estimate the genetic relatedness between pairs of individuals that were derived from all autosomal


Participants were drawn from 12 cohort studies and 8 case–control studies (28,29). Cases were defined as those individuals having primary adenocarcinoma of the exocrine pancreas. Those with non-exocrine pancreatic tumors were excluded from the study. The datasets used for the analyses described in this manuscript were obtained from dbGaP through dbGaP accession number phs000206.v3.p2.


extends beyond this distance, removing all SNPs in such regions will result in all signals at the locus being removed. The methods for estimating the ‘polygenic’ component are potentially sensitive to QC issues; hence, we tested the robustness of this method by estimating array heritability in control sets from different studies: first, we contrasted controls from the breast and prostate cancer studies; second, we combined controls from the breast and the prostate cancer studies and compared this combined set with controls from the pancreatic cancer study; third, we split the WTCCC controls into two sets, the 1958 British Birth Cohort and the UK Blood Service control group. The analysis protocol was identical to the one used in the case–control studies. We arbitrarily defined one set of controls as ‘cases’ and tested whether the array heritability was significantly different from zero. Since ‘lifetime risk’ cannot be defined in this situation, we did the calculation on the observed case/control scale; note that the significance of the genetic component is not affected by scaling for ‘lifetime risk’.
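The remark about scaling can be made concrete with a sketch of the observed-to-liability-scale conversion usually applied to ascertained case–control samples: h²(liability) = h²(observed) × K²(1 − K)² / (z² P(1 − P)), where K is the lifetime risk, P is the proportion of cases in the sample and z is the standard normal density at the liability threshold. That exactly this form was applied in the present analyses is an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, lifetime_risk, case_fraction):
    """Rescale an observed-scale estimate to the liability scale.

    lifetime_risk (K) is the population 'prevalence'; case_fraction (P)
    is the proportion of cases in the analysed sample. Both must lie
    strictly between 0 and 1.
    """
    k, p = lifetime_risk, case_fraction
    if not (0 < k < 1 and 0 < p < 1):
        raise ValueError("K and P must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - k)   # liability threshold
    z = std_normal.pdf(threshold)           # normal density at threshold
    scale = (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))
    return h2_observed * scale
```

Because the conversion multiplies the estimate, and therefore its standard error, by the same positive constant, the ratio of estimate to standard error is unchanged, which is why significance can be assessed on the observed scale.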

SUPPLEMENTARY MATERIAL Supplementary Material is available at HMG online.

ACKNOWLEDGEMENTS We thank Jian Yang and Sang Hong Lee for helpful discussions. Additional acknowledgements for each study are listed in Supplementary Material. Conflict of Interest statement. None declared.

FUNDING S.M., N.K.H., A.B.S. and G.C.T. are supported by Australian National Health and Research Council fellowships. S.M. and D.W. are supported by an Australian Research Council fellowship.

REFERENCES
1. Roukos, D.H., Murray, S. and Briasoulis, E. (2007) Molecular genetic tools shape a roadmap towards a more accurate prognostic prediction and personalized management of cancer. Can. Biol. Ther., 6, 308–312.
2. Lichtenstein, P., Holm, N.V., Verkasalo, P.K., Iliadou, A., Kaprio, J., Koskenvuo, M., Pukkala, E., Skytthe, A. and Hemminki, K. (2000) Environmental and heritable factors in the causation of cancer: analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med., 343, 78–85.
3. Mack, T.M., Hamilton, A.S., Press, M.F., Diep, A. and Rappaport, E.B. (2002) Heritable breast cancer in twins. Br. J. Can., 87, 294–300.
4. Schildkraut, J.M., Risch, N. and Thompson, W.D. (1989) Evaluating genetic association among ovarian, breast, and endometrial cancer: evidence for a breast/ovarian cancer relationship. Am. J. Hum. Genet., 45, 521–529.
5. Hemminki, K., Vaittinen, P., Dong, C. and Easton, D. (2001) Sibling risks in cancer: clues to recessive or X-linked genes? Br. J. Can., 84, 388–391.
6. International Schizophrenia Consortium, Purcell, S.M., Wray, N.R., Stone, J.L., Visscher, P.M., O’Donovan, M.C., Sullivan, P.F. and Sklar, P. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 460, 748–752.
7. Chatterjee, N., Wheeler, B., Sampson, J., Hartge, P., Chanock, S.J. and Park, J.H. (2013) Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet., 45, 400–405, 405e401–403.


SNPs post-QC. By relating observed case/control status to the estimated genetic relationship matrix (GRM), the variance explained by all autosomal SNPs was estimated using restricted maximum likelihood in GCTA (9). We define array heritability (h²g) as the variance explained by autosomal SNPs divided by the total variance. Intuitively, h²g is high if pairs of individuals with high genetic relatedness (estimated from SNPs) have more similar phenotypes than those with lower genetic relatedness. All datasets were restricted to individuals with European ancestry, except for GC and ESCC, which were restricted to individuals with Asian ancestry. To limit the impact of population substructure (cryptic relatedness and population stratification) on the estimation of array heritability, we included only strictly unrelated samples (genetic relatedness < 0.025, approximately equivalent to second cousins; the numbers of cases and controls after removing related individuals are summarized in Table 1) and adjusted for the first 20 principal components. We also applied a simple approach based on genome partitioning (41) to quantify remaining population substructure. This approach involves modeling the differences in by-chromosome h²g, estimated from separately fitting the GRM of one chromosome at a time and from jointly fitting the GRMs of all 22 autosomes as random effects, against the chromosome length, i.e. $h^2_{c,\mathrm{sep}} - h^2_{c,\mathrm{joint}} = b_0 + b_1 L_c + e$, with $b_0$ indicating inflation due to cryptic relatedness (irrespective of chromosomal length $L_c$) and $b_1$ indicating inflation due to population stratification (correlated with $L_c$). The genetic variance attributable to remaining population structure can thus be calculated from $b_0 \times 22/21 + b_1 \times \sum_c L_c / 21$ (41). To model differing prevalence between sexes, sex was fitted as a fixed effect where appropriate. We assume causal variants have a frequency spectrum similar to that of genotyped SNPs (42).
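The genome-partitioning check described above can be sketched as a straight-line fit of the per-chromosome differences on chromosome length, followed by the stated formula for the variance attributable to remaining structure. The function name and inputs are hypothetical, and ordinary least squares is an assumption about how the line was fitted; the constants 22 and 21 are taken directly from the formula in the text.

```python
def structure_inflation(h2_sep, h2_joint, lengths):
    """Fit (h2_sep - h2_joint) = b0 + b1 * L_c + e by least squares.

    h2_sep, h2_joint: per-chromosome estimates for the 22 autosomes.
    lengths: chromosome lengths L_c in any consistent unit.
    Returns (b0, b1, inflation), where inflation is
    b0 * 22/21 + b1 * sum(L_c) / 21.
    """
    if not (len(h2_sep) == len(h2_joint) == len(lengths) == 22):
        raise ValueError("expected values for all 22 autosomes")
    diffs = [s - j for s, j in zip(h2_sep, h2_joint)]
    n = len(lengths)
    mean_len = sum(lengths) / n
    mean_diff = sum(diffs) / n
    sxx = sum((x - mean_len) ** 2 for x in lengths)
    sxy = sum((x - mean_len) * (y - mean_diff)
              for x, y in zip(lengths, diffs))
    b1 = sxy / sxx                    # slope: stratification term
    b0 = mean_diff - b1 * mean_len    # intercept: cryptic relatedness term
    inflation = b0 * 22 / 21 + b1 * sum(lengths) / 21
    return b0, b1, inflation
```

Testing whether b0 and b1 differ from zero would additionally require their standard errors, which this sketch does not compute.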
Variance explained by X chromosome SNPs, defined as $h^2_{g,x}$, was estimated separately under a full dosage compensation model (that is, each allele in females has only half the effect it has in males, so $h^2_{g,x}$ for females $= 1/2 \times h^2_{g,x}$ for males) (41). We also estimated $h^2_{g,x}$ under alternative models: no dosage compensation (that is, each allele has a similar effect on the trait in both sexes; thus, the genetic variance in females is twice that in males, $h^2_{g,x}$ for females $= 2 \times h^2_{g,x}$ for males) and equal variance between males and females (that is, $h^2_{g,x}$ for females $= h^2_{g,x}$ for males) (41). We initially estimated variance explained on the observed scale (cancer case or control). For such binary traits, heritability is usually parameterized on an unobserved continuous ‘liability’ scale (typically a Normal distribution, where individuals above a threshold are ‘affected’). This is desirable because it allows the heritability to describe the relative importance of genetic and non-genetic factors independently of disease prevalence. Hence, to allow us to compute a lower bound for heritability (on the continuous scale), we transformed the estimate obtained on the observed scale using the estimated lifetime risk from population studies as the ‘prevalence’ (Supplementary Material, Table S4). To assess how much array heritability has already been explained by the specific SNPs unambiguously identified by GWAS, for each cancer we re-analyzed the data with the known GWAS loci removed. To do this, for each SNP in the GWAS catalog (http://www.genome.gov/gwastudies/, accessed 21 August 2013), we removed the associated SNP and all SNPs within 1 megabase of it. Since linkage disequilibrium very rarely
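The observed-to-liability transformation mentioned above can be illustrated with a short Python sketch. It assumes the case-control form given by Lee et al. (9), $h^2_l = h^2_o \, K^2 (1-K)^2 / [z^2 P (1-P)]$, where $K$ is the population prevalence (here the lifetime risk), $P$ is the proportion of cases in the sample, and $z$ is the height of the standard Normal density at the liability threshold. Whether the authors used exactly this form is an assumption; the function name `liability_h2` and its arguments are hypothetical.

```python
from statistics import NormalDist


def liability_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale h2 from a case-control sample to the liability scale.

    h2_observed   : variance explained on the 0/1 case-control scale
    prevalence    : K, population prevalence (lifetime risk in the text)
    case_fraction : P, proportion of cases in the analysed sample
    """
    for name, value in (("prevalence", prevalence),
                        ("case_fraction", case_fraction)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Liability threshold above which an individual is 'affected'.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard Normal density at that threshold.
    z = std_normal.pdf(threshold)

    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

For a fixed observed-scale estimate and case fraction, a smaller assumed prevalence gives a smaller liability-scale value in this formula, so the choice of lifetime risk directly affects the reported estimate.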


27. Yeager, M., Orr, N., Hayes, R.B., Jacobs, K.B., Kraft, P., Wacholder, S., Minichiello, M.J., Fearnhead, P., Yu, K., Chatterjee, N. et al. (2007) Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat. Genet., 39, 645–649.
28. Amundadottir, L., Kraft, P., Stolzenberg-Solomon, R.Z., Fuchs, C.S., Petersen, G.M., Arslan, A.A., Bueno-de-Mesquita, H.B., Gross, M., Helzlsouer, K., Jacobs, E.J. et al. (2009) Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat. Genet., 41, 986–990.
29. Petersen, G.M., Amundadottir, L., Fuchs, C.S., Kraft, P., Stolzenberg-Solomon, R.Z., Jacobs, K.B., Arslan, A.A., Bueno-de-Mesquita, H.B., Gallinger, S., Gross, M. et al. (2010) A genome-wide association study identifies pancreatic cancer susceptibility loci on chromosomes 13q22.1, 1q32.1 and 5p15.33. Nat. Genet., 42, 224–228.
30. Landi, M.T., Chatterjee, N., Yu, K., Goldin, L.R., Goldstein, A.M., Rotunno, M., Mirabello, L., Jacobs, K., Wheeler, W., Yeager, M. et al. (2009) A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am. J. Hum. Genet., 85, 679–691.
31. Purdue, M.P., Johansson, M., Zelenika, D., Toro, J.R., Scelo, G., Moore, L.E., Prokhortchouk, E., Wu, X., Kiemeney, L.A., Gaborieau, V. et al. (2011) Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. Nat. Genet., 43, 60–65.
32. Rothman, N., Garcia-Closas, M., Chatterjee, N., Malats, N., Wu, X., Figueroa, J.D., Real, F.X., Van Den Berg, D., Matullo, G., Baris, D. et al. (2010) A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat. Genet., 42, 978–984.
33. Song, H., Ramus, S.J., Tyrer, J., Bolton, K.L., Gentry-Maharaj, A., Wozniak, E., Anton-Culver, H., Chang-Claude, J., Cramer, D.W., DiCioccio, R. et al. (2009) A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nat. Genet., 41, 996–1000.
34. Wang, L.D., Zhou, F., Li, X.M., Sun, L.D., Song, X., Jin, Y., Li, J.M., Kong, G.Q., Qi, H., Cui, J. et al. (2010) Genome-wide association study of esophageal squamous cell carcinoma in Chinese subjects identifies susceptibility loci at PLCE1 and C20orf54. Nat. Genet., 42, 759–763.
35. Abnet, C.C., Freedman, N., Hu, N., Wang, Z., Yu, K., Shu, X.O., Yuan, J.M., Zheng, W., Dawsey, S.M., Dong, L.M. et al. (2010) A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat. Genet., 42, 764–767.
36. Spurdle, A.B., Thompson, D.J., Ahmed, S., Ferguson, K., Healey, C.S., O’Mara, T., Walker, L.C., Montgomery, S.B., Dermitzakis, E.T., Fahey, P. et al. (2011) Genome-wide association study identifies a common variant associated with risk of endometrial cancer. Nat. Genet., 43, 451–454.
37. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.
38. McEvoy, M., Smith, W., D’Este, C., Duke, J., Peel, R., Schofield, P., Scott, R., Byles, J., Henry, D., Ewald, B. et al. (2010) Cohort profile: The Hunter Community Study. Int. J. Epidemiol., 39, 1452–1463.
39. Lee, S.H., Nyholt, D.R., Macgregor, S., Henders, A.K., Zondervan, K.T., Montgomery, G.W. and Visscher, P.M. (2010) A simple and fast two-locus quality control test to detect false positives due to batch effects in genome-wide association studies. Genet. Epidemiol., 34, 854–862.
40. Yang, J., Lee, S.H., Goddard, M.E. and Visscher, P.M. (2011) GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 88, 76–82.
41. Yang, J., Manolio, T.A., Pasquale, L.R., Boerwinkle, E., Caporaso, N., Cunningham, J.M., de Andrade, M., Feenstra, B., Feingold, E., Hayes, M.G. et al. (2011) Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet., 43, 519–525.
42. Yang, J., Benyamin, B., McEvoy, B.P., Gordon, S., Henders, A.K., Nyholt, D.R., Madden, P.A., Heath, A.C., Martin, N.G., Montgomery, G.W. et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat. Genet., 42, 565–569.


8. Witte, J.S. and Hoffmann, T.J. (2011) Polygenic modeling of genome-wide association studies: an application to prostate and breast cancer. OMICS: J. Integrat. Biol., 15, 393–398.
9. Lee, S.H., Wray, N.R., Goddard, M.E. and Visscher, P.M. (2011) Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet., 88, 294–305.
10. Mucci, L.A., Kaprio, J., Harris, J., Czene, K., Kraft, P., Scheike, T., Graff, R., Brandt, I., Holmes, N., Havelick, D. et al. (2013) In American Society of Human Genetics. Boston.
11. Yang, J., Visscher, P.M. and Wray, N.R. (2010) Sporadic cases are the norm for complex disease. Eur. J. Hum. Genet., 18, 1039–1043.
12. Fletcher, O. and Houlston, R.S. (2010) Architecture of inherited susceptibility to common cancer. Nat. Rev. Can., 10, 353–361.
13. So, H.C., Gui, A.H., Cherny, S.S. and Sham, P.C. (2011) Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol., 35, 310–317.
14. Bahcall, O.G. (2013) Common variation and heritability estimates for breast, ovarian and prostate cancers. Nat. Genet., doi:10.1038/ngicogs.1.
15. Risch, N. (2001) The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Can. Epidemiol. Biomark. Prev., 10, 733–741.
16. Bahcall, O.G. (2013) iCOGS collection provides a collaborative model. Foreword. Nat. Genet., 45, 343.
17. Kraft, P. and Haiman, C.A. (2010) GWAS identifies a common breast cancer risk allele among BRCA1 carriers. Nat. Genet., 42, 819–820.
18. Bolton, K.L., Tyrer, J., Song, H., Ramus, S.J., Notaridou, M., Jones, C., Sher, T., Gentry-Maharaj, A., Wozniak, E., Tsai, Y.Y. et al. (2010) Common variants at 19p13 are associated with susceptibility to ovarian cancer. Nat. Genet., 42, 880–884.
19. Macgregor, S., Montgomery, G.W., Liu, J.Z., Zhao, Z.Z., Henders, A.K., Stark, M., Schmid, H., Holland, E.A., Duffy, D.L., Zhang, M. et al. (2011) Genome-wide association study identifies a new melanoma susceptibility locus at 1q21.3. Nat. Genet., 43, 1114–1118.
20. Baxter, A.J., Hughes, M.C., Kvaskoff, M., Siskind, V., Shekar, S., Aitken, J.F., Green, A.C., Duffy, D.L., Hayward, N.K., Martin, N.G. et al. (2008) The Queensland Study of Melanoma: environmental and genetic associations (Q-MEGA); study design, baseline characteristics, and repeatability of phenotype and sun exposure measures. Twin Res. Hum. Genet., 11, 183–196.
21. Cust, A.E., Schmid, H., Maskiell, J.A., Jetann, J., Ferguson, M., Holland, E.A., Agha-Hamilton, C., Jenkins, M.A., Kelly, J., Kefford, R.F. et al. (2009) Population-based, case-control-family design to investigate genetic and environmental influences on melanoma risk: Australian Melanoma Family Study. Am. J. Epidemiol., 170, 1541–1554.
22. Painter, J.N., Anderson, C.A., Nyholt, D.R., Macgregor, S., Lin, J., Lee, S.H., Lambert, A., Zhao, Z.Z., Roseman, F., Guo, Q. et al. (2011) Genome-wide association study identifies a locus at 7p15.2 associated with endometriosis. Nat. Genet., 43, 51–54.
23. Li, C., Liu, Z., Wang, L.E., Gershenwald, J.E., Lee, J.E., Prieto, V.G., Duvic, M., Grimm, E.A. and Wei, Q. (2008) Haplotype and genotypes of the VDR gene and cutaneous melanoma risk in non-Hispanic whites in Texas: a case-control study. Int. J. Can., 122, 2077–2084.
24. Li, C., Zhao, H., Hu, Z., Liu, Z., Wang, L.E., Gershenwald, J.E., Prieto, V.G., Lee, J.E., Duvic, M., Grimm, E.A. and Wei, Q. (2008) Genetic variants and haplotypes of the caspase-8 and caspase-10 genes contribute to susceptibility to cutaneous melanoma. Hum. Mutat., 29, 1443–1451.
25. Haiman, C.A., Chen, G.K., Vachon, C.M., Canzian, F., Dunning, A., Millikan, R.C., Wang, X., Ademuyiwa, F., Ahmed, S., Ambrosone, C.B. et al. (2011) A common variant at the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast cancer. Nat. Genet., 43, 1210–1214.
26. Hunter, D.J., Kraft, P., Jacobs, K.B., Cox, D.G., Yeager, M., Hankinson, S.E., Wacholder, S., Wang, Z., Welch, R., Hutchinson, A. et al. (2007) A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat. Genet., 39, 870–874.

