HHS Public Access Author manuscript Author Manuscript

Cancer Res. Author manuscript; available in PMC 2017 April 01. Published in final edited form as: Cancer Res. 2016 April 1; 76(7): 1714–1723. doi:10.1158/0008-5472.CAN-15-0338.

Genomic landscape of somatic alterations in esophageal squamous cell carcinoma and gastric cancer

Author Manuscript

Nan Hu1, Mitsutaka Kadota2, Huaitian Liu2, Christian C Abnet1, Hua Su1, Hailong Wu2, Neal D Freedman1, Howard H Yang2, Chaoyu Wang1, Chunhua Yan2, Lemin Wang1, Sheryl Gere2, Amy Hutchinson1,3, Guohong Song2, Yuan Wang4, Ti Ding4, You-Lin Qiao5, Jill Koshiol1, Sanford M Dawsey1, Carol Giffen6, Alisa M Goldstein1, Philip R Taylor1,*, and Maxwell P Lee2,* 1Division

of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA

2Center

for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA

3Cancer

Genomics Research Laboratory, Leidos, Gaithersburg, Maryland, USA

4Shanxi

Cancer Hospital, Taiyuan, Shanxi, PR China

5Cancer

Institute, Chinese Academy of Medical Sciences, Beijing, PR China

6Information

Management Services, Inc., Silver Spring, MD 20904, USA

Author Manuscript

Abstract

Author Manuscript

Gastric cancer and esophageal cancer are the 2nd and 6th leading causes of cancer death worldwide. Multiple genomic alterations underlying gastric cancer and esophageal squamous cell carcinoma (ESCC) have been identified, but the full spectrum of genomic structural variations and mutations have yet to be uncovered. Here we report the results of whole genome sequencing of 30 samples comprising tumor and blood from 15 patients, four of whom presented with ESCC, seven with gastric cardia adenocarcinoma (GCA), and four with gastric noncardia adenocarcinoma (GNCA). Analyses revealed that an A>C mutation was common in GCA, and in addition to the preferential nucleotide sequence of A located 5 prime to the mutation as noted in previous studies, we found enrichment of T in the 5 prime base. The A>C mutations in GCA suggested that oxidation of guanine may be a potential mechanism underlying cancer mutagenesis. Furthermore, we identified genes with mutations in gastric cancer and ESCC, including well-known cancer genes, TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2, and potentially novel cancer-associated genes, KISS1R, AMH, MNX1, WNK2, and PRKRIR. Finally, we identified recurrent chromosome alterations in at least 30% of tumors in genes including MACROD2, FHIT, and PARK2 that were often intragenic deletions. These structural alterations were validated using the TCGA dataset. Our studies provide new insights into understanding the

*

Correspondence should be sent to Philip R. Taylor, Genetic Epidemiology Branch, DCEG, NCI, 9609 Medical Center Drive, Rm 6E444 MSC 90892, Rockville, MD 90892-9769, Tel. 240-276-7235, Fax. 240-276-7832 ; Email: [email protected] or Maxwell P. Lee, CCR, NCI, 9609 Medical Center Drive, Rm 1W586, Rockville, MD 20850. Tel.240-276-5239, ; Email: [email protected]. Conflict of interest statement: The authors disclose no potential conflicts of interest.

Hu et al.

Page 2

Author Manuscript

genomic landscape, genome instability, and mutation profile underlying gastric cancer and ESCC development.

Introduction

Author Manuscript

Gastric cancer and esophageal cancer cause an estimated 783,000 and 407,000 deaths respectively each year, and represent the 2nd and 6th leading causes of cancer death worldwide (1). In China, gastric cardia adenocarcinoma (GCA) and esophageal squamous cell carcinoma (ESCC) occur together in the Taihang Mountains of north central China, including Shanxi and Henan Provinces, at some of the highest rates reported for any cancer (2), and historically over 20% of all deaths in this region have been attributed to these diseases (3). However the cause of the high rates and geographical overlap of these two anatomically adjacent but histologically distinct tumors has not been determined. Gastric cancers in this area occur primarily in the uppermost portion of the stomach and are referred to as gastric cardia adenocarcinoma (GCA), while those in the remainder of the stomach are referred to as gastric noncardia adenocarcinoma (GNCA). In addition to being anatomically adjacent, GCA and ESCC share many of the same etiologic risk factors, and before the widespread use of endoscopy and biopsy, they were diagnosed as a single disease referred to as “esophageal cancer” (4). The reason for the high rates of GCA and ESCC in this geographic area and their relation to each other remains unclear, but there are almost certainly common etiologically important environmental exposures, and a recent genomewide association study of germline DNA found that the same single nucleotide polymorphisms (SNPs) in the PLCE1 gene had the strongest associations with risk for both GCA and ESCC (5). This led to our concurrent examination of these two cancers plus GNCA in the current study.

Author Manuscript

Recent advances in next generation sequencing technology have revolutionized how we study cancer genomes. The identification of IDH1/2 mutations, initially in glioma (6, 7) and more recently in many other cancers such as AML (8), has transformed our understanding of cancer by relating mutations to metabolic control and epigenetic regulation (9). IDH1/2 encode isocitrate dehydrogenases (IDH), which convert isocitrate to 2-oxoglutarate. But mutant IDHs produce 2-hydroxyglutarate, which inhibits the methyl cytosine hydroxylase TET2 as well as H3K36 demethylases, thus changes the global epigenetic landscape. Whole genome sequencing (WGS) is particularly useful for elucidating complex genomic changes, including translocations, inversions, tandem duplications, and large deletions. The importance of these structural changes has been well documented in the case of BCR-ABL in leukemia, TMPRESS2-ERG in prostate cancer (10), and EML4-ALK in lung cancer (11).

Author Manuscript

For gastric adenocarcinoma and ESCC, several publications have reported genomic scale analyses of cancers using exome or whole genome sequencing technology. Wang et al. performed exome sequencing of 22 gastric cancer samples and identified frequent mutations in ARID1A (12). Mutations in ARID1A were particularly high in gastric cancers with microsatellite instability (MSI) (83%) or with Epstein-Barr virus (EBV) infection (73%). Exome sequencing of 15 gastric adenocarcinomas and their matched normal DNAs by Zang et al also identified frequent mutations of ARID1A (13). In addition, they found 5% of

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 3

Author Manuscript Author Manuscript

gastric cancer contained FAT4 mutations. A study by Agrawal et al reported exomic sequencing of 11 esophageal adenocarcinomas (EAC) and 12 esophageal squamous cell carcinomas (ESCC) and found frequent NOTCH1 mutations in ESCC (14). Nagarajan et al performed whole genome sequencing analysis for two gastric cancer samples and found three mutational signatures (15). Dulak et al did exome sequencing for 149 esophageal adenocarcinoma (EAC) tumor-normal pairs and whole-genome sequencing for 15 EAC and matched normals. They found a high prevalence of A>C transversions at AA dinucleotides (16). A recent study by Wang et al analyzed 100 tumor-normal pairs of gastric cancer with whole-genome sequencing and identified MUC6, CTNNA2, GLI3, RNF43, and RHOA as significantly mutated driver genes. They found that RHOA mutations are specific for diffuse-type tumors (17). Recently, the International Cancer Genome Consortium research team published a study involving whole genome sequencing of 17 ESCC cases and exome sequencing of 71 ESCC cases which identified two novel cancer driver genes, ADAM29 and FAM135B (18). Lin et al. performed exome sequencing on 20 ESCC cases and targeted sequencing of 139 ESCC cases and identified several mutated genes that were previously unknown, including FAT1, FAT2, ZNF750 and KMT2D (19), and Gao et al. did exome sequencing on 113 ESCC cases and noted that histone modifier genes were frequently mutated, including KMT2D, KMT2C, KDM6A, EP300, and CREBBP. (20). Zhang et al. recently reported exome sequencing of 90 ESCC and whole genome sequencing of 14 ESCC and found that an APOBEC-mediated mutational signature in 47% of tumors (21).

Author Manuscript

Despite this progress, the full spectrum of genomic alterations in gastric cancer and ESCC, particularly genomic structural variations and intergenic and intronic mutations, remains to be characterized. To discover and characterize genomic alterations in GCA, GNCA, and ESCC, we analyzed whole genome sequences of tumors and matched blood DNA samples generated by Complete Genomics Inc (CGI). We present here our findings of novel mutation substitution pattern, driver mutations, and recurrent structural variations. Complete characterization of the genomic landscape of these cancers will hopefully provide new strategies for early diagnosis and therapy for these deadly diseases.

Materials and Methods Study population

Author Manuscript

This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital, the Cancer Institute and Hospital of the Chinese Academy of Medical Sciences (CICAMS), and the US National Cancer Institute (NCI). Fifteen cases were analyzed with whole genome sequencing (WGS). These cases came from a larger study sample and were selected based on high-quality, sufficiently large amount of DNA (requiring at least 10 ug) available for WGS, and patients being deceased. Three cases with esophageal squamous cell carcinoma (ESCC), seven cases with gastric cardia adenocarcinoma (GCA), and four cases with gastric noncardia adenocarcinoma (GNCA) diagnosed between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, were recruited to participate in this study. One ESCC case from Yaocun Commune Hospital in Linxian, Henan Province was also recruited. None of the cases had therapy prior to their surgical resection. After obtaining informed consent, cases were interviewed to obtain information on demographics,

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 4

Author Manuscript

cancer risk factors (eg, detailed family history of cancer), and clinical information. Only deceased cases were selected for study. Clinical data are described in Supplementary Table 1. Biological specimen collection and processing Venous blood (10 ml) was taken from each case prior to surgery and germline DNA from whole blood was extracted and purified using the standard phenol/chloroform method. Tumors obtained during surgery were snap-frozen in liquid nitrogen and stored at −130°C until used. The specimens were chosen for this study based on two criteria: (a) histological diagnosis of ESCC or gastric cancer confirmed by pathologists at the Shanxi Cancer Hospital or CICAMS, and the NCI; (b) availability of high purity tumor tissue (at least >75%).

Author Manuscript

Tissue DNA isolation DNA from frozen tumors were extracted using AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Inc., Valencia, CA, USA). DNA was dissolved in 100 ul Buffer BE. Concentrations of DNA were measured with the NanoDrop 2000 Spectrophotomer (Thermo Fisher Scientific, Wilmington, DE, USA) according to the manufacturer’s instructions. DNAs were run on 0.7% agarose gel (UltraPure Agarose Powder, Invitrogen) in 1X TAE buffer to identify high-molecular weight genomic DNA (>20Kb single band) and photographed using AlphaImager EC system (Biosciences, Inc., Santa Clara, CA 95051). DNA was quantified using Quant-iT PicoGreen Kit (Invitrogen). Whole genome sequencing

Author Manuscript

10 ug DNA was sent to CGI (Complete Genomics Incorporation) for whole genome sequencing. CGI delivered data for structural variations, DNA copy number variations, and single nucleotide variations. A detailed description of whole genome sequencing data can be found in the company’s website http://www.completegenomics.com/. Summary of the WGS data are described in Supplementary Table 2. We focused on somatic alterations, i.e., tumorspecific changes present in tumor but absent in blood. We used cgatools, CallDiff, and JunctionsDiff, to generate somatic single nucleotide variants (SNVs) and somatic structural variations (SVs), respectively. Somatic copy number alterations (CNAs) data came from CGI reports. These three types of somatic changes for each patient are shown in Circos plots in Figure 1. Genotyping with the Affymetrix SNP5.0 array

Author Manuscript

We performed genotyping experiments for 12 samples using the Affymetrix GeneChip Human Mapping 5.0 arrays. Briefly, 250 ng of DNA was digested with Nsp I or Sty I, ligated to an adaptor, and amplified by PCR. PCR products were processed for fragmentation and labeling; labeled DNA was hybridized onto the chips. The chip was scanned with the Affymetrix GeneChip Scanner 7G Plus using Affymetrix GeneChip Command Console software, and the data files were automatically generated. Genotype calls were generated by Genotyping Console v4.1 software (Affymetrix). SNP calls generated by the SNP array and whole-genome sequencing agreed very well, with a median of 99.65% of

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 5

Author Manuscript

the time across all 12 genomes (Supplementary Table 3). We noted that agreement was slightly lower in tumors (median 98.85%); this reduced concordance was due to genomic regions containing SVs and CNAs. Concordances for CC0996, EC1475, and EC8413 were below 99%. GEO accession number for the SNP array is GSE43470. PCR and Sanger sequencing

Author Manuscript Author Manuscript

To validate somatic single nucleotide variants (SNVs) and SVs identified from the whole genome sequencing data, we used PCR to amplify genomic DNA spanning SNVs or SV junctions. In addition, we also amplified SV junction sequences from cDNAs. Target regions spanning SNVs and SVs were amplified by PCR using genomic DNA or cDNA as a template. The primers were designed using Primer3 (http://www.broadinstitute.org/ genome_software/other/primer3.html), and they are summarized in Supplementary Tables 7–8. PCR products were purified using QiaQuick PCR purification kit (Qiagen), and sequenced using ABI BigDye Terminator BDT 3.1 (Applied Biosystems). Sequencing reactions were carried out at 96°C for 10 sec, 50°C for 5 sec, and 60°C for 2 min for 25 cycles, and the reaction product was analyzed in 3730 XL DNA Analyzer (Applied Biosystems). We carried out validation for a subset of somatic missense mutations in subjects CC0996 and EC0379 using Sanger sequencing. High quality sequencing data were obtained for 62 mutations from CC0996; 51 (82%) validated and 11 did not. For EC0379, we examined 30 mutations, and 17 (63%) were validated. Mutations called by WGS but not validated by Sanger sequencing often had a low fraction of the mutant allele (below 15% of total sequence reads). The discrepancy between the two methods is partly attributable to the stringent criteria of mutation calling we used for the Sanger sequencing data, which usually requires a minimal of 15% of the mutant allele. Validated mutations are summarized in Supplementary Table 4. Comparison of somatic mutation call using blood control vs. adjacent normal tissue control

Author Manuscript

In addition to blood DNA controls, we had whole genome sequence data for six adjacent normal tissues. We compared somatic variant call using blood DNA as control vs. adjacent normal tissue as control. Two tumors with very low somatic mutation rates were excluded from this analysis because they had higher false positive calls. We found that the concordance between blood DNA controls vs. adjacent normal tissue controls ranged from 86% to 95%. The tumors with higher somatic mutation rates also had higher concordance. This is also consistent with the validation data from Sanger sequencing described in the previous section.

Data Analysis Whole genome sequence analysis We used cgatools (http://www.completegenomics.com/sequence-data/cgatools/) and custombuilt Perl scripts to analyze whole genome sequencing data. The calldiff and junctiondiff (cgatools) were used to identify somatic SNVs and SVs, respectively; the junctions2events

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 6

Author Manuscript

method (cgatools) was used to identify genomic events such as deletions, duplications, inversions, and complex events that underlie observed SV junction sequences. The Circos plots were generated as described (http://circos.ca/). Ingenuity Variant Analysis

Author Manuscript

For functional intepretation of somatic mutations, we used Ingenuity Variant Analysis (IVA). IVA uses a cascade of filters to identify potential cancer driver mutations; these filters consist of selecting variants based on common variants database (1000 genome, CGI database, and NHLBI ESP exomes), predicted deleterious effects (literature, SIFT, phyloP, etc), genetic analysis (gain or loss function, number of alleles, absence of variant in the control), and cancer driver variants (literature, COSMIC, and TCGA). (for details see http:// www.ingenuity.com/products/ingenuity_variant_analysis.html). We used the following cascade of filters to identify potential cancer driver mutations: (1) Selection of variants based on common variants database (1000 genome, CGI database, and NHLBI ESP exomes); (2) Selection of variants based on predicted deleterious effects (literature, SIFT, phyloP, etc). Variants were selected based on phenotypes (pathogenic or unknown significance), gain of function (literature, Ingenuity knowledge base, or BSIFT), or loss of function (non-synonymous, damaging by SIFT, splicing sites, or affecting microRNA); (3) Selection of variants based on genetic analysis (gain or loss of function, number of alleles, absence of variant in the control). We excluded variants found in any blood sample. We included homozygous, hemizygous, or compound heterozygous; (4) Selection of variants were based on cancer driver variants. Miscellaneous analyses

Author Manuscript

All statistical analyses and plots were generated using the R environment, and we used shell scripts and Perl scripts for general bioinformatics data processing.

Results We analyzed 30 genomes from DNAs isolated from two tissues (tumor and blood) from 15 patients, seven with gastric cardia adenocarcinoma (GCA), four with gastric noncardia adenocarcinoma (GNCA), and four with ESCC, by whole genome sequencing (WGS), generated by the Complete Genomics Inc (CGI). Patient clinical data are described in Supplementary Table 1. Total sequences generated for each genome ranged from 193.7 Gb to 363.4 Gb (median = 321.7 Gb). Details of our WGS data for each genome are described in Supplementary Table 2.

Author Manuscript

We focused on identification and characterization of somatic changes, i.e., changes in tumors that were absent in matched germline (blood) DNA. Figure 1 shows the Circos plots of three types of somatic changes of interest (SNVs, SVs, and CNAs) for the 15 cancer genomes Gastric cancer, in particular GNCA, had more structural alterations than ESCC.

Characterization of somatic single nucleotide substitutions Figure 2a shows somatic mutation rates per million bases. Mutation rates ranged between 12.7 and 70.9 per million bases (median=33.4) for intergenic regions, between 10.2 and 41.5

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 7

Author Manuscript

(median=21.7) for introns, and between 9.4 and 27.8 (median=17.3) for exons. These mutation rates were similar to those reported for colon cancer (22), pancreatic cancer (23), liver cancer (24) gastric cancer (12, 13), and esophageal cancer (14). Mutation rates were highest in intergenic regions, followed by introns and exons. This is consistent with the role of a transcription-coupled repair process in reducing intragenic mutations. The coding mutation rates ranged from 3.9 to 14.0 per million bases (median=8.6) for missense mutations, and from 0.13 to 0.64 (median=0.36) for nonsense mutations. The mutation rates of synonymous, missense, nonsense, and other changes are summarized in Figure 2b.

Author Manuscript

We noted that BC5439, CC1649, CC1730, and EC1475 had much higher mutation rates, exceeding 40 million per Mb in intergenic regions. To investigate a potential mutagenic mechanism for the higher mutation rates, we calculated mutation rates for each type of single nucleotide substitution (Figure 2c). Transition rates, A>G and C>T, were high as expected. However, we saw high A>C mutation rates in six of the 11 gastric cancers. This increased A>C mutation was also observed in several recent studies involving EAC (14, 16) and gastric cancer (17). We also calculated the nucleotide substitution rates with their dependency on the flanking nucleotide sequences, and displayed the result in a heatmap (Figure 2d). We found that the A>C mutation showed a preference of nucleotide sequence of A located 5 prime to the mutation (Figure 2d).

Characterization of cancer driver mutations

Author Manuscript

We characterized somatic mutations to identify potential cancer driver mutations by analyzing our WGS data using Ingenuity Variant Analysis (IVA). Our initial preliminary results from IVA analysis identified 200 genes. We further filtered the genes by requiring that they be mutated in at least three tumors. This resulted in a final list of 24 genes. Tumors with mutations in the 24 genes are illustrated in Figure 3. Half of these genes are wellknown cancer genes, including TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2. We also identified multiple potentially novel cancer genes, including KISS1R, AMH, MNX1, WNK2, and PRKRIR. The full list of the 200 genes is summarized in Supplementary Table 5. This list is enriched for genes that affect diseases of the stomach (Supplementary Table 6).

Characterization of DNA structural variations

Author Manuscript

Compared to exome sequencing, an advantage of WGS is its ability to identify structural variations (SVs). We focused our analysis on tumor-specific SVs (somatic events), and they are summarized in Figure 4. Figure 4 depicts the numbers of SVs per tumor summarized with respect to gene location (Figure 4a), relation to repeats (Figure 4b), interchromosomal vs. intrachromosomal (Figure 4c), and genetic events (Figure 4d), and recurrence in multiple tumors (Figure 4e). Note that here the complex type also contained some interchromosomal SVs. We noted a wide range of SVs across the tumors. Such variation in frequency of SVs among different tumors is consistent with the presence of two types of cancers: high and low genomic instability; an observation we previously reported in another series of ESCC tumors from this same high-risk region evaluated by LOH and CNV (25).

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 8

Author Manuscript

To validate SVs identified from the WGS data, we selected SVs with junction sequences derived from two different genes, but without containing DNA repeats. We successfully PCR-amplified 12 SV junction fragments and sequenced all of them by the Sanger method. All 12 SV junction sequences were confirmed, i.e., they are identical to the WGS data (data not shown).

Author Manuscript

SVs that recur in multiple cases are of greatest interest. The genes across breakpoints are summarized in terms of frequency of tumors affected (Figure 4e). Those affecting at least 5 tumors are also shown (Figure 4e), and there are 14 of those genes with SVs occurring in at least 5 tumors. The details of SVs (only deletions are shown here that span exonic regions) are shown for MACROD2, FHIT, and PARK2 (Figure 5). Four tumors deleted exon 6 of MACROD2 (NM_080676), which removed amino acids 140–180 and also caused frameshift (Figure 5a). The change likely resulted in a loss-function mutation of MACROD2. Seven tumors deleted exon 5 of FHIT (NM_002012), which removed the first 35 amino acids and also shifted the reading frame (Figure 5b). The deletion likely inactivated the protein. The locations and sizes of PARK2 deletions varied (Figure 5c). Two removed exons 3 and 4 (NM_004562), two removed exon 2, one deleted exon 4, and one deleted exons 2–4. The latter three deletions also caused reading frame shift. These deletions likely also inactivated the protein.

Author Manuscript

To seek additional supporting evidence for these structural variations, we analyzed 39 tumors from the TCGA gastric cancer dataset that had high coverage whole-genome sequencing data, which is the subset of the 441 STAD tumors with copy number data that will be discussed below. We used BreakDancer to identify structural variations. All three genes, MACROD2, FHIT, and PARK2, contained deletions involving coding exons with 16, 9, and 6 tumors having deletions in MACROD2, FHIT, and PARK2, respectively. Since these structural variations are caused by deletions, we reviewed the deletion analysis reports from TCGA gastric cancer data performed by the Broad GDAC team, which has 441 STAD tumor samples analyzed. FHIT, MACROD2, and PARK2 were in the 6th, 7th, and 12th most significantly deleted regions (http://gdac.broadinstitute.org/runs/analyses__2014_10_17/ reports/cancer/STAD-TP/CopyNumber_Gistic2/nozzle.html).

Discussion

Author Manuscript

Whole genome sequencing (WGS) data provide a unique opportunity to combine single nucleotide variations (SNVs) with large structural variations (SVs) and to perform an integrated analysis. There are three major findings from this study. First, A>C mutations were common in GCA and GNCA. In addition to the 5 prime A noted in previous studies of esophageal adenocarcinoma and gastric adenocarcinoma (14, 16, 17), we found enrichment of 5 prime T, which was weakly associated with ESCC. Second, we identified 24 driver mutations, including a subset that have not been previously reported. Third, we identified recurrent chromosome alterations that occurred in at least 30% of tumors in 14 genes, including CAMK1D, MACROD2, ANKRD30BL, FHIT, KCNB2, and PARK2. A>C mutation rates are often low (range 2.6–6.6% of all mutations) (23). However, a recent exome sequencing study of esophageal cancer found higher A>C mutation rates in EAC than

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 9

Author Manuscript

in ESCC (14). Another recent study of EAC also found a high A>C mutation rate, particularly with the 5 prime A (16). Our study extended this observation to gastric adenocarcinoma. We found high A>C mutations in both GCA and GNCA. In addition, we found the enrichment of A>C mutations with the prime T in ESCC. The presence of A>C mutations in GCA, GNCA, and ESCC suggest the oxidation of guanine as a potential mutagen for gastric cancer and ESCC. In cancer cells under high oxidative stress, 8-oxodGTP accumulating in the DNA can result in G>T, which is equivalent to C>A. It is conceivable that a similar mechanism may contribute to the observed high A>C mutation rate.

Author Manuscript

A major interest of this study was to identify recurrent SVs. We found 14 genes with SVs occurring in at least 5 tumors (Figure 4). The most frequent SVs appeared as deletions. Deletions that removed coding sequences were of greatest interest. Three examples of these genes are shown in Figure 5. These deletions were clustered in small genomic regions. In the case of MACROD2, exon 6 of MACROD2 (NM_080676), spanning amino acids 140–180, was deleted. The deletion also caused a frame-shift. Similarly, exon 5 of FHIT (NM_002012), containing the first 35 amino acids, was deleted in multiple tumors. The mutation also shifted the reading frame. In the case of PARK2, the deletions were more heterogeneous, and varied from deletions involving one to three exons of exons 2, 3, and 4 (NM_004562). These deletions were very frequent, affecting about 50% of gastric cancer samples. They were supported by deletion events observed in gastric tumors from the TCGA dataset. It will be interesting to develop deletion-specific assays to screen a large number of gastric cancers and to associate the deletions with clinical phenotypes.

Author Manuscript

Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation mechanisms for developing gastric cancer and ESCC. A limitation of the current study is its small sample size. Future studies should include validation of these findings in a larger panel of tumors that are selected from multiple tumor types and use combinations of exome sequencing, RNA-seq analysis, and target-resequencing for both SNVs and SVs.

Supplementary Material Refer to Web version on PubMed Central for supplementary material.

Acknowledgments This research was supported by the Intramural Research Program of the NIH and the National Cancer Institute, Division of Cancer Epidemiology and Genetics (DCEG) and Center for Cancer Research (CCR).

Author Manuscript

References 1. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer. 2010; 127(12):2893–917. Epub 2011/02/26. DOI: 10.1002/ijc.25516 [PubMed: 21351269] 2. Ke L. Mortality and incidence trends from esophagus cancer in selected geographic areas of China circa 1970–90. Int J Cancer. 2002; 102(3):271–4. Epub 2002/10/25. DOI: 10.1002/ijc.10706 [PubMed: 12397650]

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 10

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

3. Li JY. Epidemiology of esophageal cancer in China. Natl Cancer Inst Monogr. 1982; 62:113–20. Epub 1982/01/01. [PubMed: 7167171] 4. Liu SF, Shen Q, Dawsey SM, Wang GQ, Nieberg RK, Wang ZY, et al. Esophageal balloon cytology and subsequent risk of esophageal and gastric-cardia cancer in a high-risk Chinese population. Int J Cancer. 1994; 57(6):775–80. Epub 1994/06/15. [PubMed: 8206671] 5. Abnet CC, Freedman ND, Hu N, Wang Z, Yu K, Shu XO, et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat Genet. 2010; 42(9):764–7. Epub 2010/08/24. doi: 10.1038/ng.649 ng.649 [pii]. [PubMed: 20729852] 6. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008; 321(5897):1807–12. Epub 2008/09/06. doi: 10.1126/science.1164382 1164382 [pii]. [PubMed: 18772396] 7. Yan H, Parsons DW, Jin G, McLendon R, Rasheed BA, Yuan W, et al. IDH1 and IDH2 mutations in gliomas. N Engl J Med. 2009; 360(8):765–73. Epub 2009/02/21. doi: 10.1056/NEJMoa0808710 360/8/765 [pii]. [PubMed: 19228619] 8. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med. 2009; 361(11):1058–66. Epub 2009/08/07. doi: 10.1056/NEJMoa0903840 NEJMoa0903840 [pii]. [PubMed: 19657110] 9. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010; 17(5):510–22. Epub 2010/04/20. doi: 10.1016/j.ccr.2010.03.017 S1535–6108(10)00108-X [pii]. [PubMed: 20399149] 10. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005; 310(5748): 644–8. Epub 2005/10/29. doi: 310/5748/644 [pii] 10.1126/science.1117679. [PubMed: 16254181] 11. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007; 448(7153): 561–6. Epub 2007/07/13. doi: nature05945 [pii] 10.1038/nature05945. [PubMed: 17625570] 12. Wang K, Kan J, Yuen ST, Shi ST, Chu KM, Law S, et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat Genet. 2011; 43(12):1219–23. Epub 2011/11/01. doi: 10.1038/ng.982 ng.982 [pii]. [PubMed: 22037554] 13. Zang ZJ, Cutcutache I, Poon SL, Zhang SL, McPherson JR, Tao J, et al. Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat Genet. 2012; 44(5):570–4. Epub 2012/04/10. doi: 10.1038/ng.2246 ng.2246 [pii]. [PubMed: 22484628] 14. Agrawal N, Jiao Y, Bettegowda C, Hutfless SM, Wang Y, David S, et al. Comparative genomic analysis of esophageal adenocarcinoma and squamous cell carcinoma. Cancer Discov. 2012; 2(10): 899–905. Epub 2012/08/11. doi: 10.1158/2159-8290.CD-12-0189 2159-8290.CD-12-0189 [pii]. [PubMed: 22877736] 15. Nagarajan N, Bertrand D, Hillmer AM, Zang ZJ, Yao F, Jacques PE, et al. Whole-genome reconstruction and mutational signatures in gastric cancer. Genome biology. 2012; 13(12):R115. Epub 2012/12/15. doi: 10.1186/gb-2012-13-12-r115 [PubMed: 23237666] 16. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013; 45(5):478–86. Epub 2013/03/26. DOI: 10.1038/ng.2591 [PubMed: 23525077] 17. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet. 2014; 46(6):573–82. Epub 2014/05/13. DOI: 10.1038/ng.2983 [PubMed: 24816253] 18. Song Y, Li L, Ou Y, Gao Z, Li E, Li X, et al. Identification of genomic alterations in oesophageal squamous cell cancer. Nature. 2014; 509(7498):91–5. Epub 2014/03/29. DOI: 10.1038/ nature13176 [PubMed: 24670651] 19. Lin DC, Hao JJ, Nagata Y, Xu L, Shang L, Meng X, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat Genet. 2014; 46(5):467–73. Epub 2014/04/02. DOI: 10.1038/ng.2935 [PubMed: 24686850]

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 11

Author Manuscript Author Manuscript

20. Gao YB, Chen ZL, Li JG, Hu XD, Shi XJ, Sun ZM, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet. 2014; 46(10):1097–102. Epub 2014/08/26. DOI: 10.1038/ng. 3076 [PubMed: 25151357] 21. Zhang L, Zhou Y, Cheng C, Cui H, Cheng L, Kong P, et al. Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma. Am J Hum Genet. 2015; 96(4):597–611. Epub 2015/04/04. DOI: 10.1016/j.ajhg.2015.02.017 [PubMed: 25839328] 22. Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nature genetics. 2011; 43(10):964–8. Epub 2011/09/06. DOI: 10.1038/ng.936 [PubMed: 21892161] 23. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008; 321(5897):1801–6. Epub 2008/09/06. DOI: 10.1126/science.1164368 [PubMed: 18772397] 24. Li M, Zhao H, Zhang X, Wood LD, Anders RA, Choti MA, et al. Inactivating mutations of the chromatin remodeling gene ARID2 in hepatocellular carcinoma. Nature genetics. 2011; 43(9): 828–9. Epub 2011/08/09. DOI: 10.1038/ng.903 [PubMed: 21822264] 25. Hu N, Wang C, Ng D, Clifford R, Yang HH, Tang ZZ, et al. Genomic characterization of esophageal squamous cell carcinoma from a high-risk population in China. Cancer Res. 2009; 69(14):5908–17. Epub 2009/07/09. doi: 10.1158/0008-5472.CAN-08-4622 0008-5472.CAN-08-4622 [pii]. [PubMed: 19584285]

Author Manuscript Author Manuscript Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 12

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Figure 1.

Circos plots of somatic single nucleotide variants, copy number alterations, and structural variations in the 15 cancer genomes. The inner ring displays SVs: black for intrachromosomal SVs and red for interchromosomal SVs. The 2nd ring next to SVs is CNA shown in grey. The 3rd ring is SNV shown in green. The outside ring is the chromosome ideogram. The sample description can be found in Supplementary Table 1.

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 13

Author Manuscript Author Manuscript Author Manuscript

Figure 2.

Author Manuscript

Summary of somatic mutations and mutation spectrum in GNCA, GCA, and ESCC genomes and non-negative matrix factorization (NMF) analysis of SNV substitution matrix. Somatic SNV are summarized in Figure 2a–b. The numbers of somatic mutations per million bases in intergenic, intronic, and exonic regions of GCA, GNCA, and ESCC genomes are illustrated with bar graphs. Here somatic mutations include single nucleotide variations and small indels. (a) The graph shows mutations in intergenic, intronic, and exonic regions whereas. (b) The graphs shows mutations of various types of amino-acid changes. SNV substitution patterns are summarized in Figure 2c–d. (c) The graph shows the numbers of somatic mutations for the six types of base substitutions are summarized on the left. We also consider the context of 5 prime base and 3 prime base; there are 96 possible combination (4 times 6 times 4). (d) The heatmap shows the results for the 15 tumors.

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 14

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Figure 3.

Summary of potential cancer driver mutations. The potential driver mutations were identified using Ingenuity Variant Analysis (IVA). We identified 24 genes that were frequently mutated in ESCC and gastric cancer. Mutations and affected tumors are shown in the heatmap.

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 15

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Figure 4.

Characterization of somatic SVs in gastric cancer and ESCC genomes. (a) Bar graph of SVs with respect to whether breakpoints are located with genes or not. The count on x-axis refers to the number of SVs. (b) Bar graph of SVs with respect to whether breakpoints are located with repeats or not. (c) Bar graph of interchromosomal versus intrachromosomal SVs. (d) Bar graph of SVs classified by the processes that generated these SVs. (e) Bar graph of recurrent SVs. Sample count refers to the number of tumors affected

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 16

Author Manuscript

by the SVs, and gene count refers to the number of genes that showed recurrent SVs for a given number of sample count.

Author Manuscript Author Manuscript Author Manuscript Cancer Res. Author manuscript; available in PMC 2017 April 01.

Hu et al.

Page 17

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Figure 5.

IGV views of structural changes of recurrent SVs. IGV views for MACROD2, FHIT, and PARK2. Blue rectangles are exons. Red rectangles are deleted regions, and the number below the box refers to SV id. Tumors are labeled on the left. (a) MACROD2; (b) FHIT; (c) PARK2

Cancer Res. Author manuscript; available in PMC 2017 April 01.

Genomic Landscape of Somatic Alterations in Esophageal Squamous Cell Carcinoma and Gastric Cancer.

Gastric cancer and esophageal cancer are the second and sixth leading causes of cancer-related death worldwide. Multiple genomic alterations underlyin...
680KB Sizes 0 Downloads 10 Views