MEETING REPORT
Germline & Somatic Mosaicism: The 2014 Annual Scientific Meeting of the Human Genome Variation Society
William S. Oetting, Marc S. Greenblatt, Anthony J. Brookes, Rachel Karchin, and Sean D. Mooney
1 Department of Experimental and Clinical Pharmacology, University of Minnesota, Minneapolis, Minnesota; 2 Department of Medicine, University of Vermont, Burlington, Vermont; 3 University of Leicester, Leicester, United Kingdom; 4 Departments of Biomedical Engineering/Oncology and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland; 5 Buck Institute for Research on Aging, Novato, California
Communicated by Mark H. Paalman Received 8 December 2014; accepted revised manuscript 10 January 2015. Published online 19 January 2015 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22757
© 2015 Wiley Periodicals, Inc. Hum Mutat 36:390–393, 2015.
As genetic sequencing technologies continue to improve, researchers are progressively unraveling the effects of germline variants on human phenotype, disease, disease progression, and efficacy of treatments. However, questions are now being asked about the contribution of somatic variants to disease risk and how these variants can lead to disease and aging. This includes determining their prevalence and distribution in what appear to be phenotypically normal tissues. The 2014 annual scientific meeting of the Human Genome Variation Society (HGVS; http://www.hgvs.org) was held on the 18th of October in San Diego, CA, to present research that is attempting to answer some of these questions. The theme of the meeting was “Germline & Somatic Mosaicism.” In this meeting, we focused on technologies that enable analysis of germline and somatic genetic differences, and on efforts to characterize their distribution, their association with human disease, and their clinical interpretation. The presentations illustrate the power of the marriage of wholesale genomics and retail genetics.
Marc Greenblatt, of the University of Vermont, opened the meeting and chaired the first scientific session. The first presentation was by Laura Conlin from the Children’s Hospital of Philadelphia, University of Pennsylvania, who spoke on “Somatic mosaicism and uniparental disomy” (UPD). UPD is present when all, or part of, a chromosome pair is from a single parent. UPD is detected by analyzing genomic DNA using a single-nucleotide polymorphism (SNP) array and observing a lack of heterozygosity in an extended haplotype of SNPs or a lack of alleles from one parent. UPD is particularly critical in regions that exhibit imprinting, as in Prader–Willi syndrome, in which UPD can include all of chromosome 15, or Beckwith–Wiedemann syndrome, which usually involves segmental UPD in chromosome 11. UPD is not always associated with disease.
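The "lack of alleles from one parent" signal can be sketched for a parent-child trio. This is only an illustration, not the method used in the talk: the genotype encoding (alternate-allele counts 0, 1, 2) and the function names `missing_parent` and `summarize_trio` are assumptions made for this sketch. On its own such a count cannot separate UPD from a deletion of the same region, which is the detection issue the speaker raised.

```python
from collections import Counter


def missing_parent(child, mother, father):
    """Flag a biallelic SNP where the child lacks one parent's allele.

    Genotypes are alternate-allele counts (0, 1 or 2), an encoding assumed
    for this sketch. Returns 'paternal', 'maternal', 'both' or None.
    """
    # A homozygous parent can transmit only one allele, so a child who is
    # homozygous for the other allele carries nothing from that parent.
    no_paternal = father in (0, 2) and child == 2 - father
    no_maternal = mother in (0, 2) and child == 2 - mother
    if no_paternal and no_maternal:
        return "both"  # more plausibly a genotyping error than UPD
    if no_paternal:
        return "paternal"
    if no_maternal:
        return "maternal"
    return None


def summarize_trio(sites):
    """Summarize (child, mother, father) genotypes across a region."""
    counts = Counter()
    het_calls = 0
    for child, mother, father in sites:
        flag = missing_parent(child, mother, father)
        if flag is not None:
            counts[flag] += 1
        het_calls += child == 1
    return {
        "missing_paternal": counts["paternal"],
        "missing_maternal": counts["maternal"],
        "inconsistent": counts["both"],
        "child_het_fraction": het_calls / len(sites) if sites else 0.0,
    }


if __name__ == "__main__":
    # Toy region: three informative sites show no paternal allele.
    region = [(0, 0, 2), (2, 2, 0), (0, 1, 2), (1, 0, 2), (0, 0, 0)]
    print(summarize_trio(region))
```

A long run of sites flagged for the same parent, together with a low heterozygous fraction in the child, would be consistent with UPD or with a deletion; copy-number information from the same array is needed to tell them apart.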
In some cases, 100% UPD is a rescue of aneuploidy (e.g., postzygotic rescue of monosomy or trisomy). This can be observed in some cases of hidden trisomy where there is placental mosaicism but whole-blood DNA shows 100% UPD for the chromosome. UPD is identified as a mosaic in 17% of samples tested by diagnostic laboratories. Somatic mosaicism as low as 5%–10% can be detected in a tissue sample. Some common UPD phenotypes include hearing impairment and tumor formation. UPD can also result in recessive disorders being unmasked if the chromosome or chromosome segment associated with UPD contains the disease variant. Issues in detection include determining whether loss of heterozygosity (LoH) is due to UPD or a partial deletion of a chromosome.
Correspondence to: William S. Oetting, Department of Experimental and Clinical Pharmacology MMC 485; 420 Delaware St. S.E., University of Minnesota, Minneapolis, MN 55455. E-mail: [email protected]
The second talk of this session was presented by Anne Goriely from the Weatherall Institute of Molecular Medicine, University of Oxford. Her talk was entitled “Selfish mosaicism and human disease: impact of somatic mutations occurring in the paternal germline.” A few spontaneous dominant disorders are associated with a very high mutation rate at specific nucleotides, such as Apert (FGFR2) and Costello (HRAS) syndromes and achondroplasia (FGFR3). The de novo mutation rate appears to be approximately 1,000-fold higher than the background mutation rate. In the case of Apert syndrome, either one of two transversions in FGFR2 resulting in p.Ser252Trp or p.Pro253Arg is found in >99% of affected individuals. The mechanism for this apparent high mutation rate is best explained by a process termed “selfish selection.” It was found that the FGFR2 c.755C>G mutation levels (associated with p.Ser252Trp) were much higher in sperm than in blood samples and increased with the age of the donor. This mutation, which encodes a known gain-of-function FGFR2 protein, gives a selective advantage to the mutant spermatogonial stem cells, leading to their clonal expansion over the course of time and, as a consequence, an increase in mutant sperm. It is proposed that the dysregulation of the growth factor receptor-RAS signaling pathway, which is a key regulator of stem cell homeostasis in the testis, mediates this phenomenon and provides an explanation for the paternal age effect associated with these disorders.
Selfish mosaicism is a neoplastic process occurring in the testes of all men as they age and can cause testicular tumors as well as an increased risk of transmission of mutant sperm to the next generation. This mechanism may also be relevant to cancer predisposition syndromes and the genetics of complex disorders such as neurodevelopmental disorders.
Isaac Nijman of the University Medical Centre Utrecht gave the next talk, “Myeloid-lineage-restricted mosaicism of NLRP3 mutations in variant Schnitzler’s syndrome.” Schnitzler syndrome is a rare autoinflammatory syndrome with an average age of onset of 56 years, which results in a monoclonal gammopathy with rash, hives, and bone pain. The pathophysiology is unknown, and it has been hypothesized to be an acquired disease. In an effort to determine whether there is a genetic predisposition, whole-exome sequencing (WES) was performed on several unrelated individuals with Schnitzler syndrome, and two individuals were found to have mutations in the NOD-like receptor family, pyrin domain containing 3 (NLRP3) gene. Mutations of NLRP3 are associated with the autosomal-dominant disease cryopyrin-associated periodic
syndrome (CAPS). Mutations of NLRP3 activate the inflammasome, which in turn triggers IL-1-beta activation, resulting in systemic inflammation. The Schnitzler syndrome phenotype overlaps with CAPS. Unexpectedly, mutations were restricted to granulocytes and monocytes. Though there is a lack of family history, Schnitzler syndrome does appear to have a genetic cause, and this is the first report of a myeloid-lineage-restricted mosaicism in a nonmalignant disorder. Mutations in genes associated with severe genetic disorders can result in enigmatic diseases when present as a mosaic. Clonal expansion of cells with causative mutations is proposed as one mechanism for sporadic late-onset diseases including autoimmune diseases.
Aashish Adhikari from the University of California, Berkeley spoke on “Nijmegen breakage syndrome detected by newborn screening for T-cell receptor excision circles (TRECs).” Newborn screening to detect genetic diseases for intervention has been ongoing for many decades, with the list of disorders being screened increasing over time. Newborn screening for severe combined immunodeficiency (SCID) due to low T- and B-cell counts (lymphocytopenia) uses quantitation of T-cell receptor excision circles (TRECs) as an indication for disease. A newborn with a low TREC count and a confirmatory low T-cell count was analyzed with targeted sequencing of known SCID-associated genes, but no pathogenic mutations were identified. WES showed the affected individual to be a compound heterozygote for two nonsense mutations in the nibrin (NBN) gene. The nibrin protein (NBS1) is involved in the repair of double-strand breaks, and loss-of-function mutations typically result in Nijmegen breakage syndrome. Western blotting showed no detectable protein, and the patient cell lines were radiosensitive as seen in the colony survival assay. In this case, the patient was given an early diagnosis of Nijmegen breakage syndrome.
Given the typical age of diagnosis is 4 years, an even earlier diagnosis can allow the patient to avoid certain environmental stresses that cause DNA damage, resulting in a reduction in the severity of the disease. The final talk for this session was by Alan Scott from the Johns Hopkins University School of Medicine who spoke on “Methods for analysis and interpretation of an aggressive small cell prostate cancer.” Small cell prostate cancer (SCPC) is a rare tumor with a poor prognosis and no successful therapy. The mutational landscape of SCPC was unknown. High-density genotyping arrays were used to find regions with LoH and abnormal copy number, whereas WES identified causative variants and measured exon copy number more accurately. The tumor suppressors TP53, RB1, and CHD1 occurred in blocks of LoH and somatic mutations were found in TP53, RB1, FOXA1, and CCAR1 among others. Several software tools were used to evaluate both the SNVs and indels and manual inspection of the sequence reads using IGV was performed in comparison to the normal sample to confirm that the mutations were true. While the array was better at assessing the overall genomic landscape, WES provided a more accurate measure of copy number for specific exons and identified a likely complete deletion of CDKN1B in the tumor. Counting sequence reads of the normal and somatic mutations (the allelic fraction) showed that the tumor was at least 95% pure and that there was considerable evidence for tumor heterogeneity. While the total number of mutations found was relatively small (SNVs = 3,577; indels = 2,290), as seen in other “C” class tumors with chromosomal instability, they clustered in the key cancer drivers cited above. The investigators also used a variety of interpretation tools to score genes with mutations as known cancer drivers, possible contributors, or likely passenger mutations. 
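The report does not say how the allelic fraction was converted into a purity estimate, so the following is only a sketch under stated assumptions: the mutation is clonal, the contaminating normal cells are diploid and carry no mutant copies, and the tumor copy number at the site is known. The function names and the parameters `mutant_copies` and `tumor_copies` are hypothetical. Under those assumptions, VAF = p·m / (p·c + 2(1 − p)), which rearranges to p = 2·VAF / (m − VAF·(c − 2)).

```python
def allelic_fraction(ref_reads, alt_reads):
    """Fraction of reads at a site that carry the variant allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def purity_from_vaf(vaf, mutant_copies=1, tumor_copies=2):
    """Estimate tumor purity p from the allelic fraction of a clonal mutation.

    Assumes diploid normal cells with no mutant copies, and tumor cells that
    carry `mutant_copies` mutant alleles out of `tumor_copies` total copies.
    Solves VAF = p*m / (p*c + 2*(1 - p)) for p.
    """
    denominator = mutant_copies - vaf * (tumor_copies - 2)
    if denominator <= 0:
        raise ValueError("allelic fraction is inconsistent with the copy numbers")
    return 2 * vaf / denominator


if __name__ == "__main__":
    # A mutation inside a copy-neutral LoH block: both tumor copies are mutant.
    vaf = allelic_fraction(ref_reads=5, alt_reads=95)
    print(vaf, purity_from_vaf(vaf, mutant_copies=2, tumor_copies=2))
```

For a heterozygous mutation in a diploid region the same formula reduces to purity = 2 × VAF, and in a copy-neutral LoH block it reduces to purity = VAF, which is why mutations inside LoH blocks are convenient for this kind of estimate.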
Studies of additional cases using similar methods will be needed to better understand the importance of the results and to design better strategies for treatment of SCPC. The second scientific session was chaired by Anthony Brookes from the University of Leicester. To start this session, Thierry
Voet from the University of Leuven and Wellcome Trust Sanger Institute spoke on “Single-cell genomics reveals genetic heterogeneity in health and disease.” Conventional genome sequencing is done using DNA isolated from a large number of cells. Sequencing from a single cell is much more challenging, but allows the discovery of genetic variation in single cells that is otherwise obscured by bulk DNA or RNA analyses. This is particularly important for understanding (functional) genetic heterogeneity and processes of DNA mutation in normal and diseased tissues, for example, cancers. The human genome is classically considered stable throughout normal development and to have only a small probability of acquiring genetic mutations with every cell cycle and division. However, very little is known about the true rate at which somatic mutation occurs, its different natures, and how it varies from cell type to cell type. Mainly by hunting for de novo DNA copy-number variants (CNVs) at the single-cell level, the group has generated considerable evidence indicating that particular cell types during human development, from the earliest stages onwards, acquire numerical and structural chromosomal anomalies at frequencies much higher than classical textbook numbers. By single-cell copy-number profiling of 8-cell human cleavage stage embryos, they have shown that chromosomes are missegregated in up to 83% of the embryos, and structural rearrangements are acquired in up to 70% of the embryos following in vitro fertilization. Signatures of chromosome instability in in vivo human conceptions have also been discovered. This chromosome instability during embryogenesis does not necessarily undermine normal human development, but may lead to a spectrum of conditions, including loss of conception, genetic disease, and genetic variation development.
It is important to note that during early embryogenesis, chromosomal abnormalities or variants can be acquired in cells that are later selected against, if the mutation is incompatible with survival. In contrast, if the mutant cell is able to contribute to the inner cell mass of the embryo, this may lead to a genetic mosaic or even a constitutional genetic aberration in the individual. Additionally, it was shown that de novo mutations continue to occur during development through subsequent cell divisions. Single-cell genomics is also paving the way for novel clinical applications, for instance, genome-wide genetic diagnosis of human preimplantation embryos.
The final talk for this session was by August Huang of the National Institute of Biological Sciences, Beijing, who spoke on “Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals.” As the technology for whole-genome sequencing (WGS) of somatic cells becomes available to more investigators, there is an increasing interest in the role of postzygotic single-nucleotide mosaicism (pSNM), especially with regard to mutations in disease-causing genes found in apparently healthy individuals. In this context, differentiating true somatic mutations from false mutations caused by random and systematic errors in read sampling, alignment, and base calling is a growing bioinformatics challenge. Errors can also occur when there is a lack of matched control tissues for healthy individuals. In this report, a computational pipeline to identify pSNMs in control-free next-generation sequencing (NGS) data was described. A Bayesian-based mutation identification algorithm was created to distinguish mosaic from germline genotypes. WGS of whole-blood DNA from three healthy individuals was done. Thirty-eight candidate pSNMs were identified and 17 were validated on other sequencing platforms. C to T and C to A substitutions were the most common mutations in clinically unremarkable individuals (24% each).
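The idea of distinguishing mosaic from germline genotypes with a Bayesian model can be illustrated with a toy calculation. This is not the pipeline described in the talk, whose details are not given here. The sketch assumes binomial read sampling, a fixed per-base error rate, a uniform prior on the mosaic allele fraction, and equal prior weight on the four genotype classes; all names and default values are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k variant reads out of n when each is variant with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_posteriors(alt_reads, depth, error=0.01, priors=None):
    """Posterior over hom_ref, het, hom_alt and mosaic for one site.

    Assumptions of this sketch: germline genotypes give expected variant
    fractions of `error`, 0.5 and 1 - `error`; a mosaic site has an unknown
    fraction with a uniform prior on [0, 1], whose marginal likelihood is
    exactly 1 / (depth + 1). Equal priors are a placeholder, not a
    realistic choice.
    """
    if priors is None:
        priors = {"hom_ref": 0.25, "het": 0.25, "hom_alt": 0.25, "mosaic": 0.25}
    likelihoods = {
        "hom_ref": binom_pmf(alt_reads, depth, error),
        "het": binom_pmf(alt_reads, depth, 0.5),
        "hom_alt": binom_pmf(alt_reads, depth, 1 - error),
        "mosaic": 1.0 / (depth + 1),
    }
    unnormalized = {g: priors[g] * likelihoods[g] for g in likelihoods}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


def call_genotype(alt_reads, depth, **kwargs):
    """Return the genotype class with the highest posterior."""
    posteriors = genotype_posteriors(alt_reads, depth, **kwargs)
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    for alt in (0, 15, 50, 100):
        print(alt, call_genotype(alt, 100))
```

With 100 reads, a site showing 15 variant reads is far better explained by the mosaic class than by any germline genotype, whereas 50 variant reads favor a heterozygous call. A production caller would also need to model the systematic alignment and base-calling errors mentioned above, which a single error rate does not capture.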
It is important to note that some of these pSNMs affected both somatic and germline cells. In one example, two nonsynonymous pathogenic mutations in the sodium channel, voltage-gated, type I, alpha subunit (SCN1A) gene,
associated with Dravet syndrome, were inherited from clinically normal parents with germline mosaicism, showing the transgenerational impact of pSNMs in this type of analysis. All individuals are genetic mosaics for disease-associated mutations. As these mutations are identified in healthy individuals, especially in genes associated with disease, the clinical importance of these mutations will need to be determined. The third session was chaired by Rachel Karchin from the Departments of Biomedical Engineering/Oncology and Institute for Computational Medicine, Johns Hopkins University. The first speaker was Paul Flicek from the European Molecular Biology Laboratory, European Bioinformatics Institute, who spoke on “Ensembl’s annotation of the new GRCh38 human genome assembly.” Many of the new genetic technologies including SNP arrays, NGS technologies and comparative genomic hybridization (CGH) require an accurate DNA reference sequence and high-quality annotation. GRCh38 is the most recent update of the human assembly from the Genome Reference Consortium (genomereference.org). Many errors in the previous assembly were corrected including 6183 SNVs, 489 insertions, and 910 deletions not seen in the 1000 Genomes Project. Additionally, many coding variants with a minor allele frequency of less than 5% were updated and more than 100 assembly gaps were closed or reduced. The assembly also includes representation of centromere sequences for the first time and 261 alternate loci from 178 different genomic regions. In Ensembl, the genome sequence has been annotated with a complete set of coding and noncoding genes as well as variant annotation from a large number of sources including citations to publications and includes the variant effect predictor (VEP) tool to annotate the effect of variants. 
An evidence-based view of regulatory regions (the “Ensembl Regulatory Build”), including distal enhancers and transcription factor-binding sites and encompassing 600 Mb of genome sequence, is accessible through the VEP, showing that a bioinformatics approach to determining the effect of variants in regulatory regions is becoming increasingly important. The new genome assembly and the analysis tools and annotations in Ensembl, including information from the Havana project, result in a robust, integrated database that will help us better understand the effects of variation on the human phenotype.
As NGS produces massive amounts of data, especially in the case of WES, there is a need to identify those coding variants that are functional. Some investigators have bioinformatics support to do this type of analysis, but this is not always the case. In these instances, a bioinformatics service may be of value. Speaking on this was Chunlei Wu of the Scripps Research Institute, La Jolla, whose talk was entitled “MyVariant.info: community-aggregated variant annotations as a service.” There are many annotated variant resources such as dbSNP, PharmGKB, MutDB, ClinVar, and SNPedia. A major challenge is that the annotation is fragmented and incomplete across resources and contains errors, and it is difficult to get investigators to contribute to these repositories. To improve on this, a platform called MyVariant.info was created. The goal of the database is to bring together existing information on genetic variants from multiple sources. Approximately 100 million variants from major repositories, including dbSNP, COSMIC, ClinVar, dbNSFP, MutDB, GWASSNP, and SNPedia, have been aggregated into the database. The database stores data in JSON format, which allows new fields to be added as needed. High-performance and flexible query APIs are provided for fast programmatic access. Data will be updated regularly.
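As an illustration of programmatic access, the sketch below builds a request for a single variant and, optionally, fetches the JSON document. It assumes the service's current public layout (`https://myvariant.info/v1/variant/<HGVS id>` with an optional `fields` parameter), which may differ from what was presented in 2014. The example identifier and field names are illustrative, and the helper names `variant_url` and `fetch_variant` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of the public service; check the current API documentation.
BASE_URL = "https://myvariant.info/v1"


def variant_url(hgvs_id, fields=None):
    """Build the URL for one variant, identified by a genomic HGVS-style id."""
    url = f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"
    if fields:
        # Restrict the response to selected annotation fields.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


def fetch_variant(hgvs_id, fields=None, timeout=10):
    """Fetch and decode the JSON annotation document (needs network access)."""
    with urlopen(variant_url(hgvs_id, fields), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_variant() to query the service.
    print(variant_url("chr7:g.140453134T>C", fields=["dbsnp.rsid", "clinvar"]))
```

Because each variant is returned as one JSON document, annotation from different sources sits under separate top-level keys, which matches the point above that new fields can be added without changing the schema.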
An important subset of genetic variants found in humans is those that influence therapeutic drug response. Courtney French of the Pharmacogenomics Research Network spoke on the “PGRN
Network-wide Project: Transcriptome Analysis of Pharmacogenes in Human Tissues.” There are several mechanisms by which drug response can differ between individuals, including intrinsic differences in protein targets or drug-metabolizing enzymatic activity, differences in gene expression levels in response to the presence of the drug, or differences in post-transcriptional regulation such as splicing. To better understand these differences, including differences between cell types, transcriptome profiling using RNAseq was done on 20–25 samples from different human tissues including kidney, liver, heart, and adipose tissue and also lymphoblastoid cell lines (LCLs). It was found that LCLs look transcriptionally very different from primary tissue, and care should be taken when using data acquired from LCLs. Many pharmacogenes were found to be differentially expressed in different tissues, and in some cases very highly expressed in single cell types. Additionally, expression can vary greatly between different individuals. For example, the CYP genes are generally expressed very highly only in liver, and this expression can differ more than 100-fold between individuals. Differences in alternative splicing also need to be considered. Alternative splicing in HMGCR is known to correlate with statin sensitivity in cell lines, and this splicing varies greatly between different liver samples in these data. It is important to note that not only does expression of pharmacogenes differ between tissues, but the vast majority of genes are alternatively spliced, and this may be as important a source of individual variation in drug response as genetic variants.
For many years, there have been several methods for predicting the functional impact of genetic variants, but we need to determine their accuracy and make a comparison between the different methods.
Steven Brenner, of the University of California, Berkeley, leads a consortium that offers a testing platform for these tools and provided an update in his talk, “Findings from the Critical Assessment of Genome Interpretation (CAGI), a community experiment to evaluate phenotype prediction.” The CAGI experiment has been assessing different methods that attempt to determine the effects of genetic variation on protein functionality. CAGI also helps advance the state of the art by posing well-defined challenges to the community and helping spur collaborative efforts. This is accomplished by providing a test set of variants of experimentally measured functionality to test the accuracy of these methods. There have been three CAGI experiments since 2010, each with about 10 challenges. In one such challenge, 84 single amino acid substitutions of experimentally known functionality were provided to predictors. The best result had a Spearman’s rank correlation of 0.66, which is highly significant but of doubtful clinical utility. CAGI ranges well beyond single genetic variants and is now testing the ability to predict risk for common disease. In one example, sequencing results from eight pairs of monozygotic twins discordant for asthma were submitted for testing to six groups. All groups were able to identify the twin pairs, but asthma predictions were no better than random. Another challenge, to match Personal Genome Project whole genomes with trait profiles, showed major advances over the years. CAGI finds that current variant prediction methods are still not ready for the clinic due to low predictive value on any given variant despite highly significant results overall. Using results based on consensus (by combining the results from multiple predictive methods) has only marginally more support, and using this approach may be misleading. It is clear that CAGI will still have an important role in testing these programs for years to come.
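For readers unfamiliar with the statistic quoted for the substitution challenge, Spearman's rank correlation is the Pearson correlation computed on ranks, so it rewards a predictor for ordering variants correctly rather than for matching measured values. The sketch below is a plain-Python illustration with made-up numbers. It is not CAGI's assessment code, and the function names are chosen for this example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman's rank correlation between predictions and measurements."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Made-up scores for five variants; two adjacent pairs are swapped.
    print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A high rank correlation across a whole test set says little about whether any individual variant is scored correctly, which is consistent with the conclusion above that significant overall results do not yet translate into clinical utility for a given variant.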
The last session was chaired by Sean Mooney of the Buck Institute for Research on Aging. The first talk presented by Matthew Warman of the Boston Children’s Hospital was “Somatic Mosaicism in Skeletal Disorders.” Dr. Warman introduced germinal work by
Mary Lyon and Dorothea Bennett and then explained that several nonheritable skeletal diseases follow Rudolf Happle’s “rules” for genetic lethal mutations that survive due to somatic mosaicism. A recent example of this is Congenital Lipomatous Overgrowth, with Vascular, Epidermal, and Skeletal anomalies (CLOVES) syndrome, which is caused by somatic gain-of-function mutations in a catalytic subunit of PI3 kinase, PIK3CA. Remarkably, the same PIK3CA somatic mutations found in CLOVES syndrome cause other nonheritable disease phenotypes ranging from megalencephaly to cutaneous lymphatic malformation, and they are among the most common somatic mutations found in cancer. Activating a conditional Pik3ca gain-of-function allele during mouse embryonic life is sufficient to recapitulate some phenotypic features found in CLOVES syndrome; activating the mutation postnatally does not have the same effect. Additionally, although PIK3CA gain-of-function mutations are common in cancer, they appear insufficient to cause disease unless mutations affecting proto-oncogenes or tumor suppressor genes are also present. Thus, the phenotype that results from a somatic mutation likely depends upon when and in which cell type the mutation arose during development, and upon the cell’s or organism’s genetic background.
In the second talk of this session, Christos Proukakis of the University College London Institute of Neurology asked the question, “Could somatic copy number alterations contribute to sporadic Parkinson’s disease?” Parkinson’s disease (PD) is the second most common neurodegenerative disorder. The heritability of PD is 30%, though there are some rare Mendelian forms, including those caused by mutations in the alpha-synuclein (SNCA) and parkin RBR E3 ubiquitin protein ligase (PARK2) genes. There have been several genome-wide association study (GWAS) hits for PD, but they typically have low odds ratios.
It has been hypothesized that somatic mutations, particularly in the SNCA and PARK2 genes, which are located in fragile sites, could lead to PD. Early embryonic mutations in these genes could be present throughout neural tissue, but later somatic mutation events may trigger the spread of PD-associated changes from the brain regions where they are present. A somatic CNV affecting SNCA, and L1 retrotranspositions affecting both genes, have been reported in healthy brains. This study has looked for somatic CNVs in different brain regions using SNP and CGH arrays. No obvious relevant mutations have been identified in PD brains so far. Analysis of this hypothesis, including use of FISH, is still ongoing.
In the final talk of this session, Arijit Mukhopadyay of the CSIR-Institute of Genomics and Integrative Biology, New Delhi, spoke on
“Somatic variations in the human brain indicate ordered randomness driven by oxidative stress.” There are many known inherited genetic changes that affect the phenotype of the central nervous system (CNS), but do somatic mutations result in any CNS phenotypes in the normal brain? In this study, WES was done on DNA from the corpus callosum and frontal cortex of four individuals. The tissues were collected postmortem from individuals who had died in road accidents and were otherwise healthy. In addition, WES was also done on DNA from peripheral leucocytes and saliva epithelial cells from five different individuals. It was found that somatic G:C to T:A transversions accounted for >80% of the somatic variations in brain, and the variant allele was almost exclusive to the frontal cortex. This type of transversion is mediated primarily through oxidized guanine, which was present at high levels in the frontal cortex but not in the corpus callosum. Additionally, the brain SNVs were enriched for nonsynonymous changes, especially asparagine to tyrosine substitutions or glutamic acid to stop codon substitutions, at much higher levels than would be expected based on random mutation rates. Somatic SNVs in the frontal cortex (which contains larger numbers of neurons than other parts of the brain) were enriched in genes involved in CNS-related processes, and it has been shown that as individuals age, they accumulate somatic mutations, providing a possible mechanism for the progression of CNS pathology with increased age.
Investigations on the impact of genetic variation on disease have focused, for the most part, on constitutive germline mutations. We are finding that somatic mutations, found as a mosaic in the individual, are also important contributors to disease. We need to be concerned not only with their presence, but also with the percentage of cells that contain the variant and the tissues in which they occur.
This is just the beginning of this area of clinical research and much more needs to be done, not only on the detection and quantitation of these variants, but on their impact.
Acknowledgments
The 2014 annual meeting of the HGVS was chaired by Marc Greenblatt, Anthony Brookes, Rachel Karchin, and Sean Mooney. The authors would like to thank the speakers for their help in the preparation of this report. This meeting was run in partnership with The Human Variome Project (www.humanvariomeproject.org).
HUMAN MUTATION, Vol. 36, No. 3, 390–393, 2015