TIGS-1166; No. of Pages 8

Review

Human knockout research: new horizons and opportunities

Fowzan S. Alkuraya (1, 2)

(1) Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
(2) Department of Anatomy and Cell Biology, College of Medicine, Alfaisal University, Riyadh, Saudi Arabia

Although numerous approaches have been pursued to understand the function of human genes, Mendelian genetics has by far provided the most compelling and medically actionable dataset. Biallelic loss-of-function (LOF) mutations are observed in the majority of autosomal recessive Mendelian disorders; they represent natural human knockouts and offer a unique opportunity to study the physiological and developmental context of these genes. Restricting this context to ‘disease’ states is artificial, however, and the recent ability to survey entire human genomes for biallelic LOF mutations has revealed a surprising landscape of knockout events in ‘healthy’ individuals, sparking interest in their role in phenotypic diversity beyond disease causation. As I discuss in this review, the potentially wide implications of human knockout research warrant increased investment and multidisciplinary collaboration to overcome existing challenges and reap its benefits.

In search of a medical understanding of gene function

The quest to understand the function of the human genome was the primary motive for the Human Genome Project, but more than a decade after the ‘complete’ sequencing of the human genome, our knowledge of the function of its individual components remains limited. Notwithstanding the debate over what constitutes the ‘functional genome’ [1–4], even the classical functional units of the genome, namely protein-coding genes, are far from fully understood: most of the proteins they encode have no established developmental or physiological function. With nearly 20 000 protein-coding genes in the human genome, the overwhelming majority of which encode more than one protein by virtue of alternative splicing [5], assigning a function to each of these genes is clearly a daunting task.
The pressing need for high-throughput approaches to tackle this challenge catalyzed the growth of many branches of functional omics, and although a wealth of data has been generated as a result, its demonstrated medical relevance is still at a nascent stage [6]. An alternative approach, founded on the premise that perturbing the function of a medically relevant gene should produce an abnormal phenotype, has provided most of the medically relevant functional annotation of human genes [7,8]. This approach has seen its most resounding success when perturbation of a single gene is sufficient to produce a tractable phenotype, which is the basis of Mendelian genetics. With a mutation rate of $1.2 \times 10^{-8}$ per nucleotide per generation (not including mutations at the copy-number level) [9] and >130 million births per year, the human population provides a vast resource to study Mendelian genes and the entire spectrum of their phenotypic effects. Combined with the recent advent of powerful genomic analytic tools, Mendelian genetics is proving to be a high-throughput, medically relevant ‘functional annotator’ of the human genome. This explains the resurgence of interest in the field after it was transiently overshadowed by complex genetics, where unprecedented investment has assigned function to a frustratingly small number of genes despite the numerous potential targets that have been identified [10–12]. A special scenario in Mendelian genetics that is particularly useful in this regard is when a gene harbors a LOF mutation on both alleles, rendering it completely inactive, the equivalent of an experimental knockout in model organisms. This is pervasive in autosomal recessive disorders and has long served as the benchmark against which all other classes of mutations are compared for their detrimental effect. Although the notion that human knockouts represent the ultimate in vivo setting in which to investigate gene function has far-reaching implications, the potential of these naturally occurring experiments has largely been confined to the context of ‘disease’ and thus not fully unlocked. In part, this may be because most human traits are quantitative, which makes it less straightforward to connect genotype to phenotype.

Corresponding author: Alkuraya, F.S. ([email protected]).
Keywords: loss of function; autozygome; evolution; druggable targets; adaptation.
0168-9525/© 2014 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2014.11.003
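The scale implied by the quoted mutation rate and birth count can be checked with simple arithmetic. The sketch below assumes a haploid genome of roughly 3.1 × 10⁹ bp (a value not given in this review), counts two parental genome copies per newborn, and treats the per-nucleotide rate as uniform across the genome, which is a simplification. It estimates the number of de novo point mutations per newborn and the number of times any single nucleotide position is expected to mutate somewhere in the world each year.

```python
"""Rough arithmetic on the quoted human mutation rate (illustrative sketch only)."""

MUTATION_RATE = 1.2e-8     # point mutations per nucleotide per generation (from the text)
BIRTHS_PER_YEAR = 130e6    # lower bound from the text (">130 million births per year")
HAPLOID_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size


def de_novo_per_newborn(rate: float = MUTATION_RATE,
                        haploid_bp: float = HAPLOID_GENOME_BP) -> float:
    """Expected de novo point mutations in one newborn (two genome copies)."""
    return rate * 2 * haploid_bp


def new_mutations_per_site_per_year(rate: float = MUTATION_RATE,
                                    births: float = BIRTHS_PER_YEAR) -> float:
    """Expected new mutations at one nucleotide position, summed over the two
    genome copies transmitted to each of one year's births (uniform rate assumed)."""
    return rate * 2 * births


if __name__ == "__main__":
    print(f"de novo point mutations per newborn: ~{de_novo_per_newborn():.0f}")
    print(f"new mutations per nucleotide position per year: ~{new_mutations_per_site_per_year():.1f}")
```

With these inputs the sketch gives about 74 de novo point mutations per newborn and roughly 3 new mutations at every nucleotide position per year worldwide, which illustrates why even a specific LOF point mutation is expected to recur in the population. Both figures scale linearly with the inputs.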
Another important reason the potential of human knockouts remains untapped is the failure to recognize ‘disease’ as a subjective annotation of a particular phenotypic state, one that is more appropriately viewed as a single data point along a spectrum ranging from early embryonic lethality to no phenotypic consequence at all (Figure 1) [13]. The current ability to identify knockout events in every individual genome using new sequencing tools provides an unprecedented opportunity to move freely between genotype and phenotype, thus bypassing historical roadblocks in defining the true spectrum of phenotypic consequences of gene loss in humans. This review discusses many aspects of this exciting line of research, with special emphasis on its potentially far-reaching medical implications. Existing and perceived challenges are also discussed, and some solutions are suggested.

Trends in Genetics xx (2014) 1–8

[Figure 1. Panel (A), phenotypic spectrum: embryonic lethality (or fertilization failure), severe Mendelian diseases, mild Mendelian diseases, predisposition to common diseases, common phenotypic variation. Panel (B), applications of human knockout research: annotation of the Mendeliome, interpretation of genomic reports, common disease genetics, drug development, human phylogenetics, phenotypic diversity.]

Figure 1. (A) Phenotypes associated with human knockout events can span the entire phenotypic spectrum and are not limited to diseases. (B) Research in human knockouts has a wide range of basic and translational applications.

The evolutionary legacy of LOF in the human genome

LOF mutations are defined as genetic (or genomic) alterations that render a particular allele completely inactive. Genomic rearrangements that delete a whole gene are conspicuous examples of LOF in which the gene is physically removed [14]. More often, however, the null nature results from frameshift insertions or deletions or from nonsense mutations, which give rise to premature truncation of the encoded protein. Canonical splice-site mutations usually alter the normal transcript, and although they do not necessarily result in a frameshift, they are often counted among LOF mutations. Until recently, the magnitude of LOF mutations in the human genome was largely inferred from epidemiological studies that extrapolated carrier frequencies from the analysis of autosomal recessive diseases (e.g., lethal equivalents) [15]. We now have empirical data from whole-genome and whole-exome sequencing of a large number of individuals, which revealed around 400 LOF variants per individual [16]. As I discuss in detail later, defining a variant as LOF is not straightforward, so caution has to be exercised in interpreting this and related figures in the literature. Nonetheless, it is clear that LOF variants are far more common than previously thought [17]. Most LOF variants are of low frequency, which is consistent with negative selection, but some exhibit a higher population frequency than would be expected for detrimental alleles, which raises interesting questions about the factors that maintain them at such high frequencies [18].

In its simplest form, LOF can reach fixation when it represents a human-specific ‘pseudogenization’ event. Differentiating passive from adaptive pseudogenization is paramount because the latter is particularly relevant to understanding how gene loss may have endowed humans with a selective advantage, but doing so is not always easy. Inference of positive selection is usually based on criteria such as a selective sweep with long haplotypes, a short time from allele origin to fixation, a dearth of polymorphisms in the null versus the ancestral alleles, and pre- and post-admixture analysis of human populations [19–23], but these criteria are not foolproof. For example, local recombination hotspots can obscure the long-haplotype signature of a positively selected LOF allele [24]. Adaptation through LOF is the basis of the ‘less is more’ hypothesis, which suggests that loss of genetic material can accelerate evolution [25]. This phenomenon has been observed experimentally in bacteria that adapt by dispensing with genes involved in the metabolism of nutrients no longer available to them, perhaps to avoid the unnecessary cost of expressing those genes and to rewire cellular networks for the new metabolic environment [26]. It has been suggested that LOF mutations contributed to the distinction between humans and other great apes, although recent evidence does not support an excess of fixed LOF on the human branch [27]. Several attempts to catalogue human lineage-specific pseudogenization events have revealed enrichment of certain classes of genes, including chemoreception and immune response genes, and a case has been made for positive selection of some of these, such as CASP12 (caspase 12), deficiency of which has been shown to reduce the risk of sepsis [28–30].

More recent gene inactivation events in the human lineage can also reach a high population frequency through several factors. Apart from bottlenecks and genetic drift, which can passively drive LOF alleles with deleterious health effects to a relatively high frequency, as shown in the Finnish population [31], signatures of positive selection of LOF mutations have been proposed for several genes, with attractive adaptation models for some. For example, a common nonsense CD36 mutation in sub-Saharan Africans bears the signature of positive selection, and because CD36 is thought to contribute to malaria pathogenesis by mediating the adherence of parasitized erythrocytes, this LOF can be viewed as yet another instance of adaptation in malaria-endemic regions, as has been demonstrated for many other genes such as HBB (hemoglobin beta chain), DARC (Duffy blood group, atypical chemokine receptor), and G6PD (glucose-6-phosphate dehydrogenase), although the link with CD36 remains controversial [23,32–35]. In a related example, lack of erythrocyte expression of the Duffy antigen, which confers protection against Plasmodium vivax, reached fixation in central Africa and seems to have driven this originally African parasite out of Africa, such that it is now largely confined to Asia and Latin America, where Duffy is expressed [36]. Positive selection has been suggested for a 32-base-pair deletion in the C–C chemokine receptor 5 gene (CCR5-Δ32), which is very common in Europeans (5–14%) and confers demonstrable protection against HIV infection; however, the allele is far older than the HIV epidemic, which suggests that it was selected for another reason, the nature of which remains unknown [37]. Natural selection has clearly kept most LOF events at low frequency, but one can envision that modern human culture, with its mobility, admixture, and technological and medical sophistication, could interfere with this process in many ways. For example, eradication of malaria, if ever achieved, may have a profound effect by relaxing the selection pressure that maintains many protective LOF variants. Similarly, the ability to improve the reproductive fitness of many individuals carrying LOF variants by artificial means can allow these variants to escape purifying selection.
In essence, the dynamics of LOF evolution in humans will no longer be under the sole control of natural selection.

Human knockouts redefine the phenotypic spectrum

The concept of a phenotypic spectrum in genetic disorders is not new. Physicians have long been aware of variable expressivity and even non-penetrance in individuals with disruptive mutations. The entry point for studying LOF mutations, however, has almost always been a disease phenotype that prompts the analysis of a specific gene (or set of genes) in patients to establish a causal link. The inherent bias of this approach is its failure to recognize disease as merely one phenotype along a wider spectrum. Defining disease as a departure from the physiological average can indeed be subjective; a recessive TYRP1 (tyrosinase-related protein 1) mutation that causes oculocutaneous albinism type 3 is simply considered a phenotypic variant (blond hair) among Melanesians [38]. One of the earliest challenges to the disease-only context of LOF studies came from the discovery that nonsense mutations are widespread as SNPs, even in the homozygous state, and that entire genes are lost through nullizygous CNVs in healthy individuals [39,40]. Indeed, a number of studies that linked diseases to genes on the basis of LOF variants were later questioned after the same variants were found in healthy individuals [41].

Ideally, the entire phenotypic spectrum should be considered as a potential outcome of LOF. Genome-wide surveys of LOF in clinical cohorts representing this full spectrum will provide an unbiased approach to discern the true phenotypic manifestations of human knockouts [42]. This has only recently become possible, now that such a survey can be performed at the individual level, and early results have been surprising in that they revealed layers of phenotypic complexity that had not been fully appreciated (Table 1). In a proof-of-concept study, we enrolled nearly 80 well-phenotyped individuals whose parents are first cousins, to maximize the probability of encountering homozygous LOF by virtue of autozygosity, thus representing true knockout events [43,44]. Whole-exome sequencing revealed 175 such events involving 169 genes. As expected, this list was severely depleted of known Mendelian disease genes. In fact, the absence of the reported disease phenotype in individuals knocked out for some of these genes challenged the original disease link, as was shown for ACY1 (aminoacylase 1) and UPB1 (ureidopropionase beta), whose etiological role in abnormal neurocognitive phenotypes has also been questioned by other studies [45]. The nature of some of the knocked out genes and the available literature on their function raise many interesting hypotheses. For example, although overt immunological phenotypes were not observed in the individuals with biallelic LOF in NLRC3 (NLR family, CARD domain containing 3), evidence from mice suggests that NLRC3 may play a role in the response to specific microbial triggers [43,46].
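The rationale for recruiting offspring of first cousins in the proof-of-concept study above can be quantified with the standard population-genetics expression for homozygosity under inbreeding, $P(\text{hom}) = q^2(1-F) + qF$, where $q$ is the allele frequency and $F$ the inbreeding coefficient ($F = 1/16$ for a child of first cousins). The minimal sketch below uses this expected $F$ rather than the realized autozygosity measured in each genome, and otherwise assumes Hardy–Weinberg proportions. Its enrichment over outbred individuals simplifies to $(1-F) + F/q$, so it grows as the allele becomes rarer.

```python
FIRST_COUSIN_F = 1 / 16  # expected inbreeding coefficient of a child of first cousins


def p_homozygous(q: float, f: float = 0.0) -> float:
    """Probability of being homozygous for an allele at frequency q, given
    inbreeding coefficient f: q^2 (1 - f) + q f."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q * (1.0 - f) + q * f


def enrichment_over_outbred(q: float, f: float = FIRST_COUSIN_F) -> float:
    """Fold increase in homozygosity relative to an outbred individual (f = 0)."""
    return p_homozygous(q, f) / p_homozygous(q, 0.0)


if __name__ == "__main__":
    for q in (1e-2, 1e-3, 1e-4):
        print(f"q = {q:g}: {enrichment_over_outbred(q):.1f}-fold more homozygotes")
```

For an allele at q = 0.01 the expected enrichment is about 7-fold, but at q = 0.0001 it exceeds 600-fold, which is why consanguinity is especially useful for capturing homozygosity for very rare LOF alleles.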
Context-dependent penetrance, as hypothesized for NLRC3, could also explain the apparent lack of clinical consequence of biallelic inactivation of some metabolic genes, whose biochemical phenotype may be truly benign, as suggested for OPLAH (5-oxoprolinase, ATP hydrolyzing), or clinically relevant only under certain circumstances, such as starvation or exposure to specific dietary substrates [43,47,48]. Other types of discrete phenotypes that may be related to the knockout events observed in this study include external appearance, sensory perception, and fertility [43]. A common denominator of the above examples is the potential subtlety of the phenotype that may result from complete knockout of the respective genes. Phenotyping will need to become significantly more sophisticated to properly test the many hypotheses that whole-genome scans are generating at an ever-increasing rate. Ironically, the bottleneck in medical genomics, long blamed on sequencing capacity, appears to be shifting back to phenotyping [49].

Building bridges between Mendelian and complex genetics

The ability to investigate phenotypes that fall within the realm of normal variation, rather than disease states, is where complex genetics approaches have clearly surpassed their Mendelian counterparts. The genome-wide association study (GWAS) design was the first successful attempt to investigate not only multifactorial disorders but also a very wide range of phenotypic variation, such as height, weight, earwax composition, hemoglobin level, and others [50]. Despite the unprecedented window these studies have opened onto the genetic determinants of human phenotypes, they were inherently limited by design to the relatively common polymorphisms represented on genotyping platforms. These polymorphisms usually exert a small effect individually, so ever-larger cohorts are needed to detect them, and it is difficult to assign a causal role to the genes that harbor them (or are in linkage disequilibrium with them). However, the expanding phenotypic spectrum now being investigated in connection with LOF will allow Mendelian genetics to contribute significantly, and in a way complementary to complex genetics approaches, to our understanding of human phenotypic variation. The notion that common phenotypes are too complex to be recapitulated by single-gene mutations has long been refuted by the identification of numerous Mendelian forms of complex disorders. In fact, there is hardly a complex disorder for which no Mendelian form has been reported, often involving genes in which milder variants are found to increase susceptibility in GWAS of the same disease [51,52]. How, then, can this be leveraged to fully exploit the power of human Mendelian knockouts to accelerate research into common phenotypes? GWAS investigators are all too familiar with the problem of identifying non-genic risk alleles, which often prompts a search for a likely candidate gene in close proximity to the risk allele [53]. Genes that have been linked to Mendelian forms of the same phenotype are usually compelling candidates, as has been shown on many occasions. When there is no such prior knowledge, there is a danger of assigning the risk to the wrong gene.

Table 1. Examples of human knockout events with established phenotypic consequences other than disease states

| Gene | Phenotype | Refs |
| --- | --- | --- |
| ACTN3 | Reduced power and enhanced endurance capacity | [75] |
| APOC3 | 40% reduction in plasma triglyceride levels and 40% reduction in risk of ischemic vascular and ischemic heart diseases | [60,61] |
| CCR5 | Protection against HIV infection | [76] |
| P450 genes, e.g., CYP2C9, CYP3A4, and CYP2D6 | Variation in drug metabolism | [77–79] |
| FUT2 | Non-secretor phenotype of ABO antigens in body fluids; may confer protection against Norwalk virus infection | [80] |
| IDOL | Reduction in plasma LDL cholesterol | [81] |
| LPA | Reduction in plasma lipoprotein(a) and reduction in risk of cardiovascular disease | [31] |
| MC1R | Variation in skin and hair color | [82] |
| OPLAH | Oxoprolinuria with no clinical consequences | [47] |
| PCSK9 | Reduction in plasma cholesterol and reduction in risk of cardiovascular diseases | [83] |

For example, a large GWAS of systemic lupus erythematosus (SLE) revealed several alleles with genome-wide significance, one of which was intergenic; the closest gene, PXK (PX domain containing serine/threonine kinase), was assumed to be the underlying gene [54]. However, the subsequent identification of patients who are functionally knocked out for DNASE1L3 (deoxyribonuclease I-like 3), a gene that also lies close to the same risk allele, and who display a severe Mendelian form of SLE pointed strongly to DNASE1L3 as the true source of that GWAS signal, as later suggested by subsequent studies [51,55,56]. Conversely, because GWAS can only suggest association rather than a causal link, the observation of an overlapping phenotype in human knockouts for the same gene can provide compelling evidence in support of a GWAS association. For instance, variants in LACC1 (laccase domain containing 1) have been associated with Crohn’s disease, but the link between this gene of unknown function and the disease became definitive only after a severe Mendelian form of Crohn’s disease caused by LACC1 mutation was reported [52,57]. The historical dichotomy of Mendelian and complex genetics is challenged further when LOF in the same gene that causes a Mendelian trait is shown to influence the risk of a complex phenotype (Box 1). Because these LOF variants are usually rare or population-specific, sequencing is the ideal approach to genotype them, although newer genotyping chips (exome arrays) have been introduced that specifically feature these and other rare variants. This line of investigation (studying the association of LOF with common phenotypes) has been very promising despite being at a very early stage. For example, significant associations have been identified between LOF variants in PCSK9 (proprotein convertase subtilisin/kexin type 9) and APOB (apolipoprotein B) and lower low-density lipoprotein (LDL) cholesterol, between LOF variants in APOC3 (apolipoprotein C3) and lower triglyceride levels, and between a LOF variant in ANGPTL8 (angiopoietin-like protein 8) and higher high-density lipoprotein (HDL) cholesterol and lower triglyceride levels in blood [58–61]. In addition, a very recent study established several novel associations between LOF and various phenotypes in a Finnish cohort, including between LOF in LPA (lipoprotein(a)) and lower risk of cardiovascular disease [31]. The demonstrated potential of this approach will not only broaden the scope of GWAS but will also establish a new paradigm in which LOF variants serve as bridges between the Mendelian and complex genetics communities.

Box 1. Human knockouts and common diseases

There have been several discoveries of rare LOF variants that significantly modulate complex phenotypes, but their effect is limited to the very few individuals who harbor them. In one of the best examples of a LOF variant in a Mendelian gene exerting a large effect on common disease susceptibility in the general population, researchers identified a LOF variant in TBC1D4 (TBC1 domain family, member 4), a gene linked to a Mendelian form of insulin resistance, that exerts a remarkable influence on the risk of type 2 diabetes among Greenlanders, perhaps explaining up to 50% of the disease risk [73,84]. Besides the unprecedented share of disease risk explained, the study was conducted on just 2575 participants, a sample size considered too small for traditional GWAS approaches [73]. This dual advantage would not have been possible had the investigators not adopted a sequencing-based approach, which allowed them to survey variants not represented on genotyping chips. Interestingly, the TBC1D4 LOF variant was not rare but rather common in Greenlanders, probably through genetic drift, despite its extreme rarity in other populations, which represents a very interesting variation on the common disease/common variant theme [73]. This highly surprising result is a powerful demonstration that LOF variants can play a significant role in modulating the risk of common diseases and other phenotypes.

Medical implications

A wider definition of the phenotypic effects of LOF variants, as described above, promises a corresponding broadening of the medical relevance of human knockout events, which should no longer be limited to rare Mendelian disorders (Figure 1). At the very least, identifying biallelic LOF in genes of individuals with no discernible phenotype will aid significantly in the interpretation of genomic medical reports, just as Mendelian disease-associated LOF variants have been instrumental in guiding the clinical interpretation of medical exome and genome sequencing. Given the potentially life-changing decisions that can be based on these reports, using biallelic LOF in apparently normal individuals to challenge less than compelling claims of disease causality involving the same genes is also of profound medical relevance.
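Calling knockouts from an individual genome, as envisaged above for the interpretation of genomic reports, requires both a working definition of LOF and a rule for combining the two alleles of each gene. The sketch below is hypothetical: the consequence labels, gene names, and data layout are invented for illustration and do not follow any particular annotation tool. It counts whole-gene deletions, frameshift indels, and nonsense mutations as LOF, leaves it to the caller whether canonical splice-site mutations count (definitions differ on this point), and flags a gene as knocked out only when both alleles carry an LOF change. It assumes phased genotypes, so that two LOF variants in the same gene are known to lie on different alleles.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Optional


class Consequence(Enum):
    """Simplified, hypothetical variant-consequence labels."""
    WHOLE_GENE_DELETION = auto()
    FRAMESHIFT = auto()
    NONSENSE = auto()
    CANONICAL_SPLICE = auto()
    MISSENSE = auto()
    SYNONYMOUS = auto()


# Consequences counted as LOF under every definition used in this sketch
CORE_LOF = frozenset({Consequence.WHOLE_GENE_DELETION,
                      Consequence.FRAMESHIFT,
                      Consequence.NONSENSE})


def is_lof(c: Optional[Consequence], include_splice: bool = True) -> bool:
    """True if an allele's consequence counts as LOF; None means no variant."""
    if c is None:
        return False
    if c in CORE_LOF:
        return True
    return include_splice and c is Consequence.CANONICAL_SPLICE


def knocked_out_genes(calls: dict[str, tuple[Optional[Consequence], Optional[Consequence]]],
                      include_splice: bool = True) -> set[str]:
    """Genes whose two (phased) alleles both carry an LOF consequence."""
    return {gene for gene, (a1, a2) in calls.items()
            if is_lof(a1, include_splice) and is_lof(a2, include_splice)}


# Hypothetical phased calls for one individual (gene names are placeholders)
EXAMPLE_CALLS = {
    "GENE_A": (Consequence.NONSENSE, Consequence.FRAMESHIFT),               # compound heterozygous LOF
    "GENE_B": (Consequence.CANONICAL_SPLICE, Consequence.CANONICAL_SPLICE),
    "GENE_C": (Consequence.NONSENSE, None),                                 # heterozygous carrier only
    "GENE_D": (Consequence.MISSENSE, Consequence.MISSENSE),
}

if __name__ == "__main__":
    print(sorted(knocked_out_genes(EXAMPLE_CALLS)))                        # broad definition
    print(sorted(knocked_out_genes(EXAMPLE_CALLS, include_splice=False)))  # conservative definition
```

Under the broad definition the example individual is knocked out for GENE_A and GENE_B; excluding splice-site changes leaves only GENE_A. Missense variants are never counted here, even though some abolish protein function, because recognizing them would require functional evidence rather than a consequence label.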
The medical implications of investigating the role of LOF in common diseases go beyond the prediction of disease risk. As mentioned above, an association between LOF and disease is much more likely to be causal than one involving other classes of variants, and it is this level of evidence that the pharmaceutical industry requires before investing in the development of therapeutics. Replacing the product of a knocked out gene is difficult, and its track record as a treatment strategy has been mixed (for example, leptin therapy for obesity, inspired by the finding that human leptin knockouts are morbidly obese, has not been successful [62]); by contrast, neutralizing a protein with antibodies to recapitulate the health benefits observed in the corresponding gene knockout is more tractable. With the latter approach, identifying individuals who enjoy health benefits as a result of being knocked out for certain genes can be viewed as a lucrative shortcut for drug development, especially when no detrimental health consequences are observed in those individuals [63]. Perhaps one of the most celebrated success stories of drug development spurred by a human knockout event is that of PCSK9. When individuals with very low LDL levels were found to carry biallelic LOF in this gene, it immediately attracted attention as an opportunity to develop a novel class of cholesterol-lowering agents as alternatives or adjuvants to statins. Indeed, clinical trials have shown a significant LDL-lowering effect of monoclonal antibodies against PCSK9 (e.g., evolocumab) [64,65]. Another widely discussed example of knockout-inspired therapeutic potential is the resistance to HIV infection conferred by LOF in CCR5, which encodes a co-receptor for HIV cellular entry. An HIV-infected patient who appears to have been cured after receiving a bone marrow transplant from a CCR5 knockout donor made a particularly powerful case for the therapeutic potential of CCR5 inactivation, and an FDA-approved CCR5 antagonist for the treatment of HIV is already on the market [66]. Very recently, encouraging data from a trial involving gene editing of CD4 T cells to inactivate CCR5 renewed hope of curing HIV [67]. PCSK9 and CCR5 are unlikely to be isolated examples; for instance, LOF in LPA and in APOC3 has very recently been shown to be an independent protective factor against cardiovascular disease without apparent adverse health consequences [31,60,61].

Challenges and future directions

There are many unanswered questions about LOF in humans. From an evolutionary perspective, it is important to carefully discern the factors that led to the fixation of some LOF alleles in the human lineage, bearing in mind that evidence for positive selection may not be straightforward. A good example is CMAH (cytidine monophosphate-N-acetylneuraminic acid hydroxylase) LOF in humans, which appears to have reached fixation in two steps: because of the delicate balance between its detrimental influence at low frequency and its beneficial effect at high frequency, it first had to reach sufficient frequency by genetic drift before positive selection could drive it to fixation [68].
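The two-step logic described for CMAH can be illustrated with a textbook deterministic selection model in which a LOF allele is costly while rare (when it is carried mostly by heterozygotes) and beneficial once common (when it is carried mostly by homozygotes). The sketch below is a generic underdominance-type model with hypothetical relative fitnesses (non-carriers 1, heterozygotes 1 − s, LOF homozygotes 1 + t); it is not the model used in the CMAH study. Under these fitnesses the allele frequency q falls below the unstable threshold $q^* = s/(t + 2s)$ and rises above it, so a new allele, which starts near frequency 1/(2N), is purged by selection unless drift first carries it past $q^*$.

```python
def next_frequency(q: float, s: float, t: float) -> float:
    """One generation of selection under random mating in an infinite population.
    Relative fitnesses (hypothetical): non-carrier 1, heterozygote 1 - s,
    LOF homozygote 1 + t."""
    p = 1.0 - q
    w_ref, w_het, w_lof = 1.0, 1.0 - s, 1.0 + t
    mean_fitness = p * p * w_ref + 2 * p * q * w_het + q * q * w_lof
    # LOF allele frequency among the gametes contributed by selected parents
    return (q * q * w_lof + p * q * w_het) / mean_fitness


def unstable_threshold(s: float, t: float) -> float:
    """Frequency above which selection favors the LOF allele: s / (t + 2s)."""
    return s / (t + 2 * s)


def frequency_after(q0: float, s: float, t: float, generations: int) -> float:
    """Iterate the deterministic recursion from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.05, 0.05  # hypothetical selection coefficients
    print(f"threshold q* = {unstable_threshold(s, t):.3f}")
    for q0 in (0.2, 0.5):
        print(f"start {q0}: after 1000 generations q = {frequency_after(q0, s, t, 1000):.4f}")
```

With s = t = 0.05 the threshold is 1/3: a population starting at q = 0.2 loses the allele, whereas one starting at q = 0.5 moves toward fixation. A fuller analysis would add drift in a finite population and the specific fitness mechanism proposed in the original study.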
When a signal of positive selection is convincingly demonstrated for a particular gene, one would predict that it drives multiple LOF alleles to high frequency, since these are likely to produce the same inactivation upon which selection acts, as is the case for FUT2 (fucosyltransferase 2) and CD63 [16]. In many instances, however, only one or very few LOF alleles reach high frequency, as with CCR5. The reason for this pattern remains unclear and deserves further study. Unlike LOF mutations identified by Sanger sequencing in the context of a suspected Mendelian disorder, the large-scale, genome-wide surveys of LOF discussed in this review rely on next-generation sequencing platforms. Despite their unparalleled throughput, these platforms suffer a significant decline in accuracy for the very class of mutations most relevant to LOF research, namely insertions and deletions. In two surveys of LOF, false positives were a major problem, especially for heterozygous variants [16,43]. There were also instances where errors in reference genome annotation led to false calling of insertions and deletions [69]. The very definition of LOF will continue to be a challenge for the near future. For example, whereas MacArthur and colleagues considered canonical splice-site mutations as LOF, we used a very conservative definition that excluded this class of mutations because their effect on splicing does not necessarily lead to premature truncation [16,43]. Even mutations conveniently labeled as LOF, such as frameshift and nonsense mutations, can be difficult to prove truly inactivating, depending on the location of the mutation, the number of transcripts it affects, and the likelihood of rescuing the reading frame from a downstream start codon, among other factors. The numerous published examples of missense mutations that completely abolish protein function add another layer of complexity, because identifying truly LOF missense mutations with in silico methods remains a formidable challenge. Perhaps the most challenging class of LOF variants, however, will be noncoding changes. Lactose intolerance and lack of the Duffy antigen are powerful examples of how regulatory element mutations can cause LOF with profound genetic epidemiology implications, even though they do not fall under the operational definition of LOF used in current studies (in the case of the Duffy antigen, the promoter LOF mutation is even erythrocyte-specific) [70,71]. The number of noncoding LOF mutations in the human genome is very difficult to predict, but their extremely rare occurrence in autosomal recessive diseases suggests that they may not be a common mechanism for creating complete knockout events in humans.

Because LOF mutations tend to be rare, identifying human knockouts with biallelic LOF is particularly challenging. Under Hardy–Weinberg equilibrium, $(p + q)^2 = p^2 + 2pq + q^2 = 1$, where $q$ is the frequency of the LOF allele, $2pq$ the carrier frequency, and $q^2$ the frequency of homozygous knockouts. For a LOF allele carried by 1% of the population, $2pq \approx 0.01$ gives $q \approx 0.005$ and $q^2 \approx 2.5 \times 10^{-5}$, so on average about 40 000 individuals must be sequenced to find a single homozygote. Because the required sample size scales with $1/q^2$, it rises steeply as the LOF allele becomes rarer (halving $q$ quadruples it), which can be prohibitive. This is particularly relevant to the search for advantageous human knockout phenotypes, which can be extremely rare (e.g., only a single individual with enhanced muscle mass due to homozygous LOF in the myostatin gene has been reported [72]).
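The sample-size arithmetic above can be reproduced in a few lines. This sketch assumes Hardy–Weinberg equilibrium in an outbred population; it solves $2pq = c$ exactly for $q$ given a carrier frequency $c$, reports the expected number of individuals per homozygous knockout ($1/q^2$), and gives the probability of observing at least one knockout among $n$ sequenced individuals, $1 - (1 - q^2)^n$. The function names are illustrative.

```python
import math


def lof_allele_frequency(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the LOF allele frequency q (smaller root)."""
    if not 0.0 < carrier_freq <= 0.5:
        raise ValueError("carrier frequency must lie in (0, 0.5]")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def individuals_per_knockout(q: float) -> float:
    """Expected number of individuals sequenced per homozygous knockout: 1 / q^2."""
    return 1.0 / (q * q)


def p_at_least_one_knockout(q: float, n: int) -> float:
    """Probability that at least one of n outbred individuals is homozygous."""
    return 1.0 - (1.0 - q * q) ** n


if __name__ == "__main__":
    q = lof_allele_frequency(0.01)  # LOF allele carried by 1% of the population
    print(f"q = {q:.5f}; ~{individuals_per_knockout(q):,.0f} individuals per knockout")
    print(f"P(>=1 knockout in 40 000) = {p_at_least_one_knockout(q, 40_000):.2f}")
```

The exact root gives slightly fewer than 40 000 individuals, consistent with the approximation $q \approx c/2$, and even at that sample size the chance of observing a knockout is only about 64%. Because of the $1/q^2$ scaling, each tenfold drop in $q$ multiplies the required cohort a hundredfold.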
One way to address this problem is to study genetic isolates where bottleneck and genetic drift effects can make the q:q2 correlation dictated by the Hardy–Weinberg equilibrium less relevant, as has recently been shown in the Finnish and Greenlandic populations [31,73]. Another approach is to study genomes enriched for autozygosity as a function of consanguinity. Indeed, the above mentioned study of offspring of consanguineous parents has shown a powerful correlation between autozygosity and LOF frequency; the more rare a LOF allele is, the more likely it is to have been rendered homozygous as a function of autozygosity, allowing for the identification of homozygosity for extremely rare LOF alleles [43]. This is the basis for expanding that study to involve 10 000 healthy Saudis who are born to first-cousin parents, a cohort size that is likely to reveal homozygous knockout events for even very rare or very young LOF alleles [63]. A major challenge in assigning a phenotypic context to LOF is the multilayered complexity of the candidate phenotypes. As mentioned above, some phenotypes have no standards of measurements, others are only penetrant with the right environmental trigger (microbial pathogen, dietary ingredient, etc.), and yet others simply cannot be scored (due to very early embryonic lethality). Not only does this call for clinical cohorts to be phenotyped as 6

Trends in Genetics xxx xxxx, Vol. xxx, No. x

thoroughly as possible, but it should also serve as a reminder that complete phenotyping is impossible, so studies should retain the option to recall participants with interesting LOF alleles to ascertain specific phenotypic consequences based on the nature of the genes involved. In this regard, it is worth highlighting that a knockout event may act multiple steps away from the key cellular pathway it modulates, making the relevant phenotype difficult to predict. The concern that an apparently beneficial knockout event does not guarantee that the same gene can be safely inactivated in other humans is not entirely unjustified. We have learned from model organisms that LOF of one gene may only be consequential when combined with another genetic perturbation (e.g., synthetic lethality). In tomato, for instance, the two relatively common LOF alleles in LC (locule number) and OVATE are never seen together as a result of such an interaction [74]. Although this buffering phenomenon can explain why some beneficial LOF alleles fail to reach fixation, it raises concerns that gene inactivation can have adverse consequences in some individuals. Given the complexity of gene–gene interactions and our limited knowledge in this regard, advising against pursuing the therapeutic potential of human knockout events until these interactions are fully understood would only limit the treatment options available to patients. Rather, investigators should proceed cautiously and be aware of potential adverse consequences in a subset of subjects, a concern shared by all trials of new therapeutics.

Concluding remarks

Investigating knockout events in humans has taken on new dimensions in recent years thanks to next-generation sequencing tools that have enabled a revolutionary approach to genotype/phenotype correlation in humans.
With the common variant/common disease paradigm having nearly exhausted its potential, at least in a practical sense, interest in rare variants is on the rise, and early results clearly place LOF mutations as the most promising in that category. ‘Next-generation’ human knockout research will be increasingly relevant to mainstream medicine. Ironically, with this line of research the famous proverb ‘we never appreciate something until it’s gone’ will also apply in reverse, when we deliberately engineer that loss to appreciate its health benefits.

Acknowledgments

I am grateful to the members of my lab for their help in generating data that I cited in this manuscript. It is inevitable that I missed important and relevant work in preparing this review, so I apologize to colleagues whose important work was not cited. I thank my sons Ibrahim and Imen for their help with Figure 1. This work was funded in part by King Abdulaziz City for Science and Technology (KACST) grants #13-BIO111320 and 10-BIO1357-20.

References
1 Rands, C.M. et al. (2014) 8.2% of the human genome is constrained: variation in rates of turnover across functional element classes in the human lineage. PLoS Genet. 10, e1004525
2 Ponting, C.P. and Hardison, R.C. (2011) What fraction of the human genome is functional? Genome Res. 21, 1769–1776
3 ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74

4 Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. U.S.A. 110, 5294–5300
5 Wang, E.T. et al. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476
6 Chen, R. et al. (2012) Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307
7 Ginsburg, D. (2011) Genetics and genomics to the clinic: a long road ahead. Cell 147, 17–19
8 Biesecker, L.G. and Green, R.C. (2014) Diagnostic clinical genome and exome sequencing. N. Engl. J. Med. 370, 2418–2425
9 Campbell, C.D. et al. (2012) Estimating the human mutation rate using autozygosity in a founder population. Nat. Genet. 44, 1277–1281
10 Brunham, L.R. and Hayden, M.R. (2013) Hunting human disease genes: lessons from the past, challenges for the future. Hum. Genet. 132, 603–617
11 McCarthy, J.J. et al. (2013) Genomic medicine: a decade of successes, challenges, and opportunities. Sci. Transl. Med. 5, 189sr4
12 Boycott, K.M. et al. (2013) Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat. Rev. Genet. 14, 681–691
13 Hartman, J.L. et al. (2001) Principles for the buffering of genetic variation. Science 291, 1001–1004
14 Khalak, H.G. et al. (2012) Autozygome maps dispensable DNA and reveals potential selective bias against nullizygosity. Genet. Med. 14, 515–519
15 McConkey, E.H. (1993) Human Genetics: The Molecular Revolution, Jones & Bartlett Learning
16 MacArthur, D.G. et al. (2012) A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828
17 MacArthur, D.G. and Tyler-Smith, C. (2010) Loss-of-function variants in the genomes of healthy humans. Hum. Mol. Genet. 19, R125–R130
18 Fay, J.C. et al. (2001) Positive and negative selection on the human genome. Genetics 158, 1227–1234
19 Pickrell, J.K. et al. (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837
20 Voight, B.F. et al. (2006) A map of recent positive selection in the human genome. PLoS Biol. 4, e72
21 Tang, K. et al. (2007) A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5, e171
22 Carlson, C.S. et al. (2005) Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 15, 1553–1565
23 Jin, W. et al. (2012) Genome-wide detection of natural selection in African Americans pre- and post-admixture. Genome Res. 22, 519–527
24 Engelken, J. et al. (2014) Extreme population differences in the human zinc transporter ZIP4 (SLC39A4) are explained by positive selection in Sub-Saharan Africa. PLoS Genet. 10, e1004128
25 Olson, M.V. (1999) When less is more: gene loss as an engine of evolutionary change. Am. J. Hum. Genet. 64, 18–23
26 Hottes, A.K. et al. (2013) Bacterial adaptation through loss of function. PLoS Genet. 9, e1003617
27 Prado-Martinez, J. et al. (2013) Great ape genetic diversity and population history. Nature 499, 471–475
28 Wang, X. et al. (2006) Gene losses during human origins. PLoS Biol. 4, e52
29 International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431, 931–945
30 Hahn, Y. and Lee, B. (2005) Identification of nine human-specific frameshift mutations by comparative analysis of the human and the chimpanzee genome sequences. Bioinformatics 21 (Suppl. 1), i186–i194
31 Lim, E.T. et al. (2014) Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 10, e1004494
32 Cabrera, A. et al. (2014) CD36 and malaria: friends or foes? A decade of data provides some answers. Trends Parasitol. 30, 436–444
33 Fry, A.E. et al. (2009) Positive selection of a CD36 nonsense variant in sub-Saharan Africa, but no association with severe malaria phenotypes. Hum. Mol. Genet. 18, 2683–2692
34 Mangano, V.D. and Modiano, D. (2014) An evolutionary perspective of how infection drives human genome diversity: the case of malaria. Curr. Opin. Immunol. 30, 39–47
35 Hedrick, P. (2011) Population genetics of malaria resistance in humans. Heredity 107, 283–304

36 Liu, W. et al. (2014) African origin of the malaria parasite Plasmodium vivax. Nat. Commun. 5, Published online February 21, 2014. http://dx.doi.org/10.1038/ncomms4346
37 Sabeti, P.C. et al. (2005) The case for selection at CCR5-Δ32. PLoS Biol. 3, e378
38 Kenny, E.E. et al. (2012) Melanesian blond hair is caused by an amino acid change in TYRP1. Science 336, 554
39 Yngvadottir, B. et al. (2009) A genome-wide survey of the prevalence and evolutionary forces acting on human nonsense SNPs. Am. J. Hum. Genet. 84, 224–234
40 Conrad, D.F. et al. (2009) Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712
41 Piton, A. et al. (2013) XLID-causing mutations and associated genes challenged in light of data from large-scale human exome sequencing. Am. J. Hum. Genet. 93, 368–383
42 Kryukov, G.V. et al. (2009) Power of deep, all-exon resequencing for discovery of human trait genes. Proc. Natl. Acad. Sci. U.S.A. 106, 3871–3876
43 Alsalem, A.B. et al. (2013) Autozygome sequencing expands the horizon of human knockout research and provides novel insights into human phenotypic variation. PLoS Genet. 9, e1004030
44 Alkuraya, F.S. (2010) Autozygome decoded. Genet. Med. 12, 765–771
45 Nakajima, Y. et al. (2014) Clinical, biochemical and molecular analysis of 13 Japanese patients with β-ureidopropionase deficiency demonstrates high prevalence of the c.977G>A (p.R326Q) mutation. J. Inherit. Metab. Dis. 37, 801–812
46 Schneider, M. et al. (2012) The innate immune sensor NLRC3 attenuates Toll-like receptor signaling via modification of the signaling adaptor TRAF6 and transcription factor NF-κB. Nat. Immunol. 13, 823–831
47 Almaghlouth, I. et al. (2012) 5-Oxoprolinase deficiency: report of the first human OPLAH mutation. Clin. Genet. 82, 193–196
48 Calpena, E. et al. (2014) New insights into the genetics of 5-oxoprolinase deficiency and further evidence that it is a benign biochemical condition. Eur. J. Pediatr. Published online August 25, 2014. http://dx.doi.org/10.1007/s00431-014-2397-0
49 Hennekam, R. and Biesecker, L.G. (2012) Next-generation sequencing demands next-generation phenotyping. Hum. Mutat. 33, 884–886
50 Singleton, A.B. et al. (2010) Towards a complete resolution of the genetic architecture of disease. Trends Genet. 26, 438–442
51 Al-Mayouf, S.M. et al. (2011) Loss-of-function variant in DNASE1L3 causes a familial form of systemic lupus erythematosus. Nat. Genet. 43, 1186–1188
52 Patel, N. et al. (2014) Study of Mendelian forms of Crohn’s disease in Saudi Arabia reveals novel risk loci and alleles. Gut 63, 1831–1832
53 Cantor, R.M. et al. (2010) Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am. J. Hum. Genet. 86, 6–22
54 Harley, J.B. et al. (2008) Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nat. Genet. 40, 204–210
55 Mayes, M.D. et al. (2014) Immunochip analysis identifies multiple susceptibility loci for systemic sclerosis. Am. J. Hum. Genet. 94, 47–61
56 Eyre, S. et al. (2012) High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat. Genet. 44, 1336–1340
57 Umeno, J. et al. (2011) Meta-analysis of published studies identified eight additional common susceptibility loci for Crohn’s disease and ulcerative colitis. Inflamm. Bowel Dis. 17, 2407–2415
58 Lange, L.A. et al. (2014) Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol. Am. J. Hum. Genet. 94, 233–245
59 Peloso, G.M. et al. (2014) Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks. Am. J. Hum. Genet. 94, 223–232
60 Jørgensen, A.B. et al. (2014) Loss-of-function mutations in APOC3 and risk of ischemic vascular disease. N. Engl. J. Med. 371, 32–41
61 The TG and HDL Working Group of the Exome Sequencing Project, National Heart, Lung, and Blood Institute (2014) Loss-of-function mutations in APOC3, triglycerides, and coronary disease. N. Engl. J. Med. 371, 22–31
62 Paz-Filho, G. et al. (2014) Leptin treatment: facts and expectations. Metabolism Published online August 3, 2014. http://dx.doi.org/10.1016/j.metabol.2014.07.014

63 Kaiser, J. (2014) The hunt for missing genes. Science 344, 687–689
64 Blom, D.J. et al. (2014) A 52-week placebo-controlled trial of evolocumab in hyperlipidemia. N. Engl. J. Med. 370, 1809–1819
65 Robinson, J.G. et al. (2014) Effect of evolocumab or ezetimibe added to moderate- or high-intensity statin therapy on LDL-C lowering in patients with hypercholesterolemia: the LAPLACE-2 randomized clinical trial. JAMA 311, 1870–1882
66 Hütter, G. et al. (2009) Long-term control of HIV by CCR5 Delta32/Delta32 stem-cell transplantation. N. Engl. J. Med. 360, 692–698
67 Tebas, P. et al. (2014) Gene editing of CCR5 in autologous CD4 T cells of persons infected with HIV. N. Engl. J. Med. 370, 901–910
68 Ghaderi, D. et al. (2011) Sexual selection by female immunity against paternal antigens can fix loss of function alleles. Proc. Natl. Acad. Sci. U.S.A. 108, 17743–17748
69 Balasubramanian, S. et al. (2011) Gene inactivation and its implications for annotation in the era of personal genomics. Genes Dev. 25, 1–10
70 Tournamille, C. et al. (1995) Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nat. Genet. 10, 224–228
71 Tishkoff, S.A. et al. (2006) Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39, 31–40
72 Schuelke, M. et al. (2004) Myostatin mutation associated with gross muscle hypertrophy in a child. N. Engl. J. Med. 350, 2682–2688
73 Moltke, I. et al. (2014) A common Greenlandic TBC1D4 variant confers muscle insulin resistance and type 2 diabetes. Nature 512, 190–193
74 Rodríguez, G.R. et al. (2011) Distribution of SUN, OVATE, LC, and FAS in the tomato germplasm and the relationship to fruit shape diversity. Plant Physiol. 156, 275–285

75 Yang, N. et al. (2003) ACTN3 genotype is associated with human elite athletic performance. Am. J. Hum. Genet. 73, 627–631
76 Liu, R. et al. (1996) Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86, 367–377
77 Werk, A. et al. (2013) Identification and characterization of a defective CYP3A4 genotype in a kidney transplant patient with severely diminished tacrolimus clearance. Clin. Pharmacol. Ther. 95, 416–422
78 Kidd, R.S. et al. (2001) Identification of a null allele of CYP2C9 in an African-American exhibiting toxicity to phenytoin. Pharmacogenet. Genome 11, 803–808
79 Ingelman-Sundberg, M. (2004) Pharmacogenetics of cytochrome P450 and its applications in drug therapy: the past, present and future. Trends Pharmacol. Sci. 25, 193–200
80 Lindesmith, L. et al. (2003) Human susceptibility and resistance to Norwalk virus infection. Nat. Med. 9, 548–553
81 Sorrentino, V. et al. (2013) Identification of a loss-of-function inducible degrader of the low-density lipoprotein receptor variant in individuals with low circulating low-density lipoprotein. Eur. Heart J. 34, 1292–1297
82 Beaumont, K.A. et al. (2008) Red hair is the null phenotype of MC1R. Hum. Mutat. 29, E88–E94
83 Cohen, J. et al. (2005) Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165
84 Dash, S. et al. (2009) A truncation mutation in TBC1D4 in a family with acanthosis nigricans and postprandial hyperinsulinemia. Proc. Natl. Acad. Sci. U.S.A. 106, 9350–9355
