expert reviews in molecular medicine

The noncoding human genome and the future of personalised medicine

Philip Cowie, Elizabeth A. Hay and Alasdair MacKenzie*

The School of Medical Sciences, Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, UK.

*Corresponding author: Alasdair MacKenzie, University of Aberdeen, School of Medical Sciences, Foresterhill, Aberdeen, AB25 2ZD, United Kingdom. E-mail [email protected]

Accession information: doi:10.1017/erm.2014.23; Vol. 17; e4; January 2015 © Cambridge University Press 2015

http://www.expertreviews.org/

Non-coding cis-regulatory sequences act as the ‘eyes’ of the genome and their role is to perceive, organise and relay cellular communication information to RNA polymerase II at gene promoters. The evolution of these sequences, which include enhancers, silencers, insulators and promoters, has progressed in multicellular organisms to the extent that cis-regulatory sequences make up as much as 10% of the human genome. Parallel evidence suggests that 75% of polymorphisms associated with heritable disease occur within predicted cis-regulatory sequences, where they effectively alter the ‘perception’ of cis-regulatory sequences or render them blind to cell communication cues. Cis-regulatory sequences also act as major functional targets of epigenetic modification, thus representing an important conduit through which changes in DNA-methylation affect disease susceptibility. The objectives of the current review are (1) to describe what has been learned about identifying and characterising cis-regulatory sequences since the sequencing of the human genome; (2) to discuss their role in interpreting cell signalling pathways; and (3) to outline how this role may be altered by polymorphisms and epigenetic changes. We argue that the importance of the cis-regulatory genome for the interpretation of cellular communication pathways cannot be overstated and that understanding its role in health and disease will be critical for the future development of personalised medicine.

Introduction

With the advent of genome wide association (GWA) study technologies (Ref. 1) and the recent publication of the ENCODE consortium findings (Ref. 2), the critical role of gene regulation in health and disease is becoming clearer. The following review will summarise our current understanding of the cell specific cis-regulatory components of the genome that interpret signal transduction cues and how they are affected by polymorphic variation and epigenetic changes. Thanks to our rapidly increasing understanding of the noncoding regulatory genome, and its role in translating signal transduction information into cell specific gene expression, it is likely that major health benefits will be realised within the life span of the majority of individuals reading this review and that this understanding will facilitate delivery of the promises of personalised medicine.

The cis-regulatory genome

Recent publication of the ENCODE consortium findings has suggested that up to 80% of the human genome is functional (Ref. 2). Although this high figure has garnered substantial, and sometimes warranted, criticism (Ref. 3), current methods of defining the functional regulatory genome suggest that up to 10% of the human genome is under strong selective pressure, a strong indicator of functionality. In addition, there is likely to be as much as 4.5 times more DNA with gene regulatory function than that coding for proteins (Ref. 2). Much of this conserved genomic information consists of cis-regulatory sequences, which include promoters, enhancers, silencers and insulator sequences (summarised in Table 1 and Fig. 1), all of which work together to orchestrate and choreograph the cell-specific and inducible regulation of gene expression. A fact that has been largely overlooked is that the main purpose of cis-regulatory sequences is to receive, translate and relay information, in the form of activated signal transduction cascades, to the core transcriptional apparatus that is directly responsible for gene transcription (Ref. 4). Therefore, in addition to defining the majority of differences between species, changes in the cis-regulatory genome also define individual differences within species including disease susceptibility and drug response.

Promoters

Because of their critical role in transcription, promoters are highly dependent on their distance from, and orientation relative to, the transcriptional start sites (TSS) of genes (Table 1). There is still some confusion as to what a promoter is. For example, many publications describe gene promoters of many kilobases (kb) in length. Thus, clarification as to the precise definition of a promoter is urgently needed. The term ‘core’ promoter refers only to the 80 or so base pairs required for the binding of the core transcriptional apparatus that, in the case of the production of mRNA, includes RNA polymerase II plus basal transcription factors (TFIIA-H and mediator), collectively known as the pre-initiation complex (PIC). Therefore, it is likely that previously described multi-kb promoter sequences consist of a core promoter region together with proximal regulatory sequences called enhancers or silencers that will be discussed in the next section.

Comparative analysis of the vast majority of RNA polymerase II binding gene promoters in the human genome has recognised different types of promoter based on their interactions with nucleosomes (histone complexes associated with DNA) and the rigidity of their TSS (Ref. 5). Other classification criteria include epigenetic markers such as methylation and acetylation signatures at different lysine residues of histone 3 within each nucleosome (Fig. 1). For example, H3K4 mono-, di- and tri-methylation (H3K4me1, 2 and 3) are markers of active promoters and H3K9 di- and tri-methylation (H3K9me2 and 3) are associated with repressed promoters (Table 1 and Fig. 1) (Ref. 6). In addition, the presence and relative extent of CpG islands (sequences of DNA close to TSS that are enriched with CpG dinucleotides) have previously been used as an accepted promoter diagnostic. However, only 60% of known promoters are associated with CpG islands and this forms another basis for their classification.

Another type of epigenetic modification which is known to affect promoter activity and gene expression, and which has been much more widely explored in disease, is methylation of the DNA of the genome itself. Methylation of CpG sequences within and around promoter regions has been shown to actively repress promoter function. Methylation at promoters is first established by the methyltransferases DNMT3A and DNMT3B and perpetuated through cell divisions by DNMT1 (Ref. 7). Methylated CpGs are then bound by methyl cytosine binding proteins that repress transcription by recruiting histone de-acetylases or by inhibiting transcription factor binding (Fig. 2D) (Ref. 8). It has been observed that many CpG islands in promoters remain heritably hypomethylated through many cell divisions and it has been suggested that secondary folding of CpG rich DNA is responsible (Ref. 9).
It has also been hypothesised that continued hypomethylation of CpG islands from birth is essential for health and that a contributing factor in the decline of health during the aging process is the gradual accumulation of methylation within these islands (Ref. 10). There is little doubt that epigenetic control through DNA-methylation has an important function during embryonic development and in early post-natal development. A number of studies have demonstrated that environmental factors, such as early life stress, result in altered methylation patterns at promoter sequences in rats (Ref. 11) and humans (Refs 12, 13) with associated physiological and behavioural consequences. Because of their relative ease of identification, the effects of methylation have mostly been studied in promoter sequences. However, core promoters represent only a tiny proportion of the total human genome that can be affected by methylation. In contrast, a number of different lines of evidence suggest that enhancers and silencers represent a much larger component of the human genome. Thus, enhancers and silencers may represent a more promising reservoir of disease causing polymorphisms and epigenetic modification (Ref. 14).

Table 1. Summary of the characteristics of cis-regulatory regions including promoters (types I–III), enhancers, silencers and insulators

| Characteristic | Promoter (types I–III) | Enhancer | Silencer | Insulator |
| --- | --- | --- | --- | --- |
| Function | Bind PIC and initiate transcription | Up-regulate promoter activity | Down-regulate promoter activity | Insulate promoters from enhancers/silencers |
| Orientation | Immediately 5′ of gene TSS | Any | Any | Any |
| Distance | Specific distance | Up to 1 Mb | ND | ND |
| CpG island | Type I: No; Type II: Yes; Type III: Yes | No | No | No |
| Effects of DNA methylation | Repression | Repression | ND | ND |
| Associated histone modifications (not exhaustive) | H3K4me3 | H3K4me1, H3K27me3, H3.3 and H2A.Z | H3K9me2 and 3 | ND |
| Diagnostic proteins/cofactors | TFIIA-H, RNApolII | P300 | NRSF, PRC1 and PRC2 | CTCF and cohesins |

ND, no data. PIC, pre-initiation complex; TSS, transcriptional start site; TFIIA-H, transcription factors II A to H; RNApolII, RNA polymerase II; NRSF, neuron-restrictive silencer factor; PRC1 and 2, polycomb repressive complex 1 and 2; CTCF, CCCTC-binding factor.
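The description of CpG islands given earlier in this section (sequences close to a TSS that are enriched in CpG dinucleotides) can be made concrete with a short calculation. The Python sketch below is illustrative only. It assumes the widely used Gardiner-Garden and Frommer style thresholds (a length of at least 200 bp, GC content of at least 50% and an observed/expected CpG ratio of at least 0.6), which are not criteria stated in this review, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, GC content and CpG ratio thresholds."""
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)


if __name__ == "__main__":
    cpg_rich = "CG" * 100   # toy 200 bp sequence made only of CpG dinucleotides
    at_rich = "AT" * 100    # toy 200 bp sequence with no G or C
    print(looks_like_cpg_island(cpg_rich), looks_like_cpg_island(at_rich))
```

In practice such a test would be applied in sliding windows around annotated TSS, and the thresholds would be tuned to the annotation being reproduced.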

Enhancers

Unlike promoter sequences, which have been shown to contain a number of characteristic sequence motifs including TFIIB binding elements (BRE), TATA boxes and downstream core elements (Ref. 15), sequence motifs that permit detection of tissue specific enhancer sequences remain to be identified (Table 1). Enhancers were first described as viral sequences that were able to enhance the expression of human genes in cell culture (Ref. 16). Around this time ‘ground rules’ pertaining to enhancers were laid down, such as the concept that their role is to increase promoter activity over great distances and to work in either orientation (Ref. 17). Since then, it has been realised that all eukaryotic genomes contain enhancer sequences and that the majority of genes in the human genome are subject to enhancer–promoter interactions.

During the 1980s and the 1990s identification and characterisation of enhancer sequences was largely undertaken by the painstaking deletion analysis of regions of DNA flanking gene sequences and subsequent testing in cell lines or in transgenic animals using reporter constructs. For example, a deletion analysis of 7 kb of genomic DNA immediately 5′ of the MSX1 gene, which plays a critical role in craniofacial (and limb) development, and its subsequent testing in transgenic mouse embryos, was able to accurately pinpoint the presence of a distal (DE) and proximal enhancer (PE) that drove expression of the reporter gene in either the face or limbs, respectively (Fig. 3A) (Ref. 18).

Moreover, there is mounting evidence that the expression of genes within specific tissues may not be a function of individual enhancer sequences but may involve the co-ordinated interaction of many different enhancers. For example, recent studies of enhancer activity driving the expression of the FGF8 gene in embryonic limb buds suggested that the endogenous expression of FGF8 resulted from the activities of many different enhancer regions with overlapping functions (Ref. 19). The region of the genome containing these enhancers was referred to as a holo-enhancer whereby the whole population of enhancers within this region seemed to work as a ‘coherent integrated regulatory ensemble’. This study further suggested that enhancer–promoter selectivity of the FGF8 promoter was a function of the position of the gene within the locus rather than any intrinsic property of the FGF8 promoter (Ref. 19). Clearly, the role of developmental enhancers, and their overlapping functions, is much more complicated than previously thought and it cannot be ruled out that similar levels of complexity may govern the expression of genes involved in modulating human behaviour and health.

Figure 1. Simplified summary of the flow of information within cells demonstrating the involvement of cis-regulatory regions in this process and demonstrating histone modification marks (K4/9/27 me/ac all refer to histone 3) and co-factor binding (p300 and PRC1/2). TFII proteins, transcription factors II A to H; polII, RNA polymerase II; PRC1/2, polycomb repressive complex 1 and 2; CTCF, CCCTC-binding factor. Expert Reviews in Molecular Medicine © 2015 Cambridge University Press

Figure 2. A simplified diagram summarising (A) permissive and (B) repressive chromatin states and the effects of (C) allelic variants and (D) CpG methylation on transcription factor binding. CRS, cis-regulatory sequence; A, disease associated allelic variant; Me, methylated CpG dinucleotide; Prom, promoter; V, allelic variant. Expert Reviews in Molecular Medicine © 2015 Cambridge University Press

Figure 3. A summary of two examples of what is known about the regulation of the MSX1 (A) and TAC1 (B) genes. Described are the regulatory elements responsible for supporting the tissue specific expression of these genes (shown in photographic insets) and the ligand–receptor and signal transduction pathways that modulate the cell specific activity of these enhancers. Expert Reviews in Molecular Medicine © 2015 Cambridge University Press

Comparative genomics

By 2004, the availability of the human, mouse and chicken genome sequences had a huge effect on the ability to detect highly conserved cis-regulatory sequences. Aligning the genome sequences of multiple species, a process known as comparative genomics, could not only identify novel coding regions by virtue of their extreme conservation, but could also detect highly conserved intronic or intergenic sequences, many of which subsequently proved to be cis-regulatory sequences. For example, one of the earliest uses of comparative genomics identified an enhancer region that was found to co-regulate the expression of several of the interleukin genes (Ref. 20). Comparative genomics also allows a more rapid and less expensive identification of novel cis-regulatory sequences. For example, an analysis of the tissue specific proximal regulatory elements controlling MSX1 gene expression took at least 18 months to accomplish using transgenic mice (Fig. 3A). In contrast, comparative analysis of chicken, mouse and human genome sequences permitted the accurate identification of these enhancers within 1 day (Ref. 21). Using this technique, other groups have successfully identified a number of highly conserved cis-regulatory regions that lie at significant distances from the genes they control (Ref. 22). However, because of perceived differences in transcription factor binding between species (Ref. 23), and because pilot ENCODE data suggested a lack of significant evolutionary constraint on regulatory sequences within the human genome (Ref. 24), comparative genomics has largely been replaced by high throughput next generation sequence based technologies, which are described in the following paragraphs.
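The principle behind comparative genomics, scanning aligned sequences from different species for stretches of unusually high conservation, can be sketched in a few lines of Python. The example below is a minimal illustration and not the method used in the studies cited above. It assumes that the two sequences have already been aligned by an external tool (gaps written as "-"), and the window size, step and 70% identity cut-off are arbitrary illustrative values. All function names are hypothetical.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 50, step: int = 10):
    """Return (start, fraction of identical non-gap columns) per window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        a = aln_a[start:start + window].upper()
        b = aln_b[start:start + window].upper()
        # A column counts as conserved only if both bases match and neither is a gap.
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(aln_a: str, aln_b: str, window: int = 50,
                      step: int = 10, min_identity: float = 0.7):
    """Return (start, end, identity) for windows at or above min_identity."""
    return [(start, start + window, identity)
            for start, identity in window_identity(aln_a, aln_b, window, step)
            if identity >= min_identity]


if __name__ == "__main__":
    # Toy alignment: the first 50 columns are identical, the last 50 differ.
    species_a = "A" * 50 + "C" * 50
    species_b = "A" * 50 + "G" * 50
    print(conserved_windows(species_a, species_b, window=50, step=50))
```

Conserved windows that fall outside annotated exons would then be candidates for functional testing, for example in reporter assays.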

ChIP-seq, FAIRE-seq, DNase-seq and CAGE

Enhancer identification by comparative genomics has been largely superseded by genome wide chromatin immunoprecipitation (ChIP) based protocols (Refs 25, 26), detection of enhancer associated transcripts (CAGE) (Ref. 27) or techniques that identify uncondensed or nucleosome depleted regions of the genome (FAIRE-seq and DNase-seq). Using these techniques a number of epigenetic and protein interaction markers have emerged as being diagnostic for enhancers. For example, H3K4me1 and H3K27me3 epigenetic markers and substitution of H3 and H2 with more ‘mobile’ histone proteins such as H3.3 and H2A.Z are now accepted and widely used markers for active enhancer regions (Table 1 and Fig. 1) (Ref. 28). Furthermore, binding of cofactors such as the ubiquitously expressed p300 is also a proven enhancer diagnostic (Table 1) (Ref. 25). p300 does not bind DNA but associates with transcription factors bound at enhancers where it acts as a histone acetyltransferase (Fig. 1). Using p300 analysis Attanasio et al. (2013) were able to confirm the presence of enhancer sequences that had previously been described by deletion analysis next to the MSX1 locus (Refs 18, 29).

Protein–DNA associations and epigenetic histone modifications that are accepted diagnostics of active enhancer regions are detected and analysed using a technique called ChIP-seq. ChIP-seq analysis initially involves the brief treatment of cultured cells with formaldehyde followed by chromatin recovery. Formaldehyde ensures that protein–DNA interactions in the chromatin are, in essence, ‘freeze framed’ by covalent crosslinking. This chromatin is broken up by sonication and incubated with an antibody specific for the protein or histone modification under analysis. Chromatin–protein–antibody complexes are recovered using beads that can be pelleted by centrifugation or removed from suspension using magnets to permit washing and enrichment. Once formaldehyde cross links are reversed, all of the DNA that was contained in the chromatin–protein–antibody complexes is subjected to next generation sequencing. The presence and degree of binding of the antibody target proteins can then be assessed on a genome wide level by comparison of the immunoprecipitated DNA populations to the human genome sequence (Ref. 30).
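The final, computational step of ChIP-seq, in which sequenced fragments are compared against the genome to find regions enriched by the antibody, can be sketched as follows. This is a deliberately simplified illustration and not a published peak-calling algorithm. It assumes that reads have already been aligned and reduced to single genomic coordinates on one chromosome, and that a control sample (for example, chromatin sequenced without immunoprecipitation) is available. The control, the bin size, the pseudocount and the fold-enrichment cut-off are all assumptions introduced here, and the function names are hypothetical.

```python
from collections import Counter


def bin_counts(read_positions, bin_size):
    """Count reads per fixed-size genomic bin (bin index = position // bin_size)."""
    return Counter(pos // bin_size for pos in read_positions)


def enriched_bins(ip_positions, control_positions, bin_size=200,
                  min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in the IP over the control."""
    ip = bin_counts(ip_positions, bin_size)
    control = bin_counts(control_positions, bin_size)
    # Scale control counts so that both libraries have the same total depth.
    scale = len(ip_positions) / max(len(control_positions), 1)
    hits = []
    for b in sorted(ip):
        # The pseudocount avoids division by zero in bins with no control reads.
        fold = (ip[b] + pseudocount) / (control[b] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, (b + 1) * bin_size, fold))
    return hits


if __name__ == "__main__":
    # Toy data: 20 IP reads pile up near position 1000; the control is flat.
    ip_reads = [1000 + i for i in range(20)] + [5000, 9000]
    control_reads = [i * 500 for i in range(22)]
    print(enriched_bins(ip_reads, control_reads))
```

Dedicated peak callers replace the simple fold threshold with a statistical model of background read density, but the underlying comparison is the same.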
The protocols for finding specific histone modifications or transcription factor and co-factor binding are essentially the same and differ only in the choice of antibodies used for chromatin precipitation (Ref. 31).

Another diagnostic of active enhancers takes advantage of the premise that regions of the genome actively involved in controlling transcription do not associate as strongly with nucleosomes (Fig. 2A and B). Thus, DNase1 sensitivity mapping (DNase-seq) and ‘Formaldehyde Assisted Identification of Regulatory Elements’ (FAIRE-seq) are similar in concept in that both isolate nucleosome depleted DNA. In the case of DNase-seq, isolated cell nuclei are incubated with the DNase1 enzyme that diffuses into the nuclei and degrades exposed DNA. This fragmented DNA can then be recovered and sequenced using next generation sequencing (Ref. 32). FAIRE analysis also detects regions of nucleosome denuded DNA and can be carried out on whole cells but relies on the cross linking of tightly associated nucleosomes to their DNA (Ref. 33). Chromatin is then recovered from these cells, sonicated and subjected to phenol extraction. Tightly cross linked nucleosome–DNA complexes will partition into the phenol phase whereas naked DNA, which was not so tightly associated with nucleosomes, will partition into the aqueous phase. All of the DNA in the aqueous phase is then sequenced using next generation sequencing. Comparisons of data derived from FAIRE-seq, DNase-seq and ChIP-seq techniques show considerable consistency in predicting active chromatin and have been used widely by the ENCODE consortium to identify active enhancers across the human genome in a number of human cancer cells and embryonic stem cells (http://genome.ucsc.edu/).

Very recently, studies have been published describing the identification of enhancer sequences using a technique called cap analysis of gene expression (CAGE) (Ref. 27). CAGE was originally developed as a method for identifying novel gene transcripts but has been adapted to identify enhancer activity that, in contrast to the single direction transcripts characteristic of protein coding promoters, can be characterised by balanced bi-directional capped transcripts (Ref. 27). In this way Andersson et al. (2014) have been able to produce the FANTOM5 CAGE expression atlas based on a large number of human primary cell, tissue and transformed cell lines. Data based on CAGE defining the presence of active promoters seem to compare favourably with those derived from the ENCODE consortium but have been derived from a much larger pool of primary cells and tissue samples. In addition, Andersson et al. also reported significant multispecies conservation of the majority of enhancers identified by CAGE.
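The distinction drawn above between single direction transcription at protein coding promoters and balanced bi-directional capped transcripts at enhancers can be expressed as a simple directionality score, D = (F − R)/(F + R), where F and R are the numbers of CAGE tags on the forward and reverse strands around a candidate site. The Python sketch below is a hedged illustration of this idea. The cut-off of 0.8 and the minimum tag count are illustrative assumptions rather than values taken from this review, and the function names are hypothetical.

```python
def directionality(forward_tags: int, reverse_tags: int):
    """Return D = (F - R) / (F + R), or None when there are no tags."""
    total = forward_tags + reverse_tags
    if total == 0:
        return None
    return (forward_tags - reverse_tags) / total


def classify_cage_cluster(forward_tags: int, reverse_tags: int,
                          max_abs_d: float = 0.8, min_tags: int = 2) -> str:
    """Label a tag cluster by how balanced its two strands are."""
    if forward_tags + reverse_tags < min_tags:
        return "insufficient data"
    d = directionality(forward_tags, reverse_tags)
    # Balanced transcription (|D| close to 0) is treated as enhancer-like.
    return "enhancer-like" if abs(d) <= max_abs_d else "promoter-like"


if __name__ == "__main__":
    print(classify_cage_cluster(50, 45))   # roughly balanced strands
    print(classify_cage_cluster(100, 2))   # strongly one-sided
```

A score near zero indicates balanced transcription from both strands, whereas a score near +1 or −1 indicates the one-sided transcription expected of a promoter.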

Silencer sequences

Silencers have not been as well studied as enhancers or promoters because they are harder to detect using cell or transgenic based methods of analysis. Despite this lack of knowledge, their requirement for health and their role in disease may be just as critical as for promoters and enhancers. Generally, repressed DNA has been associated with specific histone marks such as H3K9me2/3 and H3K27me2/3. However, evidence that these markers are specifically diagnostic for silencer elements is not compelling. Silencers have also been inferred from their binding by repressive proteins such as neuron-restrictive silencer factor (NRSF or REST) (Ref. 34) or by co-repressors such as the polycomb group protein complexes PRC1 and PRC2 (Table 1 and Fig. 2) (Ref. 35). One of the suggested roles of silencer elements is to fine tune the activity of promoter and enhancer elements within regions of DNA that have been de-repressed by the loosening of nucleosome interactions. One other real possibility has not yet been ruled out: that many elements classed as silencers in one cell type may take on an enhancer role in another cell type. Thus, a combination of transcription factor binding, histone modification, DNA-methylation and co-repressor binding may change what would be classed as an enhancer in one cell type into an element with the characteristics of a silencer in another cell type (Ref. 36).

Looping, insulators and CTCF

Many different independent studies have indicated the significant distances over which enhancer sequences are able to influence promoter function. One of the most widely cited and elegant studies centres on the discovery of an enhancer within intron 5 of the LMBR1 gene that controls the expression of the Sonic Hedgehog (SHH) gene within the zone of polarising activity (ZPA) (Ref. 37). The ZPA is a structure essential to the development of the anterior–posterior axis of the mammalian limb bud, a property governed by expression of SHH. An enhancer required to drive expression of SHH in the ZPA was discovered following the chance generation of a transgenic mouse (sasquatch) with pre-axial polydactyly (PDD) which closely resembled the human condition (Ref. 38). It was found that the deletions and point mutations associated with human PDD coincided with the transgene integration site in the sasquatch mouse. What was remarkable about this study was that the novel enhancer did not affect the expression of the LMBR1 gene but, instead, was found to be responsible for controlling the expression of SHH from a distance of 800 kb (Ref. 38). A number of examples of other remote enhancers have also been reported. For example, comparative genomics and transgenic analysis were able to identify remote enhancers of the TAC1 gene, which encodes substance-P (SP), that were responsible for driving TAC1 expression in sensory neurons and amygdala where SP plays a role in modulating pain and anxiety (Refs 39, 40). These enhancers lay 152 and 214 kb away from the TAC1 TSS (Fig. 3B).

Further evidence for the distances at which enhancers operate comes from the ENCODE consortium, which used chromatin conformation capture (3C) based technologies to survey long range DNA interactions within the human genome. 3C involves cross linking chromatin within cells with formaldehyde and then digesting the DNA component with restriction enzymes. Digested DNA is then ligated using DNA ligase to produce a population of circular DNA molecules whose composition reflects areas of DNA that were in close proximity prior to formaldehyde treatment. The main advance in this technique has been to analyse DNA derived from 3C by next generation sequencing using techniques called carbon copy 3C (5C) and Hi-C (Ref. 41). 5C and Hi-C permit identification of long distance genomic interactions on a genome wide level.
Using these techniques the ENCODE consortium calculated that the average distance at which distal regulatory regions (another name for silencers and enhancers) affect the activity of their target promoters is 120 kb (Ref. 14). One caveat to this observation is that, despite being consistent with other evidence showing long range interactions, analysis of data derived from 5C and Hi-C technologies may be biased against closer interactions. The distances at which these enhancers lie from the genes they control are governed by the looping of intervening DNA by a process involving the binding of the CCCTC-binding factor (CTCF) to sequences called insulators (Table 1 and Fig. 1). Insulators protect promoters from the influence of enhancers or silencers. Because the cohesin protein is essential for DNA looping and is frequently co-localised with CTCF, it is thought that CTCF and cohesin work together to form and stabilise DNA loops, thus limiting the influence of enhancers and silencers within these loops (Fig. 1) (Ref. 42). A recent study by Symmons et al. (2014), using random insertion of transposons containing LacZ reporters into the mouse genome, helped to confirm these observations and identified large genomic segments called topological activation domains (TADs) that seemed to define the range of influence of groups of enhancer regions. Intriguingly, these TADs showed a strong correlation with long range interaction domains detected using Hi-C and were delimited by concentrations of CTCF and cohesin binding sites (Ref. 43).

Up until the turn of the millennium it was widely taken for granted that human RNA polII was distributed homogeneously throughout the nucleus. However, immunohistochemical analysis of the distribution of RNA polII in the mammalian nucleus demonstrated extensive clustering. This suggests that transcription does not occur uniformly within the nucleus but is concentrated in immobilised regions called transcription factories. It has been hypothesised that loops of de-condensed chromatin are pulled into transcription factories where enhancers, and the promoters they control, are aligned together to permit tissue specific modulation of RNA polII activity rates (Ref. 44).
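The way in which sequencing-based 3C methods such as Hi-C reveal long range interactions, as described earlier in this section, can be illustrated with a small example. After ligation and sequencing, each read pair links two genomic positions that were in close proximity in the nucleus, and counting these pairs in fixed-size bins yields a contact map. The Python sketch below is a simplified illustration under stated assumptions: read pairs are taken to lie on a single chromosome and to be already aligned, and the bin size and thresholds are arbitrary. The function names are hypothetical.

```python
from collections import Counter


def contact_matrix(read_pairs, bin_size):
    """Count ligation read pairs per pair of genomic bins (sparse, i <= j)."""
    contacts = Counter()
    for pos_a, pos_b in read_pairs:
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        contacts[(i, j)] += 1
    return contacts


def long_range_contacts(contacts, bin_size, min_distance, min_count):
    """Return (bin_a_start, bin_b_start, distance, count) for distant contacts."""
    hits = []
    for (i, j), count in sorted(contacts.items()):
        distance = (j - i) * bin_size
        if distance >= min_distance and count >= min_count:
            hits.append((i * bin_size, j * bin_size, distance, count))
    return hits


if __name__ == "__main__":
    # Toy data: three pairs support a contact spanning 800 kb.
    pairs = ([(10_000, 810_000)] * 3
             + [(10_000, 12_000)] * 5
             + [(200_000, 350_000)])
    matrix = contact_matrix(pairs, bin_size=10_000)
    print(long_range_contacts(matrix, 10_000, min_distance=100_000, min_count=2))
```

Real analyses also normalise for the strong dependence of contact frequency on genomic distance and for restriction site density, both of which this sketch ignores.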

Gene regulation and tissue specificity

Although there is now a huge amount of information available through the UCSC browser, ENSEMBL and ENCODE, information whose existence would have been inconceivable even 10 years ago, the majority of this information fails to address one of the most important aspects of gene regulation in multicellular species, namely tissue and cell specificity. The role of enhancers in ascribing cell specificity was first reported in 1982 (Ref. 45). Since that time numerous studies have discovered hundreds of promoters and enhancers with very specific cell and tissue activity. Indeed, it is now understood that the purpose of enhancer–promoter interactions is not only to modulate gene expression within an individual cell, but to make sure that these genes are expressed in the correct cells, at the correct time, in the correct amount and, most importantly, in response to the correct signal transduction stimuli.

One criticism of the ENCODE project is that, although the situation is changing and more emphasis is being placed on primary cells, it largely relied on exploring gene regulatory landscapes within cell lines that bear little or no resemblance to late embryonic, neonatal or adult human primary cells in vivo. In addition, cancer and stem cells represent transcriptionally permissive cell types where RNApolII can be up to 1000 times more active than in normal human somatic cells (Ref. 3). It could therefore be argued that much of the 75% of the genome found to undergo transcription in cancer and stem cells could result from heightened RNA polII processivity within these cells. Most importantly, the genomes of metazoan cells have evolved in heterogeneous cellular environments where the genomes in different cell types interact through highly specific and context dependent ligand–receptor interactions, signal transduction networks and transcription factor–DNA interactions. Attempting to understand these complex interactions using monocultures of highly divergent cancer or embryonic stem cells could be perceived as questionable when exploring the complexity of cell communication required for processes such as brain function, the immune system or embryonic development. The limitations of using cancer cell lines were recognised in the mid 1990s, as many of the regulatory regions under examination, which had been shown to act as tissue specific enhancers in embryonic systems, were silent in transformed cell lines.
Thus, only by exploring the regulation of the mouse Msx1 gene in transgenic embryos was it possible to detect and characterise the enhancers that drive MSX1 expression in limbs, face and neural crest cells migrating into the developing heart (Fig. 3A) (Refs 18, 21, 46). Further strides in identifying and characterising tissue specific enhancers have recently been taken by the formidable teaming of Visel, Pennacchio and Rubin, who used ChIP-seq to detect the interaction of the p300 co-factor with chromatin derived from embryonic mouse facial tissues (Ref. 29). p300 was previously found to act as a marker for active enhancer regions (Fig. 1), and the Visel group were able to use genome wide ChIP-seq in primary cells derived from mouse embryo craniofacial tissues to detect a series of enhancers that were active at embryonic day 11. Importantly, and in contrast to conflicting reports derived from preliminary ENCODE data, Attanasio et al. (Ref. 29) were able to confirm that 87% of the 4399 enhancers discovered in this way had orthologous sequences in humans and showed evolutionary constraint. This ground-breaking study demonstrated not only that ChIP-seq analysis of p300 chromatin binding in primary cells could identify active tissue specific enhancers but also that these enhancer sequences were highly conserved, showing that comparative genomics still has an important role to play in detecting cis-regulatory sequences.
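As a rough illustration of the comparative genomics idea referred to above, the following sketch scans two already-aligned sequences with a sliding window and reports the windows whose percent identity reaches a threshold. The function names, the toy sequences, the 10 bp window and the 90% cutoff are all assumptions made for illustration; they are not taken from the studies cited here, which rely on genome-scale alignments rather than this simplified calculation.

```python
def window_identity(aln_a: str, aln_b: str, window: int, step: int = 1):
    """Yield (start, percent_identity) for each window of two aligned sequences.

    Gap characters ('-') never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        yield start, 100.0 * matches / window


def conserved_windows(aln_a: str, aln_b: str, window: int = 10,
                      threshold: float = 90.0):
    """Return start positions of windows at or above the identity threshold."""
    return [start
            for start, pid in window_identity(aln_a, aln_b, window)
            if pid >= threshold]


# Hypothetical toy alignment (not real genomic sequence).
human = "ACGTTGCAATGCCGTAAGCTTAGC"
chick = "ACGTTGCAATGCAATCCGGATAGC"
print(conserved_windows(human, chick))
```

In practice the window size and threshold would be chosen according to the evolutionary distance between the species being compared; deeper comparisons (for example human against chicken) tolerate lower identity cutoffs than comparisons between closely related mammals.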

Signal transduction response and enhancer–promoter synergy

The cells of the human body are unable to maintain human health in isolation from each other; they must communicate. Cell–cell communication initiates at the cell surface as a result of ligand–receptor interactions and terminates with the activation of transcription factors that subsequently bind cis-regulatory sequences in the genome (Fig. 1). Signal transduction cascades are essential links in this flow of information from the cell surface to the nucleus (Fig. 1). Cis-regulatory sequences act as genomic ‘sense organs’ or ‘receptors’ whose primary functions are to recognise, organise and relay signal transduction information by interfacing with transcription factors activated by signal transduction pathways. Once assembled in the correct order on cis-regulatory sequences, these activated transcription factors modulate the activity of RNApolII in a structured and cell specific manner. Thus far, extensive analysis has identified, isolated and characterised many hundreds of enhancer regions, and many studies have gone on to describe their tissue specificity and to identify the transcription factors that bind to them. However, the signal transduction pathways that maintain the tissue specific activity of these cis-regulatory sequences, and the way in which they relay this information to gene promoters, remain largely unexplored. This aspect of cis-regulatory function is critical to understanding how SNPs and epigenetic modification cause
disease, as the effects of many SNPs are not obvious until viewed in the correct cellular context or until the correct signal transduction stimulus is applied. The way forward in this respect is to initiate studies that identify how the activity of these cis-regulatory sequences is altered by activation or inhibition of specific signal transduction pathways. For example, at least three enhancer regions that control aspects of the expression of the MSX1 gene in the developing mouse embryo have been found using a combination of deletion analysis and comparative genomics (Fig. 3A) (Refs 18, 46). Subsequent analysis using a combination of transgenic embryo culture and the application of cell signalling agonists and antagonists (soaked into agarose beads or applied in the culture medium) showed that the activity of these enhancers was subject to modulation by PKA and Wnt signalling pathways (Fig. 3A) (Refs 21, 46). In addition to understanding the role of tissue specific cis-regulatory regions in relaying signal transduction information during development, it is also important to explore their role in adult physiological processes. For example, the TAC1 gene produces neuropeptides, including substance P (SP) and neurokinin A (NKA), in a highly tissue specific manner and plays an important role in mood modulation, appetite and inflammatory pain (Ref. 4). Publication of the chicken genome sequence in 2004 (Ref. 47) permitted a comparative genomic analysis that identified two regions of extreme conservation (310 million years) that lay 152 and 214 kb 5′ of the TSS of the human TAC1 gene (Fig. 3B). Cloning of the sequence closest to TAC1, called ECR1 (evolutionarily conserved region 1), and analysis using transgenic mice and immunohistochemistry demonstrated that it was active in medial amygdala neurones, in cells also shown to express SP (Refs 39, 48) and where SP is known to increase anxiety behaviour in rodents (Fig. 3B) (Ref. 49).
The second peak of homology, which lay 214 kb from the TAC1 TSS (ECR2), proved to be more of a challenge to analyse, as ECR2 did not support the activity of any of the generic ‘off the peg’ promoters that were initially used to determine its tissue specific properties. Only when ECR2 was combined with 996 bp of the TAC1 promoter region could transgenic animals that supported expression of the LacZ marker in SP expressing cells of the dorsal root ganglia be successfully produced (Fig. 3B) (Ref. 40). Whole dorsal root ganglion explant cultures, transgenic for the ECR2 enhancer in combination with the TAC1 promoter, were subsequently used to show that both the transgene and the endogenous TAC1 gene could be stimulated by the inflammatory mediator capsaicin in larger diameter sensory neurons, an expression pattern associated with chronic pain (Ref. 40). Intriguingly, further experiments suggested that signalling through the MEK/ERK pathway was responsible for relaying the effects of capsaicin and that treatment with a MEK antagonist could effectively reverse the effects of capsaicin on the induction of ECR2-TAC1 promoter activity (Fig. 3B) (Refs 40, 50). These experiments highlight the ability to identify tissue specific cis-regulatory sequences using comparative genomics combined with in-vivo (transgenic mouse), ex-vivo (transgenic explant) and primary cell transfection based approaches. These observations also emphasise the requirement of many enhancers for specific promoters in order to support tissue specific expression and signal transduction responses, and question the validity of using foreign ‘generic’ promoters. Thus, an ability to demonstrate a role for enhancer–promoter synergy in tissue specificity and the interpretation of signal transduction information represents an important step in the design of future personalised therapeutic strategies.

The role of gene regulatory polymorphisms in human health and disease

Regulatory polymorphisms can prevent the proper interaction of cis-regulatory sequences with the transcription factor proteins required to support their tissue specific and inducible properties by altering their recognition sequences (Fig. 2C). Conversely, allelic variants could also attract a transcription factor whose binding would induce significant functional changes. Despite strong evidence for the role of regulatory polymorphisms in the development of malformations such as preaxial polydactyly and Hirschsprung disease (Refs 37, 38, 51, 52, 53, 54), these elegant studies are regarded as the exception rather than the rule, and many biomedical scientists continue to work under the assumption that human malformation and heritable disease are mostly a function of protein malfunction (hence the continued use of exome sequencing in the detection of disease causing variants). The sequencing of the human genome
made possible the use of GWA studies for the unbiased association of polymorphic variation with disease. Unexpectedly, it was shown that up to 93% of disease associated SNPs fall within regions of the genome that do not encode protein. Moreover, 73% of these can be detected in regions that have been identified by FAIRE-seq and DNaseI-seq as having a possible regulatory function (Ref. 1). Thus, it is essential that more emphasis be placed on determining the role of gene regulation in disease susceptibility. The results of Maurano et al. (Ref. 1) lend further support to the idea that a better understanding of how to examine the effects of polymorphisms on cis-regulatory sequences is essential to the future progress of personalised therapeutic strategies. Although GWA analysis overcame many of the limitations previously faced by association analyses of candidate loci (now regarded by many as being statistically underpowered), GWA studies are still limited by their lack of resolution. Thus, although most GWA studies analyse SNPs distributed across the whole genome, only a limited proportion of the SNPs known to exist are represented (e.g. 500 thousand from a known 12 million SNPs). Although there is a current emphasis within clinical settings on the use of whole exome sequencing (WES) in the identification of disease causing variants, this approach excludes the noncoding genome. However, thanks to the plummeting costs of whole genome sequencing, it is likely that the limitations of GWA analysis and WES will be overcome by the availability of the whole genome sequences (WGS) of many thousands of individuals within the next decade (Ref. 55). In this way, it is likely that the identification of disease causing loci within the noncoding genome will become a possibility in the near future.
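The proportions quoted above can be made concrete with a short calculation. The sketch below uses only the figures given in the text; the variable names are illustrative, and it assumes that the 73% applies to the noncoding disease associated SNPs (the reading suggested by "of these"). Because the 93% is itself quoted as an upper bound, the combined figure should be read as approximate.

```python
# Figures quoted in the text above.
snps_on_typical_array = 500_000   # SNPs represented in a typical GWA study
known_snps = 12_000_000           # SNPs known to exist
noncoding_fraction = 0.93         # disease associated SNPs outside coding sequence (upper bound)
regulatory_fraction = 0.73        # assumed share of those in FAIRE-seq/DNaseI-seq regions

# Proportion of known SNPs that are directly represented in a GWA study.
array_coverage = snps_on_typical_array / known_snps

# Proportion of all disease associated SNPs lying in putative regulatory DNA.
regulatory_share_of_all = noncoding_fraction * regulatory_fraction

print(f"SNPs represented in a GWA study: {array_coverage:.1%} of known SNPs")
print(f"Disease associated SNPs in putative regulatory DNA: {regulatory_share_of_all:.1%}")
```

On these numbers, only around one known SNP in 24 is directly represented, which illustrates the resolution problem described in the text.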
Although chromatin modification, co-factor binding (p300) and detection of open chromatin (FAIRE-seq and DNase-seq) represent powerful methods of enhancer detection, they require that the enhancer is active. What we have learned about enhancers, namely their high degree of cell specificity, their reliance on specific promoters and their dependence on the presence of the correct stimuli, suggests that the correct paradigm for characterising a specific enhancer is rarely achieved using cell lines. Thus, comparative genomics represents an effective way to identify tissue specific enhancer regions that may be silent in the majority of cell types and therefore elude detection by ChIP-seq, FAIRE-seq or DNaseI-seq. For example, comparative genomics, molecular biology and transgenic mouse analysis were used to identify, clone and characterise an enhancer (GAL5.1) that was responsible for supporting the expression of the galanin gene in the paraventricular nucleus (PVN) of the hypothalamus, where galanin modulates fat and alcohol intake (Fig. 4A) (Ref. 56). These studies also explored the signal transduction systems controlling GAL5.1 and demonstrated that GAL5.1 responded to PKC signalling (Fig. 4A) but did not respond to MAP kinase or PKA stimulation. Further analysis showed that GAL5.1 contained two polymorphisms and existed in the human population as two haplotypes (GG>CA), where the CA haplotype was 40% less active than the GG haplotype in primary hypothalamic neurons (Fig. 4A) (Ref. 56). Subsequent clinical studies involving fMRI and multiple patient samples were able to determine that the ‘GG’ genotype was associated with alcohol abuse (Ref. 57). However, attempts to recapitulate the identification of GAL5.1 using information contained in the UCSC browser (including ENCODE) were not encouraging. Only two conclusions can be drawn from this: either GAL5.1 is not an enhancer (unlikely given its capacity to drive highly cell specific expression in transgenic mice) or, because of their previous reliance on cancer and embryonic stem cell lines, the techniques used by ENCODE are not able to efficiently identify cis-regulatory sequences that show highly tissue specific expression. These studies support the hypothesis that comparative genomics can still be regarded as a potent method of identifying highly tissue specific cis-regulatory sequences and that it complements ChIP-seq and DNase/FAIRE-seq based technologies in defining the functional consequences of genetic variation and epigenetic modification in health and disease.
Thus, it is also highly likely that combining comparative genomics with WGS will become a standard approach for the identification of disease causing loci. Studies of the regulation of the BDNF gene have highlighted another factor that must be considered when attempting to define the effects of disease associated polymorphisms on cis-regulatory function; that of combinatorial signal

[Figure 4 panel labels. (A) GAL5.1; Galanin; 42 kb; rs2513282 and rs2513281 (G, G and C, A); PKC; PVN; Hypothalamus. (B) BE5.2; BDNF prom4; 21 kb; rs12273363 (T, C); Cell depolarisation; Amygdala (BLA, CeA); Cortex; Hippo. (CA1, CA2). (C) BE5.2; BDNF prom4; rs12273363 (T, C); PKA; PKC; Cell depolarisation.]
Figure 4. Enhancer-promoter interactions in the regulation of the GAL and BDNF genes. A summary of enhancer–promoter interactions in the regulation of the GAL gene (A) and the BDNF gene (B and C) and the effects of disease associated allelic variants. (See Figure 3 for key.) (A) The tissue specific properties of the GAL5.1 enhancer in the paraventricular nucleus (PVN) of the hypothalamus and the differential effects of the GG (thick red arrow) and CA (thin red arrow) haplotypes in primary hypothalamic cell cultures. (B) Diagram describing the effects of different alleles (rs12273363, C-T) of the BE5.2 silencer element on BDNF promoter 4 in amygdala neurones. (C) Combined activation of PKA and PKC reverses the repression of BDNF promoter 4 by BE5.2, thus allowing its response to cell depolarisation in a manner analogous to a Boolean AND gate. BLA, basolateral amygdala; CeA, central amygdala; CA1 and CA2, CA1 and CA2 regions of the hippocampus (Hippo). Expert Reviews in Molecular Medicine © 2015 Cambridge University Press

transduction effects of cis-regulatory sequences. For example, a polymorphism associated with mood disorders (rs12273363) (Ref. 58) was found within a highly conserved putative cis-regulatory sequence (BE5.2, Fig. 4B) close to the BDNF locus. Further analysis showed that this element acted as a silencer of BDNF promoter 4 (BP4) activity in primary neurones derived from cortex, hippocampus or amygdala, where BP4 is active and is required for mood modulation. These studies demonstrated that the depression associated C-allele was a stronger silencer than the T-allele (Fig. 4B) (Ref. 59). Activation of either the PKA or the PKC pathway in isolation failed to have any effect on the ability of BE5.2 to silence BP4 (Ref. 59). However, when PKA and PKC were activated together, the silencing effects of BE5.2 were significantly reduced in a manner analogous to an AND gate in Boolean logic (Fig. 4C) (Ref. 59). These experiments suggest that many other cis-regulatory sequences may act in a manner analogous to the logic gates used in Boolean algebra, where signal transduction systems act as inputs and RNApolII activity is the output.
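The AND gate analogy can be written out as a small truth table. The sketch below is a deliberately qualitative toy model: the function names are hypothetical, and it treats relief of silencing as all-or-nothing, whereas the experiments describe a significant reduction in silencing rather than its complete removal.

```python
from itertools import product


def be52_silencing_relieved(pka_active: bool, pkc_active: bool) -> bool:
    """AND gate: silencing by BE5.2 is relieved only when both pathways are active."""
    return pka_active and pkc_active


def bp4_responds(pka_active: bool, pkc_active: bool, depolarised: bool) -> bool:
    """Qualitative output: BP4 responds to depolarisation only once silencing is relieved."""
    return depolarised and be52_silencing_relieved(pka_active, pkc_active)


# Enumerate every combination of the two signalling inputs.
truth_table = {
    (pka, pkc): be52_silencing_relieved(pka, pkc)
    for pka, pkc in product([False, True], repeat=2)
}

for (pka, pkc), relieved in truth_table.items():
    print(f"PKA={pka!s:5} PKC={pkc!s:5} -> silencing relieved: {relieved}")
```

Other regulatory behaviours could be expressed in the same way by swapping the Boolean operator, for example an OR gate for an element that responds to either of two pathways; whether any given element behaves like this is an empirical question.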

Cis-regulatory sequences at the crossroads of genetics and epigenetics

Epigenetics is the study of heritable changes in gene activity that are not caused by changes in the DNA sequence. The mechanisms responsible include environmentally induced methylation/de-methylation of CpG dinucleotides (Fig. 2D) and histone acetylation/methylation (Fig. 1). Histone modification is used as an accurate marker of chromatin activity whereby promoters, enhancers and silenced chromatin are distinguished by methylation/acetylation of specific amino-acid residues (Table 1 and Fig. 1). What is less clear is whether these chromatin modification marks have any effect on health or play an important role in disease. In contrast, there is significant evidence that DNA-methylation can be altered by environmental influences such as early life stress and strongly influences gene expression (Fig. 2D). DNA-methylation through the production of 5-methyl-cytosine (5mC) at CpG dinucleotides by the enzymes DNMT3A and DNMT3B is used as a marker for transcriptionally repressed chromatin. However, recent studies have determined that 5-hydroxymethyl-cytosine (5hmC), an intermediary product of the demethylation of 5mC by the Tet enzyme, is associated with both poised and active enhancers and promoters (Refs 60, 61). This observation presents problems for interpreting DNA-methylation databases generated using bisulphite conversion sequencing, as these older techniques cannot differentiate between 5mC and 5hmC. However, a novel oxidative bisulphite sequencing technique now permits differentiation between 5mC and 5hmC (Ref. 62). It is clear that CpG methylation plays a critical role in health and has been shown to be significantly altered in many different disease states including cancer, cardiovascular disease, obesity and type II diabetes (Refs 63, 64, 65, 66). Epigenetics and DNA-methylation have also been linked to stress induced mood disorders (Ref. 67). For example, early life stress resulted in hypomethylation of an enhancer that regulates the expression of the neuropeptide arginine vasopressin (AVP) in mice (Refs 68, 69). Murgatroyd et al. found that hypomethylation of this AVP enhancer increased the activity of the AVP gene in the PVN, an expression profile associated with depression (Refs 69, 70, 71, 72, 73, 74, 75, 76). Intriguingly, many polymorphisms are able to confer susceptibility or resistance of cis-regulatory sequences to epigenetic modification by DNA-methylation. For example, the CA allele of the galanin gene enhancer, GAL5.1, is rendered more susceptible to DNA-methylation by the incorporation of a novel CpG site (Ref. 56). In addition, the obesity and addiction associated T-allele within an intronic enhancer of the cannabinoid-1 receptor gene decreased the susceptibility of this enhancer to methylation (Ref. 77). Thus, cis-regulatory sequences are a major functional target for DNA-methylation in the genome and may represent the frontier at which epigenetic modification and genetics interact to maintain health or increase disease susceptibility.
One important consideration that will have a significant impact on the future of discovering the causes of disease is the strong likelihood that manifestation of many disease phenotypes may be influenced by epigenetic modification. Thus, the effects of SNPs on enhancer or promoter function might be increased or masked by DNA-methylation raising the real possibility that the interpretation of many GWA studies of disease susceptibility may be skewed by the functional consequences of epigenetic modification.
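The way in which a single nucleotide change can add or remove a potential methylation target, as described above for the novel CpG site in one GAL5.1 allele, can be sketched in a few lines. The helper names and the short flanking sequences below are hypothetical and are not the real GAL5.1 or CNR1 sequences; the sketch only counts CpG dinucleotides and says nothing about whether a given site is actually methylated in vivo.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in a sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cpg_change(ref_seq: str, pos: int, alt_base: str) -> int:
    """Return the change in CpG count when the base at index `pos` becomes `alt_base`."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return count_cpg(alt_seq) - count_cpg(ref_seq)


# Hypothetical flanking sequences, used purely for illustration.
gains_site = cpg_change("TTAGGGTCA", 4, "C")   # G -> C creates a new CpG
loses_site = cpg_change("ACGT", 1, "T")        # C -> T destroys an existing CpG
print(gains_site, loses_site)
```

A positive result flags an allele that offers an additional CpG for methylation, and a negative result flags one that removes a site; either could alter how the surrounding cis-regulatory sequence responds to epigenetic modification.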

[Figure 5 panel labels. (A) Permissive (euchromatin): pharmacological stimulation; signal transduction systems; transcription factor activation; transcription factors bind regulatory region (CRS); regulation of promoter (Prom) activity. (B) Allelic differences: pharmacological stimulation; signal transduction systems; transcription factor activation; altered transcription factor binding (V, CRS); no (mis-) regulation of promoter activity. (C) Epigenetics (methylation): pharmacological stimulation; signal transduction systems; transcription factor activation; transcription factor binding restricted (Me, CRS); no (mis-) regulation of promoter activity.]

Figure 5. Genetic and epigenetic effects on drug efficacy in the human genome. Simplified summary diagram describing (A) how the genome detects the effects of drug treatments in the cell and how drug effects are modulated by (B) allelic variation and (C) epigenetic modification. CRS, cis-regulatory sequence; Me, methylated CpG dinucleotide; V, allelic variant. Expert Reviews in Molecular Medicine © 2015 Cambridge University Press

The future of signal transduction and the cis-regulatory genome in personalised medicine

A recent emphasis in medical genetics has been the use of the human genome sequence to permit GWA analysis to explore the causes of heritable human disease, not just in exomic sequence but throughout the genome. Furthermore, the tremendous acceleration in our ability to sequence whole genomes brings the possibility of accessing individual patient genome sequence as easily as accessing patient records. By gaining insights into the functionality of the noncoding human genome, how this genome interacts with and responds to signal transduction cues, and how SNPs and epigenetic modification affect these responses, it is hoped that susceptibility to many life-threatening conditions might be predicted early in life to help decide the best therapeutic options; this is one of the targets of personalised medicine. A further obstacle to the provision of personalised medicine is the wide variation in drug responses seen in the patient population, which range from a complete lack of efficacy to the development of significant side effects. Although some of this variation can be accounted for by individual differences in drug metabolism by enzymes such as Cytochrome P450 (Ref. 78), metabolism is unable to account fully for all variation in drug effects, and a number of noncoding loci that do not involve metabolic enzyme loci have been identified that alter drug efficacy (Ref. 79). For example, because of its role in appetite, addiction, cognition and inflammatory pain, the cannabinoid-1 receptor (CB1) has been the subject of much interest in the development of novel anti-addiction and anti-obesity therapies. Rimonabant, an inverse agonist of CB1, was initially developed as an anti-smoking therapeutic and then as an appetite suppressant. However, rimonabant had to be withdrawn from the market because of side effects that included suicidal feelings and depression (Ref. 80). Indeed, drug response stratification continues to be a major problem in the development of novel cannabinoid-based therapeutics, and analysis of the CB1 receptor coding region failed to identify a nonsynonymous polymorphism that could account for these differences. An alternative to the involvement of metabolic or coding polymorphisms in cannabinoid drug response stratification is that regulation of components of the cannabinoid system, such as receptors or the enzymes involved in the production or degradation of endogenous cannabinoids, might be altered by SNPs or epigenetic
modification of cis-regulatory sequences downstream of the receptor or signal transduction cascade drug target (Fig. 5). Clues to the possible mechanisms through which polymorphisms in the noncoding genome affect drug response can be seen in the significant differences observed between the C- and T-alleles of the CNR1 intron 2 enhancer, where the T-allele responds much more strongly to activation of MAP kinase pathways (Ref. 77). Indeed, these differences are currently being explored as a possible cause of cannabinoid drug stratification.

The future of understanding the role of the noncoding genome in personalised medicine

As was the case for gene coding sequences in the past, the real ‘proof of the pudding’ with regard to the function of cis-regulatory elements and the effects of polymorphic variation will be to delete or manipulate their sequences within the genomes of living animals in order to explore disease related changes in physiology and behaviour. In the past this was achieved using gene knockout studies involving embryonic stem (ES) cell gene targeting in mice. Although ES cell targeting has sometimes been used to knock out putative enhancer regions in mice, this technology proved to be too expensive, technically challenging and time consuming to permit the regular targeting of putative cis-regulatory sequences. However, with the recent promise of rapid and inexpensive genome editing techniques such as CRISPR (Refs 81, 82) for the manipulation of mammalian genomes, we may be poised on the threshold of a new age of functional analysis of the cis-regulatory genome in mammals that will accelerate the development of personalised medicine.

Conclusions

The publication of the human genome sequence over 10 years ago allowed for the development of novel technologies such as GWA analysis of genetic disease and, more recently, the genome wide analysis of the regulatory genome using techniques reliant on next generation sequencing. This huge volume of information has revolutionised our understanding of the role of gene regulation in the development of human disease and has allowed major inroads to be made into understanding the transcriptional mechanisms behind many common human diseases. Indeed, it is almost certain that the future of predicting and combating disease will rely on understanding the complexities of the regulatory genome. Despite the huge advances made in our understanding of the cis-regulatory landscape of the human genome in the last 10 years, huge obstacles, such as a lack of tissue specific high throughput screening technologies, prevent further understanding of the effects of genetic and epigenetic changes on gene regulation and their roles in disease. Thus, mere examination of the role of protein coding regions in disease and drug response is no longer a viable option. Only by factoring in the role of cell specificity and signal transduction responses in our study of the cis-regulatory genome and the effects of genetic and epigenetic variation can effective progress be made in delivering the promises of personalised medicine.

Acknowledgement

This work was funded by The BBSRC, the Wellcome Trust and the Medical Research Council. PC was funded by the Scottish Universities Life Science Alliance and EAH is funded by Medical Research Scotland. We would like to thank Neil Vargesson, Iain McEwan and Stefan Hoppler for critically reading our manuscript.

References

1 Maurano, M.T. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190-1195
2 Dunham, I. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74
3 Graur, D. et al. (2013) On the immortality of television sets: ‘function’ in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5, 578-590
4 MacKenzie, A., Hing, B. and Davidson, S. (2013) Exploring the effects of polymorphisms on cis-regulatory signal transduction response. Trends in Molecular Medicine 19, 99-107
5 Lenhard, B., Sandelin, A. and Carninci, P. (2012) Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nature Reviews Genetics 13, 233-245
6 Bannister, A.J. and Kouzarides, T. (2011) Regulation of chromatin by histone modifications. Cell Research 21, 381-395
7 Jin, B. and Robertson, K.D. (2012) DNA methyltransferases, DNA damage repair, and cancer. Advances in Experimental Medicine and Biology 754, 3-29
8 Long, H.K. et al. (2013) Epigenetic conservation at gene regulatory elements revealed by nonmethylated DNA profiling in seven vertebrates. eLife 2, e00348
9 Ginno, P.A. et al. (2012) R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Molecular Cell 45, 814-825
10 van Otterdijk, S.D., Mathers, J.C. and Strathdee, G. (2013) Do age-related changes in DNA methylation play a role in the development of age-related diseases? Biochemical Society Transactions 41, 803-807
11 Anier, K. et al. (2013) Maternal separation is associated with DNA methylation and behavioural changes in adult rats. European Neuropsychopharmacology 24(3), 459-468
12 Kang, H.J. et al. (2013) Association of SLC6A4 methylation with early adversity, characteristics and outcomes in depression. Progress in Neuro-Psychopharmacology and Biological Psychiatry 44, 23-28
13 Ouellet-Morin, I. et al. (2012) Increased serotonin transporter gene (SERT) DNA methylation is associated with bullying victimization and blunted cortisol response to stress in childhood: a longitudinal study of discordant monozygotic twins. Psychological Medicine 43, 1813-1823
14 Sanyal, A. et al. (2012) The long-range interaction landscape of gene promoters. Nature 489, 109-113
15 Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Reviews Genetics 5, 276-287
16 de Villiers, J. and Schaffner, W. (1981) A small segment of polyoma virus DNA enhances the expression of a cloned beta-globin gene over a distance of 1400 base pairs. Nucleic Acids Research 9, 6251-6264
17 Veldman, G.M., Lupton, S. and Kamen, R. (1985) Polyomavirus enhancer contains multiple redundant sequence elements that activate both DNA replication and gene expression. Molecular and Cellular Biology 5, 649-658
18 MacKenzie, A. et al. (1997) Two enhancer domains control early aspects of the complex expression pattern of Msx1. Mechanisms of Development 62, 29-40
19 Marinic, M. et al. (2013) An integrated holo-enhancer unit defines tissue and gene specificity of the Fgf8 regulatory landscape. Developmental Cell 24, 530-542
20 Loots, G.G. et al. (2000) Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136-140
21 Miller, K.A. et al. (2007) A highly conserved Wnt-dependent TCF4 binding site within the proximal enhancer of the anti-myogenic Msx1 gene supports expression within Pax3-expressing limb bud muscle precursor cells. Developmental Biology 311, 665-678
22 Visel, A., Bristow, J. and Pennacchio, L.A. (2007) Enhancer identification through comparative genomics. Seminars in Cell and Developmental Biology 18, 140-152
23 Odom, D.T. et al. (2007) Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nature Genetics 39, 730-732
24 Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816
25 Visel, A. et al. (2009) ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854-858
26 Visel, A., Rubin, E.M. and Pennacchio, L.A. (2009) Genomic views of distant-acting enhancers. Nature 461, 199-205
27 Andersson, R. et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature 507, 455-461
28 Calo, E. and Wysocka, J. (2013) Modification of enhancer chromatin: what, how, and why? Molecular Cell 49, 825-837
29 Attanasio, C. et al. (2013) Fine tuning of craniofacial morphology by distant-acting enhancers. Science 342, 1241006
30 Barski, A. and Zhao, K. (2009) Genomic location analysis by ChIP-Seq. Journal of Cellular Biochemistry 107, 11-18
31 Furey, T.S. (2012) ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nature Reviews Genetics 13, 840-852
32 Song, L. et al. (2011) Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Research 21, 1757-1767
33 Nagy, P.L. and Price, D.H. (2009) Formaldehyde-assisted isolation of regulatory elements. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 1, 400-406
34 Zuccato, C. et al. (2007) Widespread disruption of repressor element-1 silencing transcription factor/neuron-restrictive silencer factor occupancy at its target genes in Huntington’s disease. Journal of Neuroscience 27, 6972-6983
35 Schwartz, Y.B. and Pirrotta, V. (2013) A new world of Polycombs: unexpected partnerships and emerging functions. Nature Reviews Genetics 14, 853-864
36 Kolovos, P. et al. (2012) Enhancers and silencers: an integrated and simple model for their function. Epigenetics and Chromatin 5, 1
37 Lettice, L.A. et al. (2008) Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Human Molecular Genetics 17, 978-985
38 Lettice, L.A. et al. (2002) Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proceedings of the National Academy of Sciences of the United States of America 99, 7548-7553
39 Davidson, S. et al. (2006) A remote and highly conserved enhancer supports amygdala specific expression of the gene encoding the anxiogenic neuropeptide substance-P. Molecular Psychiatry 11, 410-421
40 Shanley, L. et al. (2010) Long-range regulatory synergy is required to allow control of the TAC1 locus by MEK/ERK signalling in sensory neurones. Neurosignals 18, 173-185
41 de Wit, E. and de Laat, W. (2012) A decade of 3C technologies: insights into nuclear organization. Genes and Development 26, 11-24
42 Chetverina, D. et al. (2014) Making connections: insulators organize eukaryotic chromosomes into independent cis-regulatory networks. Bioessays 36, 163-172
43 Symmons, O. et al. (2014) Functional and topological characteristics of mammalian regulatory domains. Genome Research 24, 390-400
44 Papantonis, A. and Cook, P.R. (2013) Transcription factories: genome organization and gene regulation. Chemical Reviews 113, 8683-8705
45 de Villiers, J. et al. (1982) Transcriptional ‘enhancers’ from SV40 and polyoma virus show a cell type preference. Nucleic Acids Research 10, 7965-7976
46 Miller, K.A. et al.
(2008) Prediction and characterisation of a highly conserved, remote and cAMP responsive enhancer that regulates Msx1 gene expression in cardiac neural crest and outflow tract. Developmental Biology 317, 686-694 Consortium, I.C.G.S. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695-716 Davidson, S. et al. (2006) Cellular co-expression of a LacZ marker gene driven by the amygdala-specific

49

50

51

52

53

54

55

56

57

58

59

60

61

ECR1 enhancer with the substance P neuropeptide. Molecular Psychiatry 11, 323 Ebner, K. et al. (2008) Substance P in stress and anxiety: NK-1 receptor antagonism interacts with key brain areas of the stress circuitry. Annals of the New York Academy of Sciences 1144, 61-73 Shanley, L. et al. (2011) Evidence for regulatory diversity and auto-regulation at the TAC1 locus in sensory neurones. Journal of Neuroinflammation 8, 10 Lettice, L., Heaney, S. and Hill, R. (2002) 2 Preaxial polydactyly in human and mouse: regulatory anomalies in digit patterning. Journal of Anatomy 201, 417 Lettice, L.A. and Hill, R.E. (2005) Preaxial polydactyly: a model for defective long-range regulation in congenital abnormalities. Current Opinion in Genetics and Development 15, 294-300 Lettice, L.A. et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Human Molecular Genetics 12, 1725-1735 Emison, E.S. et al. (2005) A common sex-dependent mutation in a RET enhancer underlies Hirschsprung disease risk. Nature 434, 857-863 Ombrello, M.J., Sikora, K.A. and Kastner, D.L. (2014) Genetics, genomics, and their relevance to pathology and therapy. Best Practice and Research Clinical Rheumatology 28, 175-189 Davidson, S. et al. (2011) Differential activity by polymorphic variants of a remote enhancer that supports galanin expression in the hypothalamus and amygdala: implications for obesity, depression and alcoholism. Neuropsychopharmacology 36, 2211-2221 Nikolova, Y.S. et al. (2013) Reward-related ventral striatum reactivity mediates gender-specific effects of a galanin remote enhancer haplotype on problem drinking. Genes Brain Behaviour 12, 516-524 Juhasz, G. et al. (2011) The CREB1-BDNF-NTRK2 pathway in depression: multiple gene-cognitionenvironment interactions. Biological Psychiatry 69, 762-771 Hing, B. et al. 
(2012) A polymorphism associated with depressive disorders differentially regulates brain derived neurotrophic factor promoter IV activity. Biological Psychiatry 71, 618-626 Branco, M.R., Ficz, G. and Reik, W. (2011) Uncovering the role of 5-hydroxymethylcytosine in the epigenome. Nature Review of Genetics 13, 7-13 Ficz, G. et al. (2011) Dynamic regulation of 5hydroxymethylcytosine in mouse ES cells and during differentiation. Nature 473, 398-402


62 Booth, M.J. et al. (2013) Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine. Nature Protocols 8, 1841-1851
63 Schmitt, A. et al. (2014) The impact of environmental factors in severe psychiatric disorders. Frontiers in Neuroscience 8, 19
64 Tarry-Adkins, J.L. and Ozanne, S.E. (2014) The impact of early nutrition on the ageing trajectory. Proceedings of the Nutrition Society 73(2), 289-301
65 Glier, M.B., Green, T.J. and Devlin, A.M. (2013) Methyl nutrients, DNA methylation, and cardiovascular disease. Molecular Nutrition and Food Research 58, 172-182
66 Drummond, E.M. and Gibney, E.R. (2013) Epigenetic regulation in obesity. Current Opinion in Clinical Nutrition and Metabolic Care 16, 392-397
67 Dalton, V.S., Kolshus, E. and McLoughlin, D.M. (2013) Epigenetics and depression: return of the repressed. Journal of Affective Disorders 155, 1-12
68 Murgatroyd, C. et al. (2009) Dynamic DNA methylation programs persistent adverse effects of early-life stress. Nature Neuroscience 12, 1559-1566
69 Murgatroyd, C. et al. (2010) Genes learn from stress: how infantile trauma programs us for depression. Epigenetics 5(3), 194-199
70 Nephew, B. and Murgatroyd, C. (2013) The role of maternal care in shaping CNS function. Neuropeptides 47, 371-378
71 Murgatroyd, C. and Spengler, D. (2012) Epigenetic programming of the HPA axis: early life decides. Stress 14, 581-589
72 Murgatroyd, C. and Spengler, D. (2011) Epigenetics of early child development. Frontiers in Psychiatry 2, 16
73 Menger, Y. et al. (2011) Sex differences in brain epigenetics. Epigenomics 2, 807-821
74 Bettscheider, M., Murgatroyd, C. and Spengler, D. (2011) Simultaneous DNA and RNA isolation from brain punches for epigenetics. BMC Research Notes 4, 314
75 Murgatroyd, C. et al. (2010) The Janus face of DNA methylation in aging. Aging (Albany NY) 2, 107-110
76 Murgatroyd, C. and Spengler, D. (2010) Histone tales: echoes from the past, prospects for the future. Genome Biology 11, 105
77 Nicoll, G. et al. (2012) Allele-specific differences in activity of a novel cannabinoid receptor 1 (CNR1) gene intronic enhancer in hypothalamus, dorsal root ganglia, and hippocampus. Journal of Biological Chemistry 287, 12828-12834
78 Heller, F. (2013) Genetics/genomics and drug effects. Acta Clinica Belgica 68, 77-80
79 Lanni, C., Racchi, M. and Govoni, S. (2013) Do we need pharmacogenetics to personalize antidepressant therapy? Cellular and Molecular Life Sciences 70, 3327-3340
80 Kangas, B.D. et al. (2013) Cannabinoid discrimination and antagonism by CB(1) neutral and inverse agonist antagonists. Journal of Pharmacology and Experimental Therapeutics 344, 561-567
81 Gaj, T., Gersbach, C.A. and Barbas, C.F., 3rd (2013) ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in Biotechnology 31, 397-405
82 Singh, P., Schimenti, J.C. and Bolcun-Filas, E. (2014) A mouse geneticist’s practical guide to CRISPR applications. Genetics (in press)

Further reading, resources and contacts

Useful Web sites
UCSC genome browser: http://genome.ucsc.edu/
Human Genome Navigator: http://hugenavigator.net/HuGENavigator/home.do

Further Reading
Maurano, M.T. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337(6099), 1190-5
Graur, D. et al. (2013) On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol 5(3), 578-90
Murgatroyd, C. and Spengler, D. (2012) Epigenetic programming of the HPA axis: early life decides. Stress 14(6), 581
MacKenzie, A., Hing, B. and Davidson, S. (2013) Exploring the effects of polymorphisms on cis-regulatory signal transduction response. Trends Mol Med 19(2), 99-107


Features associated with this article

Figures
Figure 1. Simplified summary of the flow of information within cells demonstrating the involvement of cis-regulatory regions in this process and demonstrating histone modification marks (K4/9/27 me/ac all refer to histone 3) and co-factor binding (p300 and PRC1/2).
Figure 2. A simplified diagram summarising (A) permissive and (B) repressive chromatin states and the effects of (C) allelic variants and CpG methylation on transcription factor binding.
Figure 3. A summary of two examples of what is known about the regulation of the MSX1 (A) and TAC1 (B) genes.
Figure 4. A summary of enhancer–promoter interactions in the regulation of the GAL gene (A) and the BDNF gene (B and C) and the effects of disease associated allelic variants.
Figure 5. Simplified summary diagram describing (A) how the genome detects the effects of drug treatments in the cell and how drug effects are modulated by (B) allelic variation and (C) epigenetic modification.

Table
Table 1. Summary of the characteristics of cis-regulatory regions including promoters (types I–III), enhancers, silencers and insulators.

Citation details for this article
Philip Cowie, Elizabeth A. Hay and Alasdair MacKenzie (2015) The noncoding human genome and the future of personalised medicine. Expert Rev. Mol. Med. Vol. 17, e4, January 2015, doi:10.1017/erm.2014.23

Accession information: doi:10.1017/erm.2014.23; Vol. 17; e4; January 2015 © Cambridge University Press 2015
