GENOMICS 10, 157-165 (1991)

Complete Structure of the Human Gene Encoding Neuron-Specific Enolase

DANIELE OLIVA,* LARISSA CALÌ,† SALVATORE FEO,† AND AGATA GIALLONGO†

†Istituto di Biologia dello Sviluppo del Consiglio Nazionale delle Ricerche and *Dipartimento di Biologia Cellulare e dello Sviluppo, Via Archirafi 20-22, 90123 Palermo, Italy

Received August 31, 1990; revised December 27, 1990

At least three genes encode the different isoforms of the glycolytic enzyme enolase. We have isolated the gene for the human γ- or neuron-specific enolase and determined its nucleotide sequence from upstream of the 5′ end to beyond the polyadenylation site. The gene contains 12 exons distributed over 9213 nucleotides. Introns occur at positions identical to those reported for the homologous rat gene, as well as for the human α- or nonneuronal enolase gene, supporting the existence of a single ancestor for the members of this gene family. Primer extension analysis indicates that the gene has multiple start sites. The putative promoter region lacks canonical TATA and CAAT boxes, is very G+C-rich, and contains several potential regulatory sequences. Furthermore, an inverted Alu sequence is present approximately 572 nucleotides upstream of the major start site. A comparison of the 5′-flanking region of the human γ-enolase gene with the same region of the rat gene revealed a high degree of sequence conservation. © 1991 Academic Press, Inc.

Sequence data from this article have been deposited with the EMBL/GenBank/DDBJ Nucleotide Sequence Databases under Accession No. X51956.

0888-7543/91 $3.00
Copyright © 1991 by Academic Press, Inc.
All rights of reproduction in any form reserved.

INTRODUCTION

The enolases (phosphopyruvate hydratase, EC 4.2.1.11) are enzymes that catalyze the interconversion of 2-phosphoglycerate and phosphoenolpyruvate in the glycolytic pathway. The functional enzyme is a dimer made up of subunits referred to as α, β, and γ (Zomzely-Neurath, 1983). These subunits are closely related to one another, exhibiting strong similarity at the amino acid level (more than 80%). Moreover, the polypeptide sequences predicted from enolase-encoding cDNAs isolated from different species show a high degree of evolutionary conservation (Segil et al., 1988; McAleese et al., 1988). In mammals there are at least three isoforms of enolase, characterized by different tissue distributions as well as by distinct biochemical and immunological properties (Rider and Taylor, 1974). The α- or nonneuronal enolase (NNE) is a nearly ubiquitous form, found in almost all tissues, and its expression precedes that of the other isoforms in the early stage of embryonic development. The β- or muscle-specific enolase (MSE) is present in adult skeletal muscle, and the γ- or neuron-specific enolase (NSE) is the major form found in mature neurons and in cells of neuronal origin (Marangos and Schmechel, 1987). The transition from NNE to MSE or NSE in tissues such as muscle and nerve is developmentally regulated (Schmechel et al., 1980; Tanaka et al., 1985). The enolase subunits are encoded by three distinct genes, as established by cDNA sequence comparisons in human (Giallongo et al., 1986; Van Obberghen et al., 1988; McAleese et al., 1988; Oliva et al., 1989; Calì et al., 1990), rat (Sakimura et al., 1985a,b; Oshima et al., 1989), and mouse (Lamande et al., 1989; Kaghad et al., 1990). The human chromosome locations of the three gene loci (designated, in accordance with the guidelines for Human Gene Nomenclature, ENO1, ENO2, and ENO3 for the α-, γ-, and β-subunits, respectively) have been determined. ENO1 has been mapped to the pter-p36.13 region of chromosome 1 (Khan et al., 1974; Cook and Hamerton, 1979), ENO2 was assigned to chromosome 12 (Grzeschik, 1974; Law and Kao, 1982) and recently localized by in situ hybridization to band p13 (Graig et al., 1989), and we have mapped ENO3 to the short arm of chromosome 17 (Feo et al., 1990a). We are interested in investigating the molecular mechanisms underlying the developmental control and tissue-specific expression of the human enolase genes, the genetic and biochemical bases of the multiplicity of these enzymes, and the evolutionary relationships among the members of this gene family.

To address these issues and to provide tools for further studies, we have isolated and characterized the human gene encoding α-enolase (Giallongo et al., 1990) as well as a related processed pseudogene (Feo et al., 1990b), and we report here the primary structure of the locus for human γ-enolase. To identify possible regulatory elements, a sequence comparison with the homologous rat gene (Sakimura et al., 1987) was performed. In addition to the already reported similarity in the 5′- and 3′-noncoding regions of the mRNAs (Day et al., 1987; Oliva et al., 1989), a striking conservation was found between sequences located upstream of the cap sites, suggesting a functional requirement for these sequences, which have been conserved during evolution.

MATERIALS AND METHODS

Isolation and Nucleotide Sequencing of the Human γ-Enolase Gene

A partial Sau3A library of human acute T-cell leukemia DNA in the λ vector EMBL3 was kindly provided by Dr. Louise C. Showe. The library was screened at high stringency with a ³²P-labeled 526-bp XhoII fragment derived from the 3′-untranslated region of the human γ-enolase cDNA, as previously described (Oliva et al., 1989). One hybridization-positive clone, G3, was purified and characterized for restriction cleavage sites according to standard procedures (Ausubel et al., 1988). Four large fragments spanning a region of about 13 kb of the G3 λ clone and containing the entire γ-enolase gene were subcloned into the KS+ or KS- Bluescript vectors (Stratagene). Overlapping Bluescript clones were generated by progressive digestion with Exonuclease III (Boehringer) as described by Henikoff (1987) and sequenced by the dideoxynucleotide chain-termination method (Sanger et al., 1977). More than 90% of the nucleotide sequence was determined on both strands, and the remainder at least twice on the same strand. In many experiments dITP was used in place of dGTP to alleviate the band compression typically observed in G+C-rich regions of DNA (Tabor and Richardson, 1987), or Taq DNA polymerase (Promega) was used instead of modified T7 DNA polymerase (Sequenase, U.S. Biochemicals). The final sequence was assembled and analyzed using the HIBIO DNAsis program of Hitachi Software Engineering Co.
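The statement that more than 90% of the sequence was read on both strands is an interval-coverage calculation. The sketch below shows one way to compute it from read coordinates. The read intervals, the strand labels, and the function names (`strand_coverage`, `both_strand_fraction`) are hypothetical illustrations, not the study's data or software.

```python
"""Fraction of a contig read on both strands (hypothetical reads)."""


def strand_coverage(reads, length):
    """Return per-strand boolean coverage masks for a contig of `length` bases.

    Each read is (start, end, strand) with 1-based inclusive coordinates
    and strand '+' or '-' (an assumed, illustrative format).
    """
    cov = {"+": [False] * length, "-": [False] * length}
    for start, end, strand in reads:
        for pos in range(start - 1, end):  # convert to 0-based indices
            cov[strand][pos] = True
    return cov


def both_strand_fraction(reads, length):
    """Fraction of positions covered by at least one read on each strand."""
    cov = strand_coverage(reads, length)
    both = sum(1 for fwd, rev in zip(cov["+"], cov["-"]) if fwd and rev)
    return both / length


if __name__ == "__main__":
    # Hypothetical overlapping reads on a 1,000-bp toy contig.
    reads = [(1, 400, "+"), (350, 800, "+"), (700, 1000, "+"),
             (1, 500, "-"), (450, 950, "-")]
    print(f"{both_strand_fraction(reads, 1000):.1%}")
```

For these toy reads the forward strand covers all 1,000 positions and the reverse strand covers positions 1-950, so the script prints 95.0%.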

Southern Blot Analysis

DNA from the human fibroblast cell line PAF, after digestion with appropriate enzymes, was fractionated on a 0.8% agarose gel and transferred to a nylon membrane (Hybond-N, Amersham) according to the manufacturer's instructions. Hybridization was performed with a ³²P-oligolabeled human γ-enolase-specific probe, namely, the 526-bp XhoII fragment (Oliva et al., 1989), as previously described (Giallongo et al., 1990).

Primer Extension Analysis

Total RNA was extracted by a modification of the acid-phenol method (Chomczynski and Sacchi, 1987), and poly(A)-rich RNA was selected by passage through oligo(dT)-cellulose (Ausubel et al., 1988). For primer extension, a 21-nucleotide synthetic oligonucleotide complementary to bases +187 to +207 of the human γ-enolase gene sequence was 5′-labeled with [γ-³²P]ATP and used as primer. The labeled primer was annealed with 5 μg of poly(A)-rich RNA from human fetal brain in 5 mM Tris-HCl, pH 7.5. The mixture was heated to 90°C for 2 min, incubated at 70°C for 10 min, and then allowed to cool slowly to room temperature. The annealed RNA template was then reverse transcribed at 42°C for 1 h with either 50 units of avian myeloblastosis virus (AMV, Boehringer) or 42 units of Moloney murine leukemia virus (M-MuLV, Boehringer) reverse transcriptase as previously described (Giallongo et al., 1990), except that 100 mM NaCl was used instead of 140 mM KCl in the reaction mixture. The extended products were analyzed by electrophoresis through a 6% polyacrylamide/7 M urea sequencing gel, and their lengths were estimated by comparison with sequence reactions loaded on the same gel.

RESULTS

Cloning of the Human γ-Enolase Gene

We previously reported the isolation of cDNA clones complementary to the human γ-enolase mRNA and the generation of a γ-enolase-specific 3′-untranslated probe (Oliva et al., 1989). As shown in Fig. 1, Southern blot analysis of human DNA with this specific probe indicated that the γ-enolase gene exists as a single copy in the human genome, since a single hybridizing band was detected in each digest. The same probe was then used to screen a human genomic library in the EMBL3 bacteriophage vector. One positive clone, G3, with an insert of genomic DNA of about 19 kb, was isolated from 5 × 10⁵ recombinant phages and found to contain the entire human γ-enolase gene by further hybridization with different probes derived from both the 5′ and 3′ regions of the mRNA (data not shown). Four overlapping fragments of the G3 DNA, a 3.2-kb EcoRI-BamHI, a 3.4-kb BglII-EcoRI, a 2.6-kb XbaI-BamHI, and a 3.9-kb BglII fragment (Fig. 2; the XbaI site is 268 bp upstream of the third EcoRI site indicated on the restriction map), spanning the entire hybridization-positive region of about 13 kb, were then isolated and subcloned into Bluescript plasmids for subsequent use in generating subfragments for nucleotide sequencing.

[Figure 1: Southern blot; size markers of 6.5, 2.3, and 2.0 kb are indicated at left]

FIG. 1. Southern blot analysis of human genomic DNA. Human genomic DNA from the fibroblast cell line PAF was digested with the indicated restriction enzymes and hybridized to the ³²P-labeled XhoII fragment of the γ-enolase cDNA, which represents most of the 3′-noncoding region of the corresponding mRNA. Molecular weight markers from HindIII-digested λ DNA are shown at left.

Sequence and Genomic Organization of the γ-Enolase Gene

The analyzed sequence includes 1302 bp covering the coding region of the γ-enolase gene (434 amino acids, including the initiator methionine), all the introns, and the 5′- and 3′-flanking regions, for a total of 10,905 bp (Fig. 3). The gene consists of 12 exons that appear to be clustered in two groups: one including exons 2-7 and the other exons 9-12 (Fig. 2). The locations of the introns are the same as those in the rat γ-enolase gene (Sakimura et al., 1987) and in the human α-enolase gene (Giallongo et al., 1990), with the exception of intron 1, which interrupts the 5′-untranslated sequence 13 bp upstream of the initiation methionine codon (Fig. 3). Therefore the sizes of the strictly coding exons (exons 3-11) are identical to those reported for the other characterized enolase genes, while exon 2 is 97 bp (100 bp in the homologous rat gene and 94 bp in the human α gene) and exon 12, containing the long 3′-untranslated region, is 965 bp. The length of the noncoding exon 1 varies from 77 to 210 bp because of the presence of multiple start sites of transcription (see below). Intron sizes differ, ranging from 157 bp (intron 3) to 1812 bp (intron 8). All splice donor and acceptor sites agree with the consensus sequence reported for many other eukaryotic genes (Mount, 1982) and follow the typical 5′ GT-AG 3′ rule (Fig. 3). Sequences resembling the branch site consensus, PyNPyTPuAPy (Ruskin et al., 1984), are found at the 3′ end of almost all the introns (data not shown). A variant polyadenylation signal, ATTAAA, is present in exon 12, 895 bp beyond the termination codon TGA and 13 bp upstream of the poly(A) addition site, in agreement with the reported cDNAs (Day et al., 1987; Oliva et al., 1989). There are only two differences between the genomic exon sequences and the cDNA sequence that we previously determined, and both occur in the untranslated regions of the γ-enolase mRNA: a C to G transversion and a C deletion at cDNA positions -73 and +1385, respectively (Oliva et al., 1989). The first difference, at the extreme 5′ end of the 5′-untranslated region of the isolated cDNA, may represent a cloning artifact, while the second was identified as an overlooked sequencing error. Two copies of the highly repetitive Alu sequence are present in intron 7 in a head-to-head orientation, while another Alu sequence is found in the 5′-flanking region of the gene in an orientation opposite to the direction of transcription (Fig. 3). All the Alu sequences are flanked by direct repeats 12-15 bp long and show 75-90% similarity to the consensus sequence of human Alu repeats (Kariya et al., 1987).

[Figure 2: restriction map of clone G3 and exon-intron structure of the γ-enolase gene; scale bar, 1 kb]

FIG. 2. Structure and sequencing strategy of the human γ-enolase gene. The genomic DNA contained in the recombinant λ clone G3 is represented as a thick line at the top, and a restriction endonuclease map is indicated below. Coding exons are shown as black rectangles and noncoding exon regions as open boxes. Positions of the translation initiation codon ATG, stop codon TGA, and polyadenylation signal ATTAAA are indicated. Large arrows show the regions sequenced after progressive digestion with Exonuclease III (see Materials and Methods). Bg, BglII; B, BamHI; E, EcoRI; H, HindIII.

[Figure 3: annotated nucleotide sequence of the human γ-enolase gene and its flanking regions (EMBL/GenBank/DDBJ Accession No. X51956)]

FIG. 3. Nucleotide sequence of the human γ-enolase gene and 5′- and 3′-flanking regions. The numbering indicates the nucleotide position relative to the major transcription start site, which is designated +1. Major and minor starts of transcription demonstrated by primer extension are indicated by large and small vertical arrowheads, respectively. The 12 exons are printed in boldface letters, and the amino acid sequence is shown above the coding exons in the three-letter code. The translation stop codon is marked by asterisks, and the variant polyadenylation signal (ATTAAA) is underlined. Alu-like sequences are underlined and their flanking direct repeats are doubly underlined. Sequences with similarity to known regulatory elements (Refs. 13, 17, 30, 43) in the 5′-flanking region are shown. Sequence motifs were found either directly or as the inverse complement on the coding strand. Nucleotides matching the consensus sequence are overlined, whereas mismatches are indicated by dots. A CAAT sequence is boxed. The sequence overlined with dots in exon 1 corresponds to the synthetic oligonucleotide used in primer extension experiments.

Mapping of the Start Sites of Transcription and Characterization of the 5′-End Region

To determine the transcription initiation site(s) of the γ-enolase gene, a primer extension analysis was performed with an end-labeled synthetic oligonucleotide complementary to bases +187 to +207 in Fig. 3. With human fetal brain poly(A)-rich RNA, one major extended product of 207 nucleotides and several minor bands ranging from 74 to 121 nucleotides were observed using either AMV reverse transcriptase (Fig. 4, lane 1) or M-MuLV reverse transcriptase (Fig. 4, lane 3). None of these bands was detectable in the control lanes containing unrelated RNA (Fig. 4, lanes 2 and 4), even after longer exposure (data not shown). The exact positions of the major and minor start sites with respect to the sequence are indicated in Fig. 3. Our attempts to confirm these data by S1 protection analysis have been unsuccessful, most likely because of quite stable secondary structures in the 5′-untranslated region of the γ-enolase mRNA (see Discussion). However, identical primer extension results were consistently obtained under different annealing conditions as well as with different RNA preparations from both human fetal brain samples and neuroblastoma cell lines expressing considerable levels of γ-enolase mRNA (data not shown).

[Figure 4: primer extension gel with a TGCA sequencing ladder and lanes 1-4 (AMV, lanes 1 and 2; M-MuLV, lanes 3 and 4); products of 207, 121, 118, 103, 99, 78, and 74 nucleotides are marked at right]

FIG. 4. Identification of the start site of transcription. A 21-nucleotide oligomer complementary to bases +187 to +207 in exon 1 (see Fig. 3) was labeled at its 5′ end and used for primer extension. The labeled primer was annealed to 5 μg of poly(A)-rich RNA from human fetal brain (lanes 1 and 3) or 10 μg of yeast tRNA as control (lanes 2 and 4) and reverse transcribed with AMV (lanes 1 and 2) or M-MuLV (lanes 3 and 4) reverse transcriptase. The lengths of the extended fragments, indicated by arrows on the right side, were estimated in both cases by comparison with sequence reactions generated using the same 21-nucleotide oligomer to prime a Bluescript single-stranded clone that contained the 5′ end of the γ-enolase gene.

In agreement with the presence of multiple start sites of transcription, the nucleotide sequence of more than 1 kb of the region 5′ of the major start revealed the absence of canonical TATA and CAAT boxes. A CAAT sequence is found at -241, a position with no known functional significance, while five copies of the upstream promoter element CACCC, which may function as a CAAT box (Myers et al., 1986), are present within 150 nucleotides upstream of the major mRNA start (positions -137, -60, -51, -18, and -1; Fig. 3). The sequences surrounding the start sites, from base pairs -200 to +210, and the first 250 nucleotides of intron 1 are particularly C+G rich (71.5 and 73.6%, respectively). In both regions the dinucleotide CG is present nearly as often as GC (29 versus 49 and 30 versus 42 times, respectively), while it is underrepresented within the remainder of the gene (179 versus 571). These C+G-rich regions may represent CpG islands, which contain high levels of the normally underrepresented dinucleotide CG, a target for DNA methylation, but are undermethylated (Bird, 1986). The 5′ region of the γ-enolase gene also contains an inverted Alu sequence that spans approximately 339 bp (-911 to -572), including the almost perfectly matching 12-bp direct repeats (Fig. 3). The Alu sequence contains the described RNA polymerase III promoter A- and B-box elements (Perez-Stable et al., 1984) at positions -636 and -664, respectively (Fig. 3). Several sequences that match DNA-binding sites for transcription factors or consensus sequences for other regulatory elements (Jones et al., 1988; Wingender, 1988; Gumucio et al., 1988; Peterson and Calame, 1989) are present in the 5′-flanking region of the gene, both within the approximately 600 nucleotides between the transcription start sites and the Alu sequence and upstream of the Alu element (see Fig. 3).
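The C+G percentages and the CG-versus-GC comparison above are simple sequence statistics. A minimal sketch follows. Besides the C+G fraction and the CG and GC counts used in the text, it reports the CpG observed/expected ratio, (CG count × length) / (C count × G count), a common CpG-island criterion that the text does not use and that is included here only as an assumed illustration. The function names and the toy sequence are hypothetical; the toy sequence is not taken from the γ-enolase gene.

```python
"""C+G content and CpG statistics for a DNA window (illustrative sketch)."""


def count_dinucleotide(seq, dinuc):
    """Count (possibly overlapping) occurrences of a 2-base word in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def cpg_stats(seq):
    """Return C+G fraction, CG count, GC count, and CpG observed/expected.

    Observed/expected = (CG count * length) / (C count * G count), an
    assumed, commonly used CpG-island measure (not taken from the paper).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c, g = seq.count("C"), seq.count("G")
    cg = count_dinucleotide(seq, "CG")
    gc = count_dinucleotide(seq, "GC")
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return {"gc_fraction": (c + g) / n, "CG": cg, "GC": gc,
            "cpg_obs_exp": obs_exp}


if __name__ == "__main__":
    toy = "CGCGGCTACGGCAT"  # hypothetical 14-bp example
    print(cpg_stats(toy))
```

For the toy sequence, 10 of 14 bases are C or G, CG and GC each occur three times, and the observed/expected ratio is 3 × 14 / (5 × 5) = 1.68. Applied to the windows described above, the same counts would reproduce the kind of comparison reported (for example, 29 CG versus 49 GC).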

Sequence Comparisons between the 5′-Flanking Regions of Human and Rat γ-Enolase Genes

We previously reported a striking similarity in the 5′-noncoding regions of the human and rat γ-enolase mRNAs (Oliva et al., 1989). Comparisons of sequences further upstream, now available, revealed other stretches of nucleotides highly conserved between the two species (Fig. 5). Sequences were aligned without regard to intron 1, which interrupts the human and rat 5′-untranslated regions 13 and 16 nucleotides upstream of the initiation codon, respectively, and omitting the Alu element, which is present only in the human sequence. The overall similarity of the flanking regions analyzed (about 900 bp) is 66%. The sequences up to the major start site of the human gene (-1 to -222 and -1 to -221 in human and rat, respectively) are 80% similar, and several blocks of conserved sequence are present further upstream (Fig. 5). Interestingly, two regions of high similarity (74-78%) are located immediately upstream and downstream of the Alu sequence, whose insertion in the human sequence appears to have interrupted an originally longer stretch of similarity between human and rat.

[Figure 5: alignment of the human (Hu) and rat 5′-flanking sequences, numbered negatively from the ATG]

FIG. 5. Comparison of nucleotide sequences of the 5′-flanking regions of the human and rat γ-enolase genes. Sequence data for the rat gene are from Sakimura et al. (35). The two sequences are aligned at the initiation codon ATG, marked with a box, and nucleotides upstream of the initiation codon are negatively numbered. The 327 bp of the Alu sequence present in the 5′-flanking region of the human gene have been deleted, and gaps (dashes) have been introduced to maximize homology. The position of the Alu element is shown and its 5′ direct repeat is underlined. Intron 1 positions are indicated but the intron sequences are not shown. The vertical arrowheads above and the dots beneath the sequences indicate the transcription-initiation sites for the human and the rat (35) γ-enolase genes, respectively.

DISCUSSION

We have isolated the gene encoding human γ-enolase and determined its complete nucleotide sequence. The overall organization of this gene, with 11 coding exons and a large intron in the 5′-untranslated region, is the same as that of the human α- and β-enolase genes (Giallongo et al., 1990; Oliva et al., unpublished results). Furthermore, the protein-coding regions of the three enolase genes are interrupted at analogous positions by introns with splice junctions at identical positions within the triplet codons (data not shown), suggesting that the members of the human enolase multigene family are the product of a series of duplications of a single ancestral gene. Southern blot analysis and comparison with cloned genomic DNA indicated that the γ-enolase locus exists as a single copy in the haploid human genome. The existence of multiple start sites of transcription was determined by primer extension. Minor starts map within a few nucleotides of positions centered around 91, 116, and 133 nucleotides upstream of the initiation codon ATG, positions analogous to those reported by Sakimura et al. (1987) for the homologous rat gene, while the major start in the human gene is located further upstream. No S1-protected fragments were obtained using a single-stranded probe complementary to sequences surrounding the 5′ end of the gene. A potential stem-and-loop structure (ΔG = -21 kcal/mol) in this region, at positions +105 to +140 (Fig. 3), might promote the formation of a cruciform structure between the γ-enolase mRNA and the end-labeled probe, rendering the loop within the probe sensitive to S1 digestion. At the same time, we cannot exclude the possibility that the minor primer extension products, whose 5′ ends map to the same region, are due to premature termination of reverse transcriptase at the stem-and-loop structure within the mRNA. However, the existence of multiple start sites of transcription in the γ-enolase gene is consistent with the lack of canonical TATA and CAAT boxes. Sequences matching transcription factor-binding sites or consensus sequences for other regulatory elements were found in the 5′-flanking region of the gene. Another feature of the putative promoter region of the γ-enolase gene is its high C+G content (71.5%). Indeed, clusters of the CG doublet occur both upstream and downstream of the major site of transcriptional initiation, as well as in the first noncoding exon and in the first intron, resembling potential CpG-rich islands (Bird, 1986). These characteristics (lack of TATA and CAAT boxes, heterogeneous start sites of transcription, and the presence of C+G-rich sequences), all of which are found in the rat γ-enolase gene (Sakimura et al., 1987), have been associated with mammalian "housekeeping" genes, whereas highly tissue-specific genes do not usually display these features (Bird, 1986).
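The correspondence between primer extension product lengths (Fig. 4) and start-site positions can be made explicit with a little arithmetic. The sketch below rests on assumptions inferred from the text rather than stated as a procedure in it. First, a product spans from the primer's 5′ end (+207) to the mRNA 5′ end. Second, exon 1 ends at +210, since transcripts from +1 have a 210-bp exon 1. Third, 13 nucleotides of 5′-untranslated sequence lie in exon 2 before the ATG. The constant and function names are hypothetical.

```python
"""Map primer extension products to start sites (illustrative arithmetic)."""

PRIMER_5P_END = 207   # primer complementary to +187..+207 (from the text)
EXON1_END = 210       # inferred: exon 1 is 210 bp for transcripts from +1
UTR_IN_EXON2 = 13     # intron 1 lies 13 nt upstream of the ATG (from the text)


def start_position(product_length):
    """Position of the mRNA 5' end implied by an extension product length."""
    return PRIMER_5P_END - product_length + 1


def exon1_length(start):
    """Length of exon 1 for a transcript initiating at `start` (start >= 1)."""
    return EXON1_END - start + 1


def nt_upstream_of_atg(start):
    """Number of 5'-untranslated nucleotides from `start` to the ATG."""
    return exon1_length(start) + UTR_IN_EXON2


if __name__ == "__main__":
    # Product lengths marked in Fig. 4 (nucleotides).
    for length in (207, 121, 118, 103, 99, 78, 74):
        s = start_position(length)
        print(f"{length:>3} nt -> +{s:<3} exon 1 = {exon1_length(s):>3} bp, "
              f"{nt_upstream_of_atg(s):>3} nt upstream of ATG")
```

Under these assumptions the 207-nt product maps to +1 and the 74-nt product to +134, which gives the 210-bp and 77-bp exon 1 lengths stated in Results. The minor products fall roughly 90-94, 115-119, and 134-137 nucleotides upstream of the ATG, in line with the clusters around 91, 116, and 133 nucleotides described above.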
Although γ-enolase is widely used as a marker of neuronal differentiation (Marangos and Schmechel, 1987), the protein and its specific mRNA have been detected in a number of normal and transformed tissues of nonneuronal origin (Haimoto et al., 1985; Van Obberghen et al., 1988) and in peripheral blood lymphocytes upon mitogenic stimulation (S. Feo, unpublished results). Therefore, the tissue specificity of this gene may be considered intermediate between that of constitutively expressed genes and that of specialized genes encoding proteins present only in particular cell types. The same housekeeping-type promoter has been observed in two other brain-specific genes, those encoding aldolase C (Vibert et al., 1989) and the membrane protein Thy-1.2 (Ingraham and Evans, 1986). No evidence for common DNA elements has been found in sequence comparisons between these two genes and the one encoding rat γ-enolase (Vibert et al., 1989), whereas a striking sequence similarity was found in the 5′-flanking regions of the human and rat γ-enolase genes. This suggests a strong evolutionary constraint to preserve these sequences and, thus, a potential functional role for them in controlling the expression of the γ-enolase gene.

ACKNOWLEDGMENTS

We are indebted to Dr. L. C. Showe for the human genomic library and to Dr. G. Lennon for the human fetal brain samples. This work was partially supported by CNR grants: Progetto Bilaterale to A.G. and Progetto Finalizzato Ingegneria Genetica to S.F.

REFERENCES

1. AUSUBEL, F. M., BRENT, R., KINGSTON, R. E., MOORE, D. D., SEIDMAN, J., SMITH, J. A., AND STRUHL, K. (1988). “Current Protocols in Molecular Biology,” Greene and Wiley-Interscience, New York.
2. BIRD, A. (1986). CpG-rich islands and the function of DNA methylation. Nature (London) 321: 209-213.
3. CALÌ, L., FEO, S., OLIVA, D., AND GIALLONGO, A. (1990). Nucleotide sequence of a cDNA encoding the human muscle-specific enolase (MSE). Nucleic Acids Res. 18: 1893.
4. CHOMCZYNSKI, P., AND SACCHI, N. (1987). Single step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162: 156-159.
5. COOK, P. J. L., AND HAMERTON, J. L. (1979). Report of the committee on the genetic constitution of chromosome 1. Cytogenet. Cell Genet. 25: 9-20.
6. DAY, I. N. M., ALLSOPP, M. T. E. P., MOORE, D. C. McN., AND THOMPSON, R. J. (1987). Sequence conservation in the 3′-untranslated regions of neuron-specific enolase, lymphokine and protooncogene mRNA. FEBS Lett. 222: 139-143.
7. FEO, S., OLIVA, D., BARBIERI, G., XU, W., FRIED, M., AND GIALLONGO, A. (1990a). The gene for the muscle-specific enolase is on the short arm of human chromosome 17. Genomics 6: 192-194.
8. FEO, S., OLIVA, D., AR&, B., BARBA, G., CALÌ, L., AND GIALLONGO, A. (1990b). The human genome contains a single processed pseudogene for alpha enolase located on chromosome 1. DNA Sequence 1: 79-83.
9. GIALLONGO, A., FEO, S., MOORE, R., CROCE, C. M., AND SHOWE, L. C. (1986). Molecular cloning and nucleotide sequence of a full-length cDNA for human alpha enolase. Proc. Natl. Acad. Sci. USA 83: 6741-6745.
10. GIALLONGO, A., OLIVA, D., CALÌ, L., BARBA, G., BARBIERI, G., AND FEO, S. (1990). Structure of the human gene for alpha enolase. Eur. J. Biochem. 190: 567-573.
11. GRAIG, S. P., DAY, I. N. M., THOMPSON, R. J., AND GRAIG, I. W. (1989). Localization of human neuron-specific enolase to chromosome 12p13 (A2540) (HGM10). Cytogenet. Cell Genet. 51: 980.
12. GRZESCHIK, K. H. (1974). Assignment of human genes: β-Glucuronidase to chromosome 7, adenylate kinase-1 to 9, a second enzyme with enolase activity to 12, and mitochondrial IDH to 15. Cytogenet. Cell Genet. 16: 142-148.
13. GUMUCIO, D. L., WIEBAUER, K., CALDWELL, R. M., SAMUELSON, L. C., AND MEISLER, M. H. (1988). Concerted evolution of human amylase genes. Mol. Cell. Biol. 8: 1197-1205.
14. HAIMOTO, H., TAKAHASHI, Y., KOSHIKAWA, T., NAGURA, H., AND KATO, K. (1985). Immunohistochemical localization of gamma-enolase in normal human tissues other than nervous and neuroendocrine tissues. Lab. Invest. 52: 257-263.
15. HENIKOFF, S. (1987). Unidirectional digestion with exonuclease III in DNA sequence analysis. In “Methods in Enzymology” (J. N. Abelson and M. I. Simon, Eds.), Vol. 155, pp. 156-165, Academic Press, New York.
16. INGRAHAM, H. A., AND EVANS, G. A. (1986). Characterization of two atypical promoters and alternate mRNA processing in the mouse Thy-1.2 glycoprotein gene. Mol. Cell. Biol. 6: 2923-2931.
17. JONES, N. C., RIGBY, P. W. J., AND ZIFF, E. B. (1988). Trans-acting protein factors and the regulation of eukaryotic transcription: Lessons from studies on DNA tumor viruses. Genes Dev. 2: 267-281.
18. KAGHAD, M., DUMONT, X., CHALON, P., LELIAS, J. L., LAMANDÉ, N., LUCAS, M., LAZAR, M., AND CAPUT, D. (1990). Nucleotide sequence of cDNAs alpha and gamma enolase mRNAs from mouse brain. Nucleic Acids Res. 18: 3638.
19. KARIYA, Y., KATO, K., HAYASHIZAKI, Y., HIMENO, S., TARUI, S., AND MATSUBARA, K. (1987). Revision of consensus sequence of human Alu repeats: A review. Gene 63: 1-10.
20. KHAN, M. L., DOPPERT, B. A., HAGEMEIJER, A., AND WESTERVELD, A. (1974). The human loci for phosphopyruvate hydratase and guanylate kinase are syntenic with the PGD-PGM1 linkage group in man-Chinese hamster somatic cell hybrids. Cytogenet. Cell Genet. 13: 130-136.
21. LAMANDÉ, N., MAZO, A. M., LUCAS, M., MONTARRAS, D., PINSET, C., GROSS, F., LEGAULT-DEMARE, L., AND LAZAR, M. (1989). Murine muscle-specific enolase: cDNA cloning, sequence, and developmental expression. Proc. Natl. Acad. Sci. USA 86: 4445-4449.
22. LAW, M. L., AND KAO, F. (1982). Regional mapping of the gene coding for enolase-2 on chromosome 12. J. Cell Sci. 63: 245-254.
23. MARANGOS, P. J., AND SCHMECHEL, D. E. (1987). Neuron-specific enolase, a clinically useful marker for neurons and neuroendocrine cells. Annu. Rev. Neurosci. 10: 269-295.
24. MCALEESE, S. M., DUNBAR, B., FOTHERGILL, J. E., HINKS, L. J., AND DAY, I. N. M. (1988). Complete amino acid sequence of the neuron-specific gamma isozyme of enolase (NSE) from human brain and comparison with the nonneuronal alpha form (NNE). Eur. J. Biochem. 178: 413-417.
25. MOUNT, S. M. (1982). A catalogue of splice junction sequences. Nucleic Acids Res. 10: 459-472.
26. MYERS, R. M., TILLY, K., AND MANIATIS, T. (1986). Fine structure and genetic analysis of a beta-globin promoter. Science 232: 613-618.
27. OLIVA, D., BARBA, G., BARBIERI, G., GIALLONGO, A., AND FEO, S. (1989). Cloning, expression and sequence homologies of cDNA for human gamma enolase. Gene 79: 355-360.
28. OSHIMA, Y., MITSUI, H., TAKAYAMA, Y., KUSHIYA, E., SAKIMURA, K., AND TAKAHASHI, Y. (1989). cDNA cloning and nucleotide sequence of rat muscle-specific enolase (beta-beta enolase). FEBS Lett. 242: 425-430.
29. PEREZ-STABLE, C., AYRES, T. M., AND SHEN, C. K. J. (1984). Distinctive sequence organization and functional programming of an Alu repeat promoter. Proc. Natl. Acad. Sci. USA 81: 5291-5295.
30. PETERSON, C. L., AND CALAME, K. (1989). Proteins binding to site C2 (μE3) in the immunoglobulin heavy-chain enhancer exist in multiple oligomeric forms. Mol. Cell. Biol. 9: 776-786.
31. RIDER, C. C., AND TAYLOR, C. B. (1974). Enolase isoenzymes in rat tissues: Electrophoretic, chromatographic, immunological and kinetic properties. Biochim. Biophys. Acta 365: 285-300.
32. RUSKIN, B., KRAINER, A. R., MANIATIS, T., AND GREEN, M. R. (1984). Excision of an intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell 38: 317-331.
33. SAKIMURA, K., KUSHIYA, E., OBINATA, M., AND TAKAHASHI, Y. (1985a). Molecular cloning and the nucleotide sequence of cDNA to mRNA for non-neuronal enolase (alpha-alpha enolase) of rat brain and liver. Nucleic Acids Res. 13: 4365-4378.
34. SAKIMURA, K., KUSHIYA, E., OBINATA, M., ODANI, S., AND TAKAHASHI, Y. (1985b). Molecular cloning and the nucleotide sequence of cDNA for neuron-specific enolase messenger RNA of rat brain. Proc. Natl. Acad. Sci. USA 82: 7453-7457.
35. SAKIMURA, K., KUSHIYA, E., TAKAHASHI, Y., AND SUZUKI, Y. (1987). The structure and expression of neuron-specific enolase gene. Gene 60: 103-113.
36. SANGER, F., NICKLEN, S., AND COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
37. SCHMECHEL, D. E., BRIGHTMAN, M. W., AND MARANGOS, P. J. (1980). Neurons switch from nonneuronal enolase to neuron-specific enolase during differentiation. Brain Res. 190: 195-214.
38. SEGIL, N., SHRUTKOWSKI, A., DWORKIN, M., AND DWORKIN-RASTL, E. (1988). Enolase isoenzymes in adult and developing Xenopus laevis and characterization of a cloned enolase sequence. Biochem. J. 251: 31-39.
39. TABOR, S., AND RICHARDSON, C. (1987). DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84: 4767-4771.
40. TANAKA, M., SUGISAKI, K., AND NAKASHIMA, K. (1985). Switching in levels of translatable mRNAs for enolase isozymes during development of chicken skeletal muscle. Biochem. Biophys. Res. Commun. 133: 868-872.
41. VAN OBBERGHEN, E., KAMHOLZ, J., BISHOP, J. G., III, ZOMZELY-NEURATH, C., AND LAZZARINI, R. A. (1988). Human gamma enolase: Isolation of a cDNA clone and expression in normal and tumor tissues of human origin. J. Neurosci. Res. 19: 450-456.
42. VIBERT, M., HENRY, J., KAHN, A., AND SKALA, H. (1989). The brain-specific gene for rat aldolase C possesses an unusual housekeeping-type promoter. Eur. J. Biochem. 181: 33-39.
43. WINGENDER, E. (1988). Compilation of transcription regulating proteins. Nucleic Acids Res. 16: 1879-1902.
44. ZOMZELY-NEURATH, C. E. (1983). Enolase. In “Handbook of Neurochemistry” (A. Lajtha, Ed.), Vol. 4, 2nd ed., pp. 403-433, Plenum Press, New York.
