CENOMICS
12,
197-205
(19%)
Structural Organization and Complete Nucleotide Sequence of the Gene Encoding Human Acid Sphingomyelinase (SAWDl) EDWARD H. SCHUCHMAN,’ Division of Medical
and Molecular
ORNA LEVRAN, LYGIA V. PEREIRA, AND ROBERT J. DESNICK Genetics, Mount
Sinai School of Medicine,
ReceivedSeptember5,
Acid sphingomyelinase (ASM; HGMW-approved symbol, SMPDZ) is the lysosomalphosphodiesterasethat hydrolyzes sphingomyelin to ceramide and phosphocholine.The deficient activity of this enzyme results in Types A and B NiemannPick disease(NPD). The full-length cDNA encoding human ASM has been isolated and characterized (E. II. Schuchman, M. Suchi, T. Takahashi, K. Sandhoff, and R. J. Desnick (1991) J. Biol. Chem.66:8531-8639), and the ASM genehas been localized to chromosomalregion llp15.1-~15.4 (L. V. Pereira, R. J. Desnick, D. Adler, C. M. Disteche, and E. II. Schuchman(1991) Genomics9:229-234). Using the cDNA as a probe, a genomic clone containing the ASM genomic region was isolated and the complete nucleotide sequenceof the human ASM gene, including 1116 and 468 nucleotidesupstream and downstream from the ASM coding region, respectively, was determined. This housekeepinggenecontained six exons ranging in size from 77 to 773 bp and five introns ranging in size from 153 to 1059 bp. Exon 2 was unusually large and encoded256 amino acids, or about 44% of the mature ASM polypeptide. The alternatively spliced 172-bp type l-specific sequencewas encodedby exon 3, whereas the type 2-specific sequencewas located at the 5’ end of intron 2. An analysis of the intron/exon junctions revealed that there was a weak donor splice site (AAA gtgagg) at the exon 3jintron 3 junction which occasionally leadsto alternative splicing of exon 3 and the occurrence of the type 2 and 3 ASM transcripts. A single Ah1 elementin the reverse orientation was in intron 2, immediately downstream from the type a-specific sequence.The regulatory region upstream of the ASM coding sequencewas GC rich and contained putative promoter elements,including SPl, TATA, CAAT, NF-1, and AP-1 binding sites. Intriguingly, the ASM genomicregion encodedthree other long open reading frames (ORFs) which predicted polypeptides of 101, 104, and I58 amino acid residues,respectively. o lee2 Academic Press, Inc.
INTRODUCTION
Acid sphingomyelinase (ASM;’ sphingomyelin phosphodiesterase, EC 3.1.4.12) is the lysosomal hydrolase Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. M81780, M81781. ’ To whom correspondence should be addressed at Division of Medical and Molecular Genetics, Mount Sinai School of Medicine, 100th Street and Fifth Avenue, New York, NY 10029. ’ HGMW-approved symbol, SMPDl.
New York, New York 10029
1991
required for the enzymatic degradation of sphingomyelin to ceramide and phosphocholine (for reviews, see Barenholz and Gatt, 1982; Koval and Papano, 1991). Although the human enzyme was first isolated in 1966 (Barenholz et al.), purified preparations have been obtained only recently. The most highly purified preparation was a 70-kDa glycoprotein with a pH optimum of about 4.5 obtained from urine (Quintern et aZ., 1987). In 1966, Brady et al. identified ASM as the primary enzymatic deficiency in Type A Niemann-Pick disease (NPD). This finding was soon confirmed by Schneider and Kennedy (1967), who also described several other NPD patients with deficient ASM activity, but mild symptomology (i.e., Type B NPD). Type A NPD is a severe, neurodegenerative disease of infancy and generally leads to death by 2 to 3 years of age (Spence and Callahan, 1989). In contrast, Type B NPD patients have little or no neurologic involvement and may survive into adulthood. Recently, three distinct human ASM cDNAs were isolated, characterized, and expressed (Quintern et al., 1989; Schuchman et al., 1991a), the first molecular lesions that cause human NPD were identified (Levran et al., 1991a,b), and the human ASM gene was localized to chromosomal region llp15.1-~15.4 (Pereiraetal., 1991). In this communication, the structural organization and complete nucleotide sequence of the gene encoding human ASM are described. This housekeeping gene is small (“5 kb) and is composed of six exons. Interestingly, exon 2 encoded about 44% of the mature ASM polypeptide. Analysis of the genomic region demonstrated that each of the three ASM transcripts previously identified (Schuchman et al., 1991a) was derived from alternative splicing of a single hnRNA: i.e., type 1 transcripts occurred from normal splicing events, whereas type 2 and 3 transcripts occurred from alternative splicing of the type l-specific exon 3. In addition, three other open reading frames (ORFs) were identified within the ASM genomic region and predicted polypeptides of 101, 104, and 158 amino acid residues, respectively. MATERIALS Genomic library sert size -10-15
197
AND
METHODS
screening. A human genomic library (average kb) was constructed in the phage vector EMBL
in3
o&B-7543/92 $3.00 Copyright 0 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.
198
SCHUCHMAN
-210
tcgacagccg
cccgccaccg
-190
agagatcagc
tgtcagagat
-90 EXONl
-110
gagaagggta
atcgggtgtc
CCCGaxEG
-170
cagaggaaga
AL.
ggaaggggcg
-150
gagctgcttt
gcggccggcc
SPl-50
-70
-
ET
TcuxuPx
B
-130
ggagcagtca
-30
cwuxGGc
fzwsxEu
Acmwc&x
111
131
I.51
171 cxtxmsc~~cTc
191 211 BBBBBBBBBCrCTATTCAC
231
251
271
aggccgaaag$$tgctggg
gctgggggct3gg~qctgatg
&-s--B-
291
311
CGccATcAAc
B
AGqtgagcac
391
tgaaggggct
411
gggctcagaa
491
cattgtgacc
591
aagcgatggc
tgcatccctg tttgtgaagt
atggagaggq
511
aagaaataat
611
ctggtgctgc
ctgagttaca
aaggggtggt
ggccaggggt
691
tggcatctac gcagacagtg
gcagtggagg
431
aatccatcac
531
cctgaggaag
631
gggcaatatc
tggaaggcaa
tggcctggtt
cctctgctct
711
aatggaggcc
331
tgagtttgct tcagcttgcc
451
cccctttggg
551
aagcaaaggc
651
aggtgtgcac
tgagcttggt
gcctctgatt
tc@ccatg
731
gccgactaca
gacacccatg ctcatgccac
ORFl+
ctggtgcgct
471
gctacatgcc
acE&xcc
cORF1
571
aggccgctga
671
gcactgagtc
ctgcccagcc
cgctcctccc
actgcagA&G
751
771
791 911 TixcmzxTBCOCATCAACCBCClYIRROATA
831
oRF2-e iiismaAB--
891 911 BB-CCCATCrCACCOCTCTCCCCsBBACATlTlYlTC
931
951
971
991 1011 ATcJmmx-CAACCCCCCC-sCB~--
1031
1051
1071
1151
B
+ORF2
871
gctaaagaag ccagtttgga
EXON2 CAADCCAATC
1171
BCTACCrCCAC--ararrrarrr-BB-CCATaCOCCCB 1191 ---DCOCCK;ACC----Bs
I.211
1231
1251
I.271
1291 1311 AcAcGAI;AcAT0003(;CACA-B--B---
l331
xi51
1371
l391 1411 camaaGT~----cccDclTcAT-~~
1431
1451
1471
1491 TCAACCCATC--m-B 1591 gtgaatgaaa
1891 tt
ttttttt
1631 ggggcattgt
ctctgattgc
tgccgcacca
ggcttttttt
tttttttttt
agcaatcctc
ccgcctcggc
ctcccaaaat
gagatggaat tcccaagtac
ttttagttta
gtttttgtag
tgcccaggct
gctgggacca
caggcatcag
aggtgcacgc
ctcttcattt
ggctccccta
aqacaagatc
ttgctatgtt
ctactgctcc
tggccctccc
ggcaccatct
cagctcacta
cagcctccac
aqctaatttt
tgtattttta
gtagagatgg
gcccaggctg
1871 tttttttttt
1971
2051 caccacaccc
atctgactcc
1771
1951 ggagtgcagt
gaaaagaXg
1671
1851
2031 ctgggactac
cccaggaagg
1751
1931
2011 tgcctcagcc
gtccttagtg
1831
cttgctctgt
1571 tccgtggaaa
1651 tctagcatga
1731
1911 tttttttttt
1991 caagcaattc
aagggaacct
1811 cctaacctca
1551 gtacttatcg
1711 tctactgttt
1791 gtctcaaaca
1531
1611 gtgaagggag
1691 tccttccctt
I.511
ctcctqggtt
2071 ggtttcacca
FIG. 1. Nucleotide sequence of the gene encoding human ASM. Exonic sequences are shown in bold uppercase letters. Intronic sequences are in lowercase. The two putative ASM initiation codons and the first in-frame stop (TAG) codon are indicated by a double underline. A single Alul sequence within intron 2 is boxed and its transcriptional orientation is indicated by an arrow. A polyadenylation site is underlined. The putative initiation codons of the three additional ORFs (ORF 1, ORF 2, and ORF 3) also are underlined and their transcriptional directions are indicated by arrows. The type l-specific sequence is encoded by exon 3. The type 2specific sequence immediately follows exon 2 and is underlined. A cryptic donor splice site adjacent to the type 2.specific sequence is overlined. A single Sp 1 binding site is indicated by the wavy arrow (-1.
THE 2091 tgttggccaa
gatggtcttg
2111 atctcctgac
HUMAN
ctcatgatct
ACID
SPHINGOMYELINASE
2131 gcccacctcg
2191 cagccctccc
tttctactct
2211 tatctccagc
caccctcctt
2291 ctggaaatga
tttccccctt
2311 ttttttaagt
2331 gctccagttttttcccacctt
2391 ctaactctta
tccctccccc
2411 atactcctgg
agccctctgc
2491 ctgtqaqctc
cttgcaggtq
2511 qggaa&t
gcctcccaaa
199
GENE 2151 gtgctgggat
tacaggcatg
2251
2231
caaaggtctg
gcagcataac
ctctctatgc
2171 aaccactgca 2271
cccagctgtg
f
tctttgctca
cccag tttc Ah I tgttggccct
atccatccca
2351 tgtcatcttc
cctctgtgtg
2371 gtccttgctt
cccattctag
2431 cctcagagtc
ttttgtgtca
2451 cacagaccca
ataattagaa
2471 ctgtttggtc
tctggctaga
catgtatqct
2531 tttaccctcc
acccaaatgc
2551 ccagcacaqq
aqqaccagga
2571 ttqgaacaag
tqttqacctc
B
mcaccATA
B
cWXCUW
B
B
ORF3+ 2671
tttqtttcag
EXON3 AMTCXWX
-
ClTcATcAAc
2711 TCCACCCATC CCCCACCI\CA
2731 ccnxAma
czcGmxa
2751 ACCTPClLCOC Tccmaxm
2771 CCAMTACACA Arqtgaqggc
2791 cagtagtggg
aacacggtgq
2811 tqctggggqa
caagcaggct
2831 cctqttqagc
tgqagcacct
2851 ctqggcacag
aagttttatt
ttcctqqcat
2891 tgttccctgq
ggattcagct
catggtcact
gttgaaagcc
ttcattcaqt
ccccctttct
ctaqccaggq
ctqcctqqac
ccctqqatqc
ctccctacta
EXON4 qB
ATpcGccAcAlTXQXXC
B
B
S
m
tgagqgtggg
aatagggaca
aaaattccct
tgagcatctc
accatccctg
ttgtcccatg
gaqtgqggag
agaacaggtt
3211 qqaqaaagaq
gtcttcctac
3251 ccctccctag
aatcttctga
3271 atgtagtacc
ttctggccag
2591 tcatqtttac 2691
2991 atcctEiJtt *ORF3 3091 aggacgqaga 3191 gctcctcact
2631
2911
2931
3131
ggcatcctat
3231 ctccccagat
3491 gqaqgttgga
qccagagcct
3411
3431 ATCCXgtga
gtgaqgcaqa
3451 aqqqagcctc
ccttatcctg
3471 qagttggtgq
qataggggaa
3531 gtgqcccctc
cctqgagtta
3551 cccttqctcc
ttgcccctcc
3571 aqtcaqcccc
acatccttgc
3511 gcaaagcatq
qqcagqatgt
3351
3631
3731
3751
3771 CCACAACCTC-
--
3811
3831
3851
3871
-s-m
3911
3931
3951
ll=wxcw----
lGxmcaa4011
4031 BcAARoccrcA
c=mm$?&---
4091
3671
---CCCAAACATA
~cATccAAclT~~-CCCCACCCAC
3991
3371
3651
cxAmaxx--
3691 3711 CCCCGACCCATAaCCCACrCCCILCCCTCK:BBB
3891
3171
3311
EXON6 3611 aqClTA&aXGKTAaxUTACATCCARA
3791
4111
4151 --
4191 B-B
4211
4291
4311
3971 ccAwwzx-COCAACCOCA
4051 AAAADCCCAA-
4131
---ACCATCCCCC4231
4071 cclTcAAccA4171 GfzmmwxBCCCOmGCC
4251
ccnccn;n:-
4271 clcGawwAAcAcAM;AG
~AccQcTAA[;T-
4331
4351
4371
cl@mwxa--BBBBs
Agataagaqa
4391
agacccctgt
CTAGWMqt
3331 ---B-B
B--B
3391 B--B
4411
gtgactqtcc
cctgattacc
3071
3151
ctgaagqctg
tcccaacaag
2971
3051
ggqtgagtgt
B
2071
2951
3031
3111
EXONS
2651
caatttcttc
4451
4431
tttccaggca
agcaqqqcaa FIG.
qqagatcttt
qqaqcaaqat
cataactqag
cttqqactcc
4471
g
I-Continued
(Promega) and kindly provided by Dr. Ruth Kornreich (Mount Sinai School of Medicine, NY). This library was screened at a density of -10,000 plaques/l50-mm petri dish using the full-length type 1 ASM cDNA, pASM-3 (Schuchman et al., 1991a). Filter transfers and plaque hybridizations were performed by standard methods (Sambrook et al., 1989). Random primed labeling of the cDNA probe was performed using [a-32P]CTP (-3000 Ci/mmol; Amersham) and a random
primed labeling kit according to the manufacturer’s instr, -:.:X3 (Amersham). Following three rounds of plaque purification, DNA was isolated from the positive clones by the plate lysate method (Sambrook et al., 19891 and analyzed by Southern hybridization with oligonucleotides (17-mers] spanning the entire coding region of the fulllength ASM cDNA. 5’-End-labeling of the oligonucleotides was performed with T4 polynucleotide kinase (New England Biolabs) and
200
SCHUCHMAN I kb
ORF
I 1.0
o!o
I
I 2.0
I
ET
I 3.0
AL. I
I 4.0
I
I 5.0
I
1 6.0
1 (101 residues):
MVACSHGCPQRGANSVMDCRCHPLHQGCILSPAHQHQPQPPAPALLSASTA APSVLTFSPRLMAVNRPLQIGQVRFPHPKTSRSRGTMRCNLAGWPWGERG* ORF
2 (104 residues):
MRLTGAPGAGGLGGLGGGFGTVGKEMFQDEKMSQCPQVEPRSRPQASDGLS TERLHTSTMSSSKRWTMDWHTAGGAIFSRLHSLMATEPTRATLGSFCSGRS AW*
ORF
3 (158 residues):
MYAFTLHPNAQHRRTRIGTSVDLSCLLCFRIGGFYALSPYPGLRLISLNMN FCSRENFWLLINSTDPAGQLQWLVGELQAAEDRGDKVRASSGNTVVLGDKQ APVELEHLWAQKFYFPGIPNKCSLGIQLMVTVESLHSVPLSLARAAWTPGC PDYHP* FIG. 2. Schematic representation of the human ASM gene. The solid black boxes and straight lines represent the exons respectively. The two putative initiation codons and the first in-frame stop codons also are indicated, as is the location of the Alul its transcriptional orientation. The location of ORFs 1, 2, and 3 are indicated and their transcriptional orientations are shown by predicted amino acid sequence of the three ORFs also is shown; the numbers in parentheses indicate the length of the predicted [T-~*P]ATP (>5000 tions were performed
Ci/mmol; Amersham). by standard methods
Oligonucleotide (Sambrook
hybridizaet al., 1989).
DNA sequencing and computer analyses. Dideoxy sequencing was performed by the method of Sanger et al. (1977) using Sequenase kits SalI/EcoRI restriction frag(United States Biochemical). An -8-kb ment that contained the entire ASM coding region was isolated from the ASM genomic clone, pASMg-1, and digested with HincII (Promega) to generate four fragments of about 2.8,2.0,1.7, and 1.5 kb. The genomic restriction fragments were subcloned into Bluescript SK(+) (Stratagene) or pGEM7Z(+) (Promega) vectors and sequenced in both orientations. Sequencing primers were synthesized on an Applied Biosystems DNA synthesizer using phosphoramidite chemistry (Itakura et al., 1984). Computer analyses were performed using the University of Wisconsin Genetics Computer Group DNA Sequence Analysis Software (version 7.0) and GenBank (release 67) and SwissProt (release 17) DNA and protein databases, respectively.
RESULTS
Isolation of Human ASM Genomic Clones Genomic clones encoding human ASM were isolated from a human genomic library using the full-length ASM cDNA, pASM-3 (Schuchman et al., 1991a), as a
and introns, element and arrows. The polypeptide.
probe. Nine putative positive plaques were purified and characterized by restriction enzyme and Southern hybridization analyses. The inserts ranged in size from about 12 to 20 kb and contained overlapping regions of the ASM gene. One clone, pASMg-1, contained an -3 kb SaZI/EcoRI fragment that hybridized with oligonucleotides constructed from both the 5’ and 3’ ends of the full-length ASM cDNA. This genomic fragment was isolated and sequenced as described above. Organization of the Human ASM Gene Figure 1 shows the entire ASM genomic sequence [4681 nucleotides (nt)] including 210 bp upstream from the first in-frame ATG. This sequence was determined from the 1.5-, 2.0-, and 2%kb HincII fragments of pASMg-1 (see Materials and Methods for details). The 5’ end of exon 1 (nt -87 in Fig. 1) was defined as the first nucleotide of the full-length ASM cDNA, pASM-3 (Schuchman et al., 1991a). An in-frame stop codon (TAA) was identified 15 bp upstream from this nucleotide. As shown schematically in Fig. 2, the ASM gene
THE
Alu
5’ +
HUMAN
ACID
SPHINGOMYELINASE
201
GENE
3’
GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGGTCAGG II III II IllI III IIIIIIIIIIllIIIIllIIlIIIlllIIIII III1 IIIII llIIIllIII GCGAMXACGTCACCAAGTACGGACA TTAGGGTCGTGMACCC TCCGGCTCUUXCGTCTAGT ACTCCAGTCC AGTTCGAGACCAGCCTGGCCAACATGGTGGTG~CCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGCGTGGTGG II II IIIIII I IIIIIIlllllIIIllIllII llIllIIllIIIlIlIIIIIIIIIll III IIIIIII TCTAGTTCTGGTAGAACCGGTTGTACCACTTTGGGGTAGMXTGATTTTTATGTTTTTAATCGAC!CC!ACACCACC
CGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAG II II IIIIII IIIIII IllI IIIIIlllIIIlllI Ill IIIlIIIIIl IIIII IIIII II II GCACGTGGACATCAGGGTCCATGAACCCCCATGnACCCTCCGTC TTAACGAACTTGGGTCCTCCACCTCCGACATC TGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAA lllll lllll I IIIIIIIIIIllllIIIIllII lllllll III III IIIIIIllIIIIIIlllI/II ACTCGACTCTACCACGG TGACGTGAGGTC GGACCCGTTGTCTCGTTCTAAGGTAGWTTTTTTTTTTTTTTTTT 3' + 5' ASM FIG. 3. Ah1 sequence within intron (indicated on the bottom in bold capital nucleotide identity.
2 of the human ASM gene. Bestfit alignment of the Alul letters) and the Alul consensus sequence on top. Vertical
was composed of six exons ranging in size from 77 to 773 bp and five introns ranging in size from 153 to 1059 bp. Exon 2 was unusually large (773 bp) and encoded 258 amino acids, or -44% of the mature ASM polypeptide. Note that exon 3 encoded the type l-specific region, which is alternatively spliced in - 10% of the placental and fibroblast ASM transcripts (Quintern et al, 1989; Schuchman et al., 1991a). The type 2-specific sequence was found at the 5’ end of intron 2 (see Fig. 1). When the nucleotide sequences of the full-length ASM cDNA and genomic sequences were compared, three differences were found within the coding region. Two of these, within codons 322 and 506, were previously identified as ASM polymorphic sites (Schuchman et aZ., 1991a,b). The third difference was a 6-bp deletion (CTGGTG) near the 5’ end of the ASM genomic sequence (nt 103 to 108 in the full-length ASM cDNA; Schuchman et al., 1991a), which predicted a deletion of two amino acids (leucine and valine) within the putative ASM signal peptide. In addition, a difference in intron 2 was found between the sequence determined from the
element within intron 2 of the ASM gene lines between the two sequences represent
ASM genomic clone and a partial genomic sequence previously determined by PCR amplification and sequencing (Schuchman et al., 1991a). The length of the poly(T) tract in the genomic clone (beginning at nt 1881) was 30 nt, whereas in the sequence determined by PCR amplification the poly(T) tract was 23 nt. In intron 2 of the ASM genomic region there was a 291-bp sequence that had more than 90% nucleotide identity with the Alul consensus sequence (Britten et al., 1988) (Fig. 3); this Ah element was inserted in the reverse orientation relative to the ASM coding sequence. The 210-bp region upstream from the first in-frame ATG did not contain CAAT or TATA motifs; however, the sequence was GC rich (-66%) and contained one Spl binding site (nt -150 to -155). IntronlExon
Junctions
Each of the intron/exon junctions within the ASM gene had the gt/ag consensus donor/acceptor splice site sequences (Table 1). However, within donor splice site D3 there was a G to A transition at position -1 (underTABLE 1 lined in Table l), which is conserved in >90% of mamIntron/Exon Junctions in the Human ASM Gene” malian donor splice sites (Mount, 1982). Notably, this donor splice site was at the end of exon 3, which encoded Exon Intron Exon size 5’ Donor size 3’ Acceptor the 172-bp type l-specific region. This exon was alternanumber (nt) Codons splice site splice site (nt) tively spliced in about 10% of the ASM transcripts (Quintern et al., 1989; Schuchman et al., 1991a). At the 3’ 1 398 l-104 AG gtgagc (Dl) 464 tag 4 (Al) end of the type 2-specific region within intron 2 (see Fig. 773 105-362 AG gta@t (D2) 1059 2 tag 4 (A21 3 171 362-419 A& gtgagg (D3) 228 ias G (A31 1) there is a sequence, aag gtgaat, which may serve as a 4 77 420-446 AG gtaggc (D4) 201 tag G (A4) cryptic donor splice site. Thus, the occurrence of the 446-494 TG gtgagt (D5) 153 tag G (A5) 5 145 type 2-specific transcripts may be explained by alterna6 778 495-630 tive splicing of exon 3 (due to the weak D3 donor splice site) and use of the cryptic splice site adjacent to the type ’ The underlined residues represent divergences from the consensus sequences. 2-specific sequence.
202
SCHUCHMAN -1116
-1096
ccctcttcct
tacctagtcc
-1016 aacaaggaaa
tgatcgtgta
cctgcgatcc
aatcattcag
acctaaagac
TATA -796
-916 ttcacggacc
caaccacgaa
-716
-1036
agtccacact
aatataagaa
-956
attgccgaaa
ccctctctcc
attdtccatg
cctccccttc
-936
agagccctca
tccttccggt
ctgtgtggaa
ttccgaattg
-836
-956
tcg dqqttat
ccaaaacatc
aagaatccca
acaacactct
taacttctaa
taattaatat
CbAT/TATA -776
ccggtagtat
ttgttgaggt
-696
gtacadCdggggtgc
tccccagaca
-976
tttggtgcJmdaacccat
AL. -1056
atagtacact
-976
actacccaga
-896
-916
a
-1076
cttgtctaat
-996
tttttttggc
ET
-756
atttacgagc
gaaaatgaca
-676
cgatatacca
gaaatgccga
-736
gcacctatgt
gcctccaccc
-656
aggatcagag
aagtggtaga
tctgggttaa
cccaagggcg w
-636
gattccaaac
aaaggagtag
acttagtgtc
cactcactaa
CAAT
-616
-596
gagtcccact
gagttctccc
-576
gaccacgt
ctcgta
ccttccgcgg
-536
aaggacagtc
tccgttgtgt
gggtcaccct
cctaggacgg
NF-1
YFi
-516
-496
gcgagacgag
-556
cccttagctc
agaggatgga
-476
ggggaagggc
ccgatttgcg
-456
accccagcca
g_accgtcgat
-436
aaggccctta
gactcgcgcc
taagactgtt
tcctctgcag
NF-1 -416
-396
aaggtggctg
gtgtagtgta
-316
-376
cctcgaggtt
tcgtgctcgt
-296
gggggcgtcg
ggcacggggc
-356
gccggagggc
cccgacactg
-276
cccgtcccgc
ccccgtccct
-336
gagttccgcc
tcggggacca
-256
ctccccc
cc
ttagcccc
ctggagtccc
tctcaggggt
-236
c cagggccctc
gcggggcggg
ggaggcggag
gcgtcgcaac
-216 twcg FIG. 4. Putative promoter region of the human ASM gene. Spl lined) binding sites are indicated. Where appropriate, the orientation
Analysis
of the
ASM Promoter
(-),
Region
Because no TATA or CAAT promoter elements were identified in the 210 bp upstream of the first in-frame ASM initiation codon, further DNA sequencing of the pASMg-1 genomic insert was performed. Figure 4 shows an additional 906 nt of upstream sequence. The putative promoter elements identified included four SP-1 binding sites (nt -256 to -261, -267 to -273, -285 to -290, and -715 to -720), two TATA boxes (nt -863 to -868 and -894 to -898), two CAAT boxes (nt -704 to -709 and -867 to -872), one APl site (nt -604 to -609), and two NF-1 sites (nt -461 to -466 and -583 to -587). Overall, this region was GC rich (-63%), suggesting that it was a component of an HTF island. ORFs in the Human
ASM
Gene
In addition to sequences coding for the ASM polypeptide, three other open reading frames (ORFs) were identified in this genomic sequence that may encode functional proteins (Fig. 2). The predicted polypeptides contained 101, 104, and 158 amino acid residues, respectively. The transcriptional orientations of ORF 1 (Fig. 1, nt 176 to 485) and ORF 2 (Fig. 1, nt 753 to 1067) were opposite those of ASM, and the predicted proteins shared no homology with ASM or any other proteins in the Swiss-Prot protein database. In contrast, ORF 3 (Fig. 1, nt 2517 to 2998) was in the same transcriptional orientation and coding phase as the ASM gene. This ORF began within intron 2, overlapped ASM exon 3, and extended into intron 3. DISCUSSION
In this communication the genomic organization and complete nucleotide sequence of the gene encoding hu-
TATA (boxed), of the consensus
CAAT (boxed), NF-1 (underlined), sequence is indicated by arrows.
and Ap-1
(under-
man ASM are described. This housekeeping gene is small (i.e., about 5 kb) and the coding region is divided into six exons. Analysis of the genomic sequence documented the occurrence of alternative splicing at the ASM locus (Quintern et al., 1989; Schuchman et al., 1991a) and further clarified the molecular mechanisms underlying these alternative transcripts. The type l-specific 172-bp sequence was encoded by exon 3, whereas the type 2-specific 40-bp sequence was located at the 5’ end of intron 2, followed by a potential cryptic donor splice site. Furthermore, there was a poor donor splice site (D3; AAA gtgagg) at the exon 3/intron 3 junction. Thus, the occurrence of the type 2 and 3 ASM transcripts resulted from the fact that in about 10% of the ASM transcripts the donor site D3 was not functional, and splicing proceeded either to the cryptic donor splice site (indicated by the overline in Fig. 1) or to donor site D2. The G to A transition of the nucleotide immediately adjacent to the invariant gt consensus dinucleotide in D3 (underlined in Table 1) may cause these alternative splicing events, since this alteration was previously shown to be the cause of abnormal splicing in the procul(1) collagen gene, resulting in Ehlers-Danlos syndrome type VII (Weil et al., 1989). A single Alul element was found within the ASM genomic region. This Alul element may be placed into the “a branch” according to the classification of Jurka and Smith (1988), indicating the ancestral nature of the ASM gene. In addition, three other long ORFs were identified within the ASM genomic region. Although it is not known whether these genomic sequences are transcribed into functional RNAs, there is some precedent for overlapping transcriptional units within lysosomal enzyme genes. For example, within the first intron of the
THE
HUMAN
ACID
SPHINGOMYELINASE
TABLE Genes Gene a-N-Acetylgalactosaminidase Acid phosphatase Acid sphingomyelinase a-Galactosidase A @-Glucosidase P-Glucuronidase P-Hexosaminidase a-chain B-Hexosaminidase P-chain
Encoding
2
Lysosomal
Symbol
Chromosomal location
NAGA ACP2 SMPDl GLA GBA GUSB HEXA HEXB
22q13-qter llpll llp15.1-p15.4 xq21.3-q22 lq21 7q21.2-q22 15q23-q24 5q13
murine ,&glucuronidase structural gene there is an RNA polymerase II promoter motif that drives transcription of an -2.2-kb liver transcript that shares little homology with @-glucuronidase (Wang et al., 1988). To date, the function of this transcript remains unknown. A number of putative regulatory elements were identified within the upstream -1 kb of the ASM gene. This region was GC rich and contained five SPl binding sites. In addition, TATA, CAAT, APl, and NF-1 binding sites were identified. Because the precise site of transcription initiation has not been determined for ASM, no conclusions can be drawn about the functional relevance of these sequences. However, the fact that these regulatory sequences are within 1 kb of the ASM coding region suggests that they comprise all or part of the ASM pro-
Transcription start site(s)
Gene a-N-Acetylgalactosaminidase Acid
phosphatase
Acid
sphingomyelinase
a-Galactosidase
--6
Nucleotides upstream
to -23
nd
A
-60
--657
@Glucuronidase
o Transcriptional
Regions
-347”
fl-Glucosidase
P-Hexosaminidase
Promoter
a-chain start
Enzymes Number of exons
Gene
9 11 6 7 11 12 14 14
13,709 -9 kb 4,708 12,436 6,877 -21 kb -35 kb -40 kb
of Genes
Encoding
analyzed from ATG
Percentage GC
Lysosomal
sp-1
-678 -644 -161 -394 -456 -704 -867
-364 -410 -60 -326
IR
-354 -445
-150 -256 -267 -285 -715 -63 -207
APl NFl
-604 -461 -583
API OCTA
-153 -835 -889 +70 +78 -274 -290 -307 -323
56
none
590
59
none
1117
63
-863 -894
1179
59
-86 -93 -102 -129
200
72
nd
480
66
-247
site for the 3.6-kb
transcript
Enzymes CAAT
-30 -126
start
Wang and Desnick, 1991 Geier et al., 1989 Schuchman et al., 1991 Kornreich et al., 1989 Horowitz et al., 1989 Miller et al., 1990 Proia and Soravia, 1987 Proia, 1988
TATA
1400
1011
transcript;
Ref.
3
-680 -691 none
site for the 2.2-kb
size
moter. This is supported by the fact that transgenic mice containing the human ASM genomic region, including about 1.5 kb of upstream sequences, express human ASM activity at high levels (Schuchman et al., unpublished results). Clearly, further studies (e.g., in vitro mutagenesis and expression experiments) are required to definitively map the ASM control region and determine the significance of these putative regulatory sequences. ASM is the eighth human lysosomal gene for which the genomic organization has been determined and the fourth to be completely sequenced (Table 2). In addition to ASM, sequences encoding cu-N-acetylgalactosaminidase (Wang and Desnick, 1991), acid phosphatase (Geier et aZ., 1989), a-galactosidase A (Kornreich et al., 1989), fl-glucosidase (Reiner et al., 1988; Horowitz et cd.,
TABLE Putative
203
GENE
unknown.
-71 -104 -146 -178 -203
-754 -760 none
-287
Other
c-fos enhancer Chorion enhancer
none -248 -64 f26 nd
AP2
-164
204
SCHUCHMAN
1989), fi-glucuronidase (Miller et al., 1990; Shipley et al., 1991), and the LY and /3 chains of fi-hexosaminidase (Proia and Soravia, 1987; Proia, 1988) have been reported. Of these, nucleotide sequences of the promoter regions are available for seven (Table 3). Aside from the fact that all of these upstream sequences are GC rich, indicating that they may be components of HTF islands, analysis of these regions has not provided any consensus sequence for a lysosomal gene-specific promoter element. Although the promoter regions of many housekeeping genes are GC rich and lack TATA and/or CAAT motifs, the genes encoding a-galactosidase A, ,&hexosaminidase a-chain, /3-glucosidase, and ASM contained these consensus sequences. To date, mutagenesis and expression studies have been performed for three lysosomal gene promoter regions, /3-glucosidase (Horowitz et al., 1989), P-glucuronidase (Shipley et al., 1991), and acid phosphatase (Geier et al., 1989). In the human @-glucosidase gene, a 650-bp genomic fragment containing the putative control region (including two TATA and two CAAT motifs) was inserted upstream from the bacterial chloramphenicol acetyltransferase (CAT) gene and transfected into various human cells. The functional integrity of this regulatory region was demonstrated and, surprisingly, tissue-specific expression was observed. For human fl-glucuronidase, deletion analysis of minigene constructs demonstrated that the 200 bp of sequence upstream from the translation initiation site was sufficient for maximal expression in COS cells. This region was GC rich, but did not contain TATA or CAAT elements. For human acid phosphatase, a 590-bp upstream region that was GC rich and lacked a TATA element was shown to possess promoter activity by expression analysis of CAT constructs in COS cells. In summary, the gene encoding human ASM has been isolated, sequenced, and characterized. These studies have confirmed the nature of alternative splicing at the ASM locus and should facilitate further characterization of this important lysosomal hydrolase. In addition, the availability of the ASM genomic sequence should facilitate further analysis of the mutations that cause Types A and B NPD. To date, analysis of Ashkenazi Jewish patients with Types A and B NPD has revealed two mutations within the ASM gene, R496L and AR608 (Levran et al., 1991a,b). Interestingly, both of these mutations are in exon 6, suggesting that this region may encode all or part of the ASM catalytic site. Clearly, future studies of the ASM mutations causing NPD will provide further insights into the functional organization of the ASM polypeptide. ACKNOWLEDGMENTS The authors thank Dr. Tsutomu Takahashi for assisting in the DNA sequencing and Dr. David Bishop for computer analysis of the ASM promoter region. This work was supported by March of Dimes Basic Research Grant 1-1224, American Cancer Society Basic Investi-
ET
AL.
gation Grant CD-62521, Research Grant 1 ROl HD28607 from the National Institutes of Health, and Grant 5 MO1 RR00071 for the Mount Sinai General Clinical Research Center from the National Center for Research Resources, National Institutes of Health.
REFERENCES Barenholz, Y., Roitman, A., and Gatt, S. (1966). Enzymatic hydrolysis of sphingolipids. II. Hydrolysis of sphingomyelin by an enzyme from rat brain. J. Biol. Chem. 241: 3’731-3737. Barenholz, Y., and Gatt, S. (1982). Sphingomyelinases. In “Phospholipids” (J. N. Hawthorne, J. B. Ansell, and R. M. C. Dawson, Eds.), pp. 129-177, Elsevier, New York. Brady, R. O., Kanfer, J. N., Mock, M. B., and Fredrickson, D. S. (1966). The metabolism of sphingomyelin. II. Evidence of an enzymatic deficiency in Niemann-Pick disease. Proc. Natl. Acad. Sci. USA 55: 366-370. Britten, R. J., Baron, W. F., Stout, D. B., and Davidson, E. H. (1988). Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 86: 4770-4774. Geier, C., Von Figura, K., and Pohlmann, R. (1989). Structure of the human acid phosphatase gene. FEBS Lett. 13: 611-616. Horowitz, M., Wilder, S., Horowitz, Z., Reiner, O., Gelbart, T., and Beutler, E. (1989). The human glucocerebrosidase gene and pseudogene: Structure and evolution. Genomics 4: 87-96. Itakura, K., Rossi, J. J., and Wallace, R. B. (1984). Synthesis and use of synthetic oligonucleotides. Annu. Reu. Biochem. 53: 323-356. Jurka, J., and Smith, T. (1988). A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. USA 85: 47754778. Koval, M., and Papano, R. E. (1991). Intracellular tabolism of sphingomyelin. Biochem. Biophys.
transport Acta 1082:
Kornreich, R., Desnick, R. J., and Bishop, D. F. (1989). sequence of the human a-galactosidase A gene. Nucleic 17: 3301-3302.
and me113-125. Nucleotide Acids Res.
Levran, O., Desnick, R. J., and Schuchman, E. H. (1991). NiemannPick disease: A frequent missense mutation in the acid sphingomyelinase gene of Ashkenazi Jewish type A and B patients. Proc. Natl. Acad. Sci. USA 88: 3748-3752. Levran, O., Desnick, R. J., and Schuchman, E. H. (1991). NiemannPick type B disease: Identification of a single codon deletion in the acid sphingomyelinase gene and genotype/phenxpressiootype correlations in type A and B patients. J. Clin. Znuest. 88: 806-810. Miller, R. D., Hoffmann, J. W., Powell, P. P., Kyle, J. J. M., Bachinsky, D. R., and Sly, W. S. (1990). Cloning terization of the human P-glucuronidase gene. Genomics Mount, S. M. (1982). A catalogue of splice site junction Nucleic Acids Res. 10: 459-472. Pereira, L., Desnick, R. J., Adler, D., Disteche, C. M., and E. H. (1991). Regional assignment of the human acid elinase gene by PCR analysis of somatic cell hybrids hybridization to llp15.1-~15.4. Genomics 9: 8531-8539.
W., Shipley, and charac‘7: 280-283. sequences. Schuchman, sphingomyand in situ
Proia, R. L. (1988). Gene encoding the human P-hexosaminidase pchain: Extensive homology of intron placement in the cy and P-chain genes. Proc. Natl. Acad. Sci. USA 95: 188331887. Proia, R. L., and Soravia, E. (1987). Organization of the gene encoding the human @-hexosaminidase a-chain. J. Biol. Chem. 262: 56775681. Quintern, L. E., Weitz, G., Nehrkorn, H., Tager, J. M., Schram, A. W., and Sandhoff, K. (1987). Acid sphingomyelinase from human urine: Purification and characterization. Biochim. Biaphys. Acta 922: 323-336. Quintern, L. E., Schuchman, E. H., Levran, O., Suchi, M., Ferlinz, K., Reinke, H., Sandhoff, K., and Desnick, R. J. (1989). Isolation of cDNA clones encoding human acid sphingomyelinase: Occurrence of alternatively processed transcripts. EMBO J. 8: 2469-2473.
THE
HUMAN
ACID SPHINGOMYELINASE
Reiner, O., Wigderson, M., and Horowitz, M. (1988). Structural analysis of the human glucocerebrosidase genes. DNA 7: 107-116. Sambrook, J., Fritsch, E. F., and Maniatis, T. A. (1989). “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Sanger, F., Nickelson, J., and Coulson, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467. Schneider, P. D., and Kennedy, E. P. (1967). Sphingomyelinases in human tissues. III. Expression of Niemann-Pick disease in cultured fibroblasts. J. Lipid Res. 8: 202-208. Schuchman, E. H., Suchi, M., Takahashi, T., Sandhoff, K., and Desnick, R. J. (1991a). Human acid sphingomyelinase: Isolation, nucleotide sequence and expression of the full-length and alternatively spliced cDNAs. J. Biol. Chem. 66: 8531-8539. Schuchman, E. H., Levran, O., Suchi, M., andDesnick, R. J. (1991b). An MspI polymorphism in the human acid sphingomyelinase gene (SMPDl) at llp15.1-~15.4. Nucleic Acids Res. 19: 3160. Shipley, J. M., Miller, R. D., Wu, B. M., Grubb, J. H., Christensen,
GENE
205
S. G., Kyle, J. K., and Sly, W. S. (1991). Analysis of the 5’ flanking region of the human @-glucuronidase gene. Genomics 10: 1009-1018. Spence, M. W., and Callahan, J. W. (1989). Sphingomyelin-cholesterol lipidoses: The Niemann-Pick group of diseases. In “The Metabolic Basis of Inherited Diseases” (C. R. Striver, A. L. Beaudet, W. S. Sly, and D. Valle, Eds.), 8th ed., pp. 1655-1676, McGraw-Hill, New York. Wang, A., and Desnick, R. J. (1991). Structural organization and complete sequence of the human a-N-acetylgalatosamindase gene: Homology with the cu-galactosidase gene provides evidence for evolution from a common ancestral gene. Genomics 10: 133-142. Wang, B., Korfhagen, T. R., Gallagher, P. M., D’Amore, M. A., McNeish, J., Potter, S. S., and Ganschow, R. E. (1988). Overlapping transcriptional units on the same strand within the murine fl-glucuronidase gene complex. J. Biol. Chem. 263: 15841-15844. Weil, D., D’Alessio, M., Ramirez, F., De Wet, W., Cole, W. G., Chan, D., and Bateman, J. F. (1989). A base substitution in the exon of a collagen gene causes alternative splicing and generates a structurally abnormal polypeptide in a patient with Ehlers-Danlos syndrome type VII. EMBO J. 8: 1705-1710.