GENOMICS 12, 197-205 (1992)

Structural Organization and Complete Nucleotide Sequence of the Gene Encoding Human Acid Sphingomyelinase (SMPD1)

EDWARD H. SCHUCHMAN,¹ ORNA LEVRAN, LYGIA V. PEREIRA, AND ROBERT J. DESNICK

Division of Medical and Molecular Genetics, Mount Sinai School of Medicine, New York, New York 10029

Received September 5, 1991

Acid sphingomyelinase (ASM; HGMW-approved symbol, SMPD1) is the lysosomal phosphodiesterase that hydrolyzes sphingomyelin to ceramide and phosphocholine. The deficient activity of this enzyme results in Types A and B Niemann-Pick disease (NPD). The full-length cDNA encoding human ASM has been isolated and characterized (E. H. Schuchman, M. Suchi, T. Takahashi, K. Sandhoff, and R. J. Desnick (1991) J. Biol. Chem. 266: 8531-8539), and the ASM gene has been localized to chromosomal region 11p15.1-p15.4 (L. V. Pereira, R. J. Desnick, D. Adler, C. M. Disteche, and E. H. Schuchman (1991) Genomics 9: 229-234). Using the cDNA as a probe, a genomic clone containing the ASM genomic region was isolated and the complete nucleotide sequence of the human ASM gene, including 1116 and 468 nucleotides upstream and downstream from the ASM coding region, respectively, was determined. This housekeeping gene contained six exons ranging in size from 77 to 773 bp and five introns ranging in size from 153 to 1059 bp. Exon 2 was unusually large and encoded 258 amino acids, or about 44% of the mature ASM polypeptide. The alternatively spliced 172-bp type 1-specific sequence was encoded by exon 3, whereas the type 2-specific sequence was located at the 5' end of intron 2. An analysis of the intron/exon junctions revealed a weak donor splice site (AAA gtgagg) at the exon 3/intron 3 junction, which occasionally leads to alternative splicing of exon 3 and the occurrence of the type 2 and 3 ASM transcripts. A single Alu1 element in the reverse orientation was in intron 2, immediately downstream from the type 2-specific sequence. The regulatory region upstream of the ASM coding sequence was GC rich and contained putative promoter elements, including Sp1, TATA, CAAT, NF-1, and AP-1 binding sites. Intriguingly, the ASM genomic region encoded three other long open reading frames (ORFs) which predicted polypeptides of 101, 104, and 158 amino acid residues, respectively. © 1992 Academic Press, Inc.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. M81780, M81781.
¹ To whom correspondence should be addressed at Division of Medical and Molecular Genetics, Mount Sinai School of Medicine, 100th Street and Fifth Avenue, New York, NY 10029.
² HGMW-approved symbol, SMPD1.

0888-7543/92 $3.00 Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Acid sphingomyelinase (ASM;² sphingomyelin phosphodiesterase, EC 3.1.4.12) is the lysosomal hydrolase required for the enzymatic degradation of sphingomyelin to ceramide and phosphocholine (for reviews, see Barenholz and Gatt, 1982; Koval and Pagano, 1991). Although the human enzyme was first isolated in 1966 (Barenholz et al.), purified preparations have been obtained only recently. The most highly purified preparation was a 70-kDa glycoprotein with a pH optimum of about 4.5 obtained from urine (Quintern et al., 1987).

In 1966, Brady et al. identified ASM as the primary enzymatic deficiency in Type A Niemann-Pick disease (NPD). This finding was soon confirmed by Schneider and Kennedy (1967), who also described several other NPD patients with deficient ASM activity but mild symptomatology (i.e., Type B NPD). Type A NPD is a severe, neurodegenerative disease of infancy and generally leads to death by 2 to 3 years of age (Spence and Callahan, 1989). In contrast, Type B NPD patients have little or no neurologic involvement and may survive into adulthood.

Recently, three distinct human ASM cDNAs were isolated, characterized, and expressed (Quintern et al., 1989; Schuchman et al., 1991a), the first molecular lesions that cause human NPD were identified (Levran et al., 1991a,b), and the human ASM gene was localized to chromosomal region 11p15.1-p15.4 (Pereira et al., 1991).

In this communication, the structural organization and complete nucleotide sequence of the gene encoding human ASM are described. This housekeeping gene is small (~5 kb) and is composed of six exons. Interestingly, exon 2 encoded about 44% of the mature ASM polypeptide. Analysis of the genomic region demonstrated that each of the three ASM transcripts previously identified (Schuchman et al., 1991a) was derived from alternative splicing of a single hnRNA: type 1 transcripts arose from normal splicing events, whereas type 2 and 3 transcripts arose from alternative splicing of the type 1-specific exon 3. In addition, three other open reading frames (ORFs) were identified within the ASM genomic region and predicted polypeptides of 101, 104, and 158 amino acid residues, respectively.

MATERIALS AND METHODS

Genomic library screening. A human genomic library (average insert size ~10-15 kb) was constructed in the phage vector EMBL3

(Promega) and kindly provided by Dr. Ruth Kornreich (Mount Sinai School of Medicine, NY). This library was screened at a density of ~10,000 plaques/150-mm petri dish using the full-length type 1 ASM cDNA, pASM-3 (Schuchman et al., 1991a). Filter transfers and plaque hybridizations were performed by standard methods (Sambrook et al., 1989). Random primed labeling of the cDNA probe was performed using [α-32P]CTP (~3000 Ci/mmol; Amersham) and a random primed labeling kit according to the manufacturer's instructions (Amersham). Following three rounds of plaque purification, DNA was isolated from the positive clones by the plate lysate method (Sambrook et al., 1989) and analyzed by Southern hybridization with oligonucleotides (17-mers) spanning the entire coding region of the full-length ASM cDNA. 5'-End-labeling of the oligonucleotides was performed with T4 polynucleotide kinase (New England Biolabs) and [γ-32P]ATP (>5000 Ci/mmol; Amersham). Oligonucleotide hybridizations were performed by standard methods (Sambrook et al., 1989).

FIG. 1. Nucleotide sequence of the gene encoding human ASM. Exonic sequences are shown in bold uppercase letters. Intronic sequences are in lowercase. The two putative ASM initiation codons and the first in-frame stop (TAG) codon are indicated by a double underline. A single Alu1 sequence within intron 2 is boxed and its transcriptional orientation is indicated by an arrow. A polyadenylation site is underlined. The putative initiation codons of the three additional ORFs (ORF 1, ORF 2, and ORF 3) also are underlined and their transcriptional directions are indicated by arrows. The type 1-specific sequence is encoded by exon 3. The type 2-specific sequence immediately follows exon 2 and is underlined. A cryptic donor splice site adjacent to the type 2-specific sequence is overlined. A single Sp1 binding site is indicated by the wavy arrow (~).

FIG. 2. Schematic representation of the human ASM gene. The solid black boxes and straight lines represent the exons and introns, respectively. The two putative initiation codons and the first in-frame stop codons also are indicated, as is the location of the Alu1 element and its transcriptional orientation. The locations of ORFs 1, 2, and 3 are indicated and their transcriptional orientations are shown by arrows. The predicted amino acid sequence of the three ORFs also is shown; the numbers in parentheses indicate the length of the predicted polypeptide.

ORF 1 (101 residues):
MVACSHGCPQRGANSVMDCRCHPLHQGCILSPAHQHQPQPPAPALLSASTA
APSVLTFSPRLMAVNRPLQIGQVRFPHPKTSRSRGTMRCNLAGWPWGERG*

ORF 2 (104 residues):
MRLTGAPGAGGLGGLGGGFGTVGKEMFQDEKMSQCPQVEPRSRPQASDGLS
TERLHTSTMSSSKRWTMDWHTAGGAIFSRLHSLMATEPTRATLGSFCSGRS
AW*

ORF 3 (158 residues):
MYAFTLHPNAQHRRTRIGTSVDLSCLLCFRIGGFYALSPYPGLRLISLNMN
FCSRENFWLLINSTDPAGQLQWLVGELQAAEDRGDKVRASSGNTVVLGDKQ
APVELEHLWAQKFYFPGIPNKCSLGIQLMVTVESLHSVPLSLARAAWTPGC
PDYHP*
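The residue counts given in parentheses in Fig. 2 can be checked against the printed ORF sequences. The following is a minimal Python sketch of that check, not part of the original analysis; the names `ORF_SEQUENCES`, `STATED_LENGTHS`, and `residue_count` are illustrative assumptions, and the sequences are copied from Fig. 2 with `*` marking the stop codon.

```python
# Predicted ORF polypeptides as printed in Fig. 2 ("*" marks the stop codon).
ORF_SEQUENCES = {
    "ORF 1": (
        "MVACSHGCPQRGANSVMDCRCHPLHQGCILSPAHQHQPQPPAPALLSASTA"
        "APSVLTFSPRLMAVNRPLQIGQVRFPHPKTSRSRGTMRCNLAGWPWGERG*"
    ),
    "ORF 2": (
        "MRLTGAPGAGGLGGLGGGFGTVGKEMFQDEKMSQCPQVEPRSRPQASDGLS"
        "TERLHTSTMSSSKRWTMDWHTAGGAIFSRLHSLMATEPTRATLGSFCSGRS"
        "AW*"
    ),
    "ORF 3": (
        "MYAFTLHPNAQHRRTRIGTSVDLSCLLCFRIGGFYALSPYPGLRLISLNMN"
        "FCSRENFWLLINSTDPAGQLQWLVGELQAAEDRGDKVRASSGNTVVLGDKQ"
        "APVELEHLWAQKFYFPGIPNKCSLGIQLMVTVESLHSVPLSLARAAWTPGC"
        "PDYHP*"
    ),
}

# Lengths stated in parentheses in Fig. 2.
STATED_LENGTHS = {"ORF 1": 101, "ORF 2": 104, "ORF 3": 158}


def residue_count(sequence: str) -> int:
    """Count amino acid residues, ignoring whitespace and the stop marker."""
    return sum(1 for ch in sequence if ch.isalpha())


for name, sequence in ORF_SEQUENCES.items():
    print(f"{name}: {residue_count(sequence)} residues "
          f"(stated: {STATED_LENGTHS[name]})")
```

Counting only alphabetic characters keeps the check robust to line breaks and to the trailing stop marker.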

DNA sequencing and computer analyses. Dideoxy sequencing was performed by the method of Sanger et al. (1977) using Sequenase kits (United States Biochemical). An ~8-kb SalI/EcoRI restriction fragment that contained the entire ASM coding region was isolated from the ASM genomic clone, pASMg-1, and digested with HincII (Promega) to generate four fragments of about 2.8, 2.0, 1.7, and 1.5 kb. The genomic restriction fragments were subcloned into Bluescript SK(+) (Stratagene) or pGEM7Z(+) (Promega) vectors and sequenced in both orientations. Sequencing primers were synthesized on an Applied Biosystems DNA synthesizer using phosphoramidite chemistry (Itakura et al., 1984). Computer analyses were performed using the University of Wisconsin Genetics Computer Group DNA Sequence Analysis Software (version 7.0) and the GenBank (release 67) and Swiss-Prot (release 17) DNA and protein databases, respectively.
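A simple consistency check on a restriction digest of this kind is that the fragment sizes should add up to roughly the size of the parent fragment. The sketch below illustrates that arithmetic with the approximate sizes quoted above; the function name `digest_is_consistent` and the 0.5-kb tolerance are assumptions chosen for illustration, not values from the study.

```python
# Approximate sizes (kb) quoted for the HincII digest of the ~8-kb
# SalI/EcoRI fragment of pASMg-1.
HINCII_FRAGMENTS_KB = [2.8, 2.0, 1.7, 1.5]
PARENT_FRAGMENT_KB = 8.0


def digest_is_consistent(fragments_kb, parent_kb, tolerance_kb=0.5):
    """Return True if the fragment sizes sum to the parent size within a
    tolerance. The tolerance is an assumed allowance for gel-sizing error."""
    return abs(sum(fragments_kb) - parent_kb) <= tolerance_kb


total_kb = sum(HINCII_FRAGMENTS_KB)
print(f"Sum of HincII fragments: {total_kb:.1f} kb")
print("Consistent with parent fragment:",
      digest_is_consistent(HINCII_FRAGMENTS_KB, PARENT_FRAGMENT_KB))
```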

RESULTS

Isolation of Human ASM Genomic Clones

Genomic clones encoding human ASM were isolated from a human genomic library using the full-length ASM cDNA, pASM-3 (Schuchman et al., 1991a), as a probe. Nine putative positive plaques were purified and characterized by restriction enzyme and Southern hybridization analyses. The inserts ranged in size from about 12 to 20 kb and contained overlapping regions of the ASM gene. One clone, pASMg-1, contained an ~8-kb SalI/EcoRI fragment that hybridized with oligonucleotides constructed from both the 5' and 3' ends of the full-length ASM cDNA. This genomic fragment was isolated and sequenced as described above.

Organization of the Human ASM Gene

Figure 1 shows the entire ASM genomic sequence [4681 nucleotides (nt)], including 210 bp upstream from the first in-frame ATG. This sequence was determined from the 1.5-, 2.0-, and 2.8-kb HincII fragments of pASMg-1 (see Materials and Methods for details). The 5' end of exon 1 (nt -87 in Fig. 1) was defined as the first nucleotide of the full-length ASM cDNA, pASM-3 (Schuchman et al., 1991a). An in-frame stop codon (TAA) was identified 15 bp upstream from this nucleotide. As shown schematically in Fig. 2, the ASM gene

was composed of six exons ranging in size from 77 to 773 bp and five introns ranging in size from 153 to 1059 bp. Exon 2 was unusually large (773 bp) and encoded 258 amino acids, or ~44% of the mature ASM polypeptide. Note that exon 3 encoded the type 1-specific region, which is alternatively spliced in ~10% of the placental and fibroblast ASM transcripts (Quintern et al., 1989; Schuchman et al., 1991a). The type 2-specific sequence was found at the 5' end of intron 2 (see Fig. 1).

When the nucleotide sequences of the full-length ASM cDNA and genomic sequences were compared, three differences were found within the coding region. Two of these, within codons 322 and 506, were previously identified as ASM polymorphic sites (Schuchman et al., 1991a,b). The third difference was a 6-bp deletion (CTGGTG) near the 5' end of the ASM genomic sequence (nt 103 to 108 in the full-length ASM cDNA; Schuchman et al., 1991a), which predicted a deletion of two amino acids (leucine and valine) within the putative ASM signal peptide. In addition, a difference in intron 2 was found between the sequence determined from the ASM genomic clone and a partial genomic sequence previously determined by PCR amplification and sequencing (Schuchman et al., 1991a). The length of the poly(T) tract in the genomic clone (beginning at nt 1881) was 30 nt, whereas in the sequence determined by PCR amplification the poly(T) tract was 23 nt.

In intron 2 of the ASM genomic region there was a 291-bp sequence that had more than 90% nucleotide identity with the Alu1 consensus sequence (Britten et al., 1988) (Fig. 3); this Alu1 element was inserted in the reverse orientation relative to the ASM coding sequence. The 210-bp region upstream from the first in-frame ATG did not contain CAAT or TATA motifs; however, the sequence was GC rich (~66%) and contained one Sp1 binding site (nt -150 to -155).

FIG. 3. Alu1 sequence within intron 2 of the human ASM gene. Bestfit alignment of the Alu1 element within intron 2 of the ASM gene (indicated on the bottom in bold capital letters) and the Alu1 consensus sequence on top. Vertical lines between the two sequences represent nucleotide identity.

Intron/Exon Junctions

Each of the intron/exon junctions within the ASM gene had the gt/ag consensus donor/acceptor splice site sequences (Table 1). However, within donor splice site D3 there was a G to A transition at position -1 (underlined in Table 1), a position that is conserved in >90% of mammalian donor splice sites (Mount, 1982). Notably, this donor splice site was at the end of exon 3, which encoded the 172-bp type 1-specific region. This exon was alternatively spliced in about 10% of the ASM transcripts (Quintern et al., 1989; Schuchman et al., 1991a). At the 3' end of the type 2-specific region within intron 2 (see Fig. 1) there is a sequence, aag gtgaat, which may serve as a cryptic donor splice site. Thus, the occurrence of the type 2-specific transcripts may be explained by alternative splicing of exon 3 (due to the weak D3 donor splice site) and use of the cryptic splice site adjacent to the type 2-specific sequence.

TABLE 1
Intron/Exon Junctions in the Human ASM Gene(a)

Exon     Exon size              5' Donor           Intron size   3' Acceptor
number   (nt)        Codons     splice site        (nt)          splice site
1        398         1-104      AG gtgagc (D1)     464           tag 4 (A1)
2        773         105-362    AG gta@t (D2)      1059          tag 4 (A2)
3        171         362-419    AA gtgagg (D3)     228           ias G (A3)
4        77          420-446    AG gtaggc (D4)     201           tag G (A4)
5        145         446-494    TG gtgagt (D5)     153           tag G (A5)
6        778         495-630

(a) The underlined residues represent divergences from the consensus sequences.
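The comparison of each donor site with the consensus can be made explicit with a short script. The following Python sketch is an illustration only, not the authors' procedure: it assumes the mammalian 5' donor consensus AG|gtragt (r = a or g; positions -2 and -1 in the exon, +1 to +6 in the intron; Mount, 1982) and uses the donor sequences for D1, D3, D4, and D5 as given in the text and Table 1. D2 is left out because its sequence is not fully legible in the table. The names `CONSENSUS`, `DONOR_SITES`, and `divergent_positions` are hypothetical.

```python
# Assumed 5' donor consensus: exon positions -2, -1 followed by intron
# positions +1..+6, i.e. AG | gtragt, where R stands for A or G.
CONSENSUS = "AGGTRAGT"
POSITIONS = [-2, -1, 1, 2, 3, 4, 5, 6]
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}

# Donor sites as listed in Table 1 (exon dinucleotide, then six intron bases).
DONOR_SITES = {
    "D1": "AG gtgagc",
    "D3": "AA gtgagg",
    "D4": "AG gtaggc",
    "D5": "TG gtgagt",
}


def divergent_positions(site: str) -> list:
    """Return the positions at which a donor site departs from the consensus."""
    seq = site.replace(" ", "").upper()
    if len(seq) != len(CONSENSUS):
        raise ValueError(f"expected {len(CONSENSUS)} bases, got {len(seq)}")
    return [
        pos
        for pos, base, cons in zip(POSITIONS, seq, CONSENSUS)
        if base not in ALLOWED[cons]
    ]


for name, site in DONOR_SITES.items():
    diverged = divergent_positions(site)
    has_g = -1 not in diverged
    print(f"{name} ({site}): divergent positions {diverged}; "
          f"G at -1: {'yes' if has_g else 'no'}")
```

Of the four sites checked, only D3 lacks the G at position -1, which is the feature the text singles out as the likely cause of the weak splice site.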

FIG. 4. Putative promoter region of the human ASM gene. Sp1 (~), TATA (boxed), CAAT (boxed), NF-1 (underlined), and AP-1 (underlined) binding sites are indicated. Where appropriate, the orientation of the consensus sequence is indicated by arrows.

Analysis of the ASM Promoter Region

Because no TATA or CAAT promoter elements were identified in the 210 bp upstream of the first in-frame ASM initiation codon, further DNA sequencing of the pASMg-1 genomic insert was performed. Figure 4 shows an additional 906 nt of upstream sequence. The putative promoter elements identified included four Sp1 binding sites (nt -256 to -261, -267 to -273, -285 to -290, and -715 to -720), two TATA boxes (nt -863 to -868 and -894 to -898), two CAAT boxes (nt -704 to -709 and -867 to -872), one AP-1 site (nt -604 to -609), and two NF-1 sites (nt -461 to -466 and -583 to -587). Overall, this region was GC rich (~63%), suggesting that it was a component of an HTF island.

ORFs in the Human ASM Gene

In addition to sequences coding for the ASM polypeptide, three other open reading frames (ORFs) were identified in this genomic sequence that may encode functional proteins (Fig. 2). The predicted polypeptides contained 101, 104, and 158 amino acid residues, respectively. The transcriptional orientations of ORF 1 (Fig. 1, nt 176 to 485) and ORF 2 (Fig. 1, nt 753 to 1067) were opposite to that of ASM, and the predicted proteins shared no homology with ASM or any other proteins in the Swiss-Prot protein database. In contrast, ORF 3 (Fig. 1, nt 2517 to 2998) was in the same transcriptional orientation and coding phase as the ASM gene. This ORF began within intron 2, overlapped ASM exon 3, and extended into intron 3.

DISCUSSION

In this communication the genomic organization and complete nucleotide sequence of the gene encoding human ASM are described. This housekeeping gene is small (i.e., about 5 kb) and the coding region is divided into six exons. Analysis of the genomic sequence documented the occurrence of alternative splicing at the ASM locus (Quintern et al., 1989; Schuchman et al., 1991a) and further clarified the molecular mechanisms underlying these alternative transcripts. The type 1-specific 172-bp sequence was encoded by exon 3, whereas the type 2-specific 40-bp sequence was located at the 5' end of intron 2, followed by a potential cryptic donor splice site. Furthermore, there was a poor donor splice site (D3; AAA gtgagg) at the exon 3/intron 3 junction. Thus, the type 2 and 3 ASM transcripts arose because the donor site D3 was not functional in about 10% of the ASM transcripts, and splicing proceeded either to the cryptic donor splice site (indicated by the overline in Fig. 1) or to donor site D2. The G to A transition of the nucleotide immediately adjacent to the invariant gt consensus dinucleotide in D3 (underlined in Table 1) may cause these alternative splicing events, since this alteration was previously shown to be the cause of abnormal splicing in the proα1(I) collagen gene, resulting in Ehlers-Danlos syndrome type VII (Weil et al., 1989).

A single Alu1 element was found within the ASM genomic region. This Alu1 element may be placed into the "a branch" according to the classification of Jurka and Smith (1988), indicating the ancestral nature of the ASM gene. In addition, three other long ORFs were identified within the ASM genomic region. Although it is not known whether these genomic sequences are transcribed into functional RNAs, there is some precedent for overlapping transcriptional units within lysosomal enzyme genes. For example, within the first intron of the

murine β-glucuronidase structural gene there is an RNA polymerase II promoter motif that drives transcription of an ~2.2-kb liver transcript that shares little homology with β-glucuronidase (Wang et al., 1988). To date, the function of this transcript remains unknown.

A number of putative regulatory elements were identified within the upstream ~1 kb of the ASM gene. This region was GC rich and contained five Sp1 binding sites. In addition, TATA, CAAT, AP-1, and NF-1 binding sites were identified. Because the precise site of transcription initiation has not been determined for ASM, no conclusions can be drawn about the functional relevance of these sequences. However, the fact that these regulatory sequences are within 1 kb of the ASM coding region suggests that they comprise all or part of the ASM promoter. This is supported by the fact that transgenic mice containing the human ASM genomic region, including about 1.5 kb of upstream sequences, express human ASM activity at high levels (Schuchman et al., unpublished results). Clearly, further studies (e.g., in vitro mutagenesis and expression experiments) are required to definitively map the ASM control region and determine the significance of these putative regulatory sequences.

ASM is the eighth human lysosomal gene for which the genomic organization has been determined and the fourth to be completely sequenced (Table 2). In addition to ASM, sequences encoding α-N-acetylgalactosaminidase (Wang and Desnick, 1991), acid phosphatase (Geier et al., 1989), α-galactosidase A (Kornreich et al., 1989), β-glucosidase (Reiner et al., 1988; Horowitz et al., 1989), β-glucuronidase (Miller et al., 1990; Shipley et al., 1991), and the α and β chains of β-hexosaminidase (Proia and Soravia, 1987; Proia, 1988) have been reported. Of these, nucleotide sequences of the promoter regions are available for seven (Table 3). Aside from the fact that all of these upstream sequences are GC rich, indicating that they may be components of HTF islands, analysis of these regions has not provided any consensus sequence for a lysosomal gene-specific promoter element. Although the promoter regions of many housekeeping genes are GC rich and lack TATA and/or CAAT motifs, the genes encoding α-galactosidase A, β-hexosaminidase α-chain, β-glucosidase, and ASM contained these consensus sequences.

TABLE 2
Genes Encoding Lysosomal Enzymes

Gene                           Symbol   Chromosomal location   Number of exons   Gene size   Ref.
α-N-Acetylgalactosaminidase    NAGA     22q13-qter             9                 13,709      Wang and Desnick, 1991
Acid phosphatase               ACP2     11p11                  11                ~9 kb       Geier et al., 1989
Acid sphingomyelinase          SMPD1    11p15.1-p15.4          6                 4,708       Schuchman et al., 1991
α-Galactosidase A              GLA      Xq21.3-q22             7                 12,436      Kornreich et al., 1989
β-Glucosidase                  GBA      1q21                   11                6,877       Horowitz et al., 1989
β-Glucuronidase                GUSB     7q21.2-q22             12                ~21 kb      Miller et al., 1990
β-Hexosaminidase α-chain       HEXA     15q23-q24              14                ~35 kb      Proia and Soravia, 1987
β-Hexosaminidase β-chain       HEXB     5q13                   14                ~40 kb      Proia, 1988

TABLE 3
Putative Promoter Regions of Genes Encoding Lysosomal Enzymes

Columns: Gene; Transcription start site(s); Nucleotides analyzed upstream from ATG; Percentage GC; TATA; CAAT; Sp1; Other.

(a) Transcriptional start site for the 3.6-kb transcript; start site for the 2.2-kb transcript unknown.

To date, mutagenesis and expression studies have been performed for three lysosomal gene promoter regions: β-glucosidase (Horowitz et al., 1989), β-glucuronidase (Shipley et al., 1991), and acid phosphatase (Geier et al., 1989). In the human β-glucosidase gene, a 650-bp genomic fragment containing the putative control region (including two TATA and two CAAT motifs) was inserted upstream from the bacterial chloramphenicol acetyltransferase (CAT) gene and transfected into various human cells. The functional integrity of this regulatory region was demonstrated and, surprisingly, tissue-specific expression was observed. For human β-glucuronidase, deletion analysis of minigene constructs demonstrated that the 200 bp of sequence upstream from the translation initiation site was sufficient for maximal expression in COS cells. This region was GC rich, but did not contain TATA or CAAT elements. For human acid phosphatase, a 590-bp upstream region that was GC rich and lacked a TATA element was shown to possess promoter activity by expression analysis of CAT constructs in COS cells.

In summary, the gene encoding human ASM has been isolated, sequenced, and characterized. These studies have confirmed the nature of alternative splicing at the ASM locus and should facilitate further characterization of this important lysosomal hydrolase. In addition, the availability of the ASM genomic sequence should facilitate further analysis of the mutations that cause Types A and B NPD. To date, analysis of Ashkenazi Jewish patients with Types A and B NPD has revealed two mutations within the ASM gene, R496L and ΔR608 (Levran et al., 1991a,b). Interestingly, both of these mutations are in exon 6, suggesting that this region may encode all or part of the ASM catalytic site. Clearly, future studies of the ASM mutations causing NPD will provide further insights into the functional organization of the ASM polypeptide.

ACKNOWLEDGMENTS

The authors thank Dr. Tsutomu Takahashi for assisting in the DNA sequencing and Dr. David Bishop for computer analysis of the ASM promoter region. This work was supported by March of Dimes Basic Research Grant 1-1224, American Cancer Society Basic Investigation Grant CD-62521, Research Grant 1 R01 HD28607 from the National Institutes of Health, and Grant 5 M01 RR00071 for the Mount Sinai General Clinical Research Center from the National Center for Research Resources, National Institutes of Health.

REFERENCES

Barenholz, Y., Roitman, A., and Gatt, S. (1966). Enzymatic hydrolysis of sphingolipids. II. Hydrolysis of sphingomyelin by an enzyme from rat brain. J. Biol. Chem. 241: 3731-3737.
Barenholz, Y., and Gatt, S. (1982). Sphingomyelinases. In "Phospholipids" (J. N. Hawthorne, J. B. Ansell, and R. M. C. Dawson, Eds.), pp. 129-177, Elsevier, New York.
Brady, R. O., Kanfer, J. N., Mock, M. B., and Fredrickson, D. S. (1966). The metabolism of sphingomyelin. II. Evidence of an enzymatic deficiency in Niemann-Pick disease. Proc. Natl. Acad. Sci. USA 55: 366-370.
Britten, R. J., Baron, W. F., Stout, D. B., and Davidson, E. H. (1988). Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4770-4774.
Geier, C., Von Figura, K., and Pohlmann, R. (1989). Structure of the human acid phosphatase gene. FEBS Lett. 13: 611-616.
Horowitz, M., Wilder, S., Horowitz, Z., Reiner, O., Gelbart, T., and Beutler, E. (1989). The human glucocerebrosidase gene and pseudogene: Structure and evolution. Genomics 4: 87-96.
Itakura, K., Rossi, J. J., and Wallace, R. B. (1984). Synthesis and use of synthetic oligonucleotides. Annu. Rev. Biochem. 53: 323-356.
Jurka, J., and Smith, T. (1988). A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4775-4778.
Koval, M., and Pagano, R. E. (1991). Intracellular transport and metabolism of sphingomyelin. Biochim. Biophys. Acta 1082: 113-125.
Kornreich, R., Desnick, R. J., and Bishop, D. F. (1989). Nucleotide sequence of the human α-galactosidase A gene. Nucleic Acids Res. 17: 3301-3302.

Levran, O., Desnick, R. J., and Schuchman, E. H. (1991a). Niemann-Pick disease: A frequent missense mutation in the acid sphingomyelinase gene of Ashkenazi Jewish type A and B patients. Proc. Natl. Acad. Sci. USA 88: 3748-3752.
Levran, O., Desnick, R. J., and Schuchman, E. H. (1991b). Niemann-Pick type B disease: Identification of a single codon deletion in the acid sphingomyelinase gene and genotype/phenotype correlations in type A and B patients. J. Clin. Invest. 88: 806-810.
Miller, R. D., Hoffmann, J. W., Powell, P. P., Kyle, J. W., Shipley, J. M., Bachinsky, D. R., and Sly, W. S. (1990). Cloning and characterization of the human β-glucuronidase gene. Genomics 7: 280-283.
Mount, S. M. (1982). A catalogue of splice site junction sequences. Nucleic Acids Res. 10: 459-472.
Pereira, L. V., Desnick, R. J., Adler, D., Disteche, C. M., and Schuchman, E. H. (1991). Regional assignment of the human acid sphingomyelinase gene by PCR analysis of somatic cell hybrids and in situ hybridization to 11p15.1-p15.4. Genomics 9: 229-234.

Proia, R. L. (1988). Gene encoding the human β-hexosaminidase β-chain: Extensive homology of intron placement in the α- and β-chain genes. Proc. Natl. Acad. Sci. USA 85: 1883-1887.
Proia, R. L., and Soravia, E. (1987). Organization of the gene encoding the human β-hexosaminidase α-chain. J. Biol. Chem. 262: 5677-5681.
Quintern, L. E., Weitz, G., Nehrkorn, H., Tager, J. M., Schram, A. W., and Sandhoff, K. (1987). Acid sphingomyelinase from human urine: Purification and characterization. Biochim. Biophys. Acta 922: 323-336.
Quintern, L. E., Schuchman, E. H., Levran, O., Suchi, M., Ferlinz, K., Reinke, H., Sandhoff, K., and Desnick, R. J. (1989). Isolation of cDNA clones encoding human acid sphingomyelinase: Occurrence of alternatively processed transcripts. EMBO J. 8: 2469-2473.


Reiner, O., Wigderson, M., and Horowitz, M. (1988). Structural analysis of the human glucocerebrosidase genes. DNA 7: 107-116.
Sambrook, J., Fritsch, E. F., and Maniatis, T. A. (1989). "Molecular Cloning: A Laboratory Manual," Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.
Schneider, P. D., and Kennedy, E. P. (1967). Sphingomyelinases in human tissues. III. Expression of Niemann-Pick disease in cultured fibroblasts. J. Lipid Res. 8: 202-208.
Schuchman, E. H., Suchi, M., Takahashi, T., Sandhoff, K., and Desnick, R. J. (1991a). Human acid sphingomyelinase: Isolation, nucleotide sequence and expression of the full-length and alternatively spliced cDNAs. J. Biol. Chem. 266: 8531-8539.
Schuchman, E. H., Levran, O., Suchi, M., and Desnick, R. J. (1991b). An MspI polymorphism in the human acid sphingomyelinase gene (SMPD1) at 11p15.1-p15.4. Nucleic Acids Res. 19: 3160.
Shipley, J. M., Miller, R. D., Wu, B. M., Grubb, J. H., Christensen, S. G., Kyle, J. K., and Sly, W. S. (1991). Analysis of the 5' flanking region of the human β-glucuronidase gene. Genomics 10: 1009-1018.
Spence, M. W., and Callahan, J. W. (1989). Sphingomyelin-cholesterol lipidoses: The Niemann-Pick group of diseases. In "The Metabolic Basis of Inherited Diseases" (C. R. Scriver, A. L. Beaudet, W. S. Sly, and D. Valle, Eds.), 8th ed., pp. 1655-1676, McGraw-Hill, New York.
Wang, A., and Desnick, R. J. (1991). Structural organization and complete sequence of the human α-N-acetylgalactosaminidase gene: Homology with the α-galactosidase gene provides evidence for evolution from a common ancestral gene. Genomics 10: 133-142.
Wang, B., Korfhagen, T. R., Gallagher, P. M., D'Amore, M. A., McNeish, J., Potter, S. S., and Ganschow, R. E. (1988). Overlapping transcriptional units on the same strand within the murine β-glucuronidase gene complex. J. Biol. Chem. 263: 15841-15844.
Weil, D., D'Alessio, M., Ramirez, F., De Wet, W., Cole, W. G., Chan, D., and Bateman, J. F. (1989). A base substitution in the exon of a collagen gene causes alternative splicing and generates a structurally abnormal polypeptide in a patient with Ehlers-Danlos syndrome type VII. EMBO J. 8: 1705-1710.
