GENOMICS 8, 371-379 (1990)

Human U1-70K Ribonucleoprotein Antigen Gene: Organization, Nucleotide Sequence, and Mapping to Locus 19q13.3

RICHARD A. SPRITZ,*,† KATHLEEN STRUNK,* CAROL S. SUROWY,* AND HARVEY W. MOHRENWEISER‡

Departments of *Medical Genetics and †Pediatrics, University of Wisconsin, Madison, Wisconsin 53706; and ‡Biomedical Sciences Division L-452, Lawrence Livermore National Laboratory, Livermore, California 94550

Received April 16, 1990; revised June 15, 1990

We have isolated and sequenced the gene encoding the human U1-70K snRNP protein. U1-70K is an RNA-binding protein that is a specific component of the U1 small nuclear ribonucleoprotein complex (snRNP) and constitutes the major anti-RNP autoimmune antigen. We have mapped the U1-70K gene to the distal portion of chromosome 19, at band q13.3. The gene is greater than 44 kb in size and consists of 11 exons. The general structure of the gene has been completely conserved during vertebrate evolution and accounts for the production of several different U1-70K mRNA species by alternative pre-mRNA splicing. Comparison of the predicted amino acid sequences of animal U1-70K proteins reveals a high degree of conservation, particularly in the region of the RNP consensus domain. Even more striking is the complete conservation of the nucleotide sequence of an alternative included/excluded exon containing an in-frame translational termination codon. This conservation also includes significant portions of the downstream intervening sequence. This extraordinary conservation at the nucleotide sequence level suggests that alternative splicing of this exon serves an important function, perhaps in regulating the production of functional U1-70K protein. © 1990 Academic Press, Inc.

INTRODUCTION

Splicing of mRNA precursors (pre-mRNAs)¹ to mRNAs takes place in a large nuclear ribonucleoprotein (RNP) complex, the spliceosome (reviewed in Green, 1986; Steitz et al., 1987). The spliceosome contains several small nuclear ribonucleoprotein complexes (snRNPs) that interact with the pre-mRNA and with each other in an ordered manner. Each snRNP is composed of one or two discrete small nuclear RNAs (U snRNAs), a group of at least seven “core” polypeptides common to all U snRNPs, and several specific proteins. U1 snRNP, which mediates recognition of 5' splice sites, consists of the 164-nucleotide (nt) U1 snRNA, the core snRNP proteins, and three specific proteins: U1-70K, U1-A, and U1-C (Bringmann and Lührmann, 1986). U1-70K (Query et al., 1989; Surowy et al., 1989) and U1-A (Scherly et al., 1989) specifically bind U1 snRNA. In addition to their probable roles in pre-mRNA splicing, these snRNP proteins are of direct medical importance: the snRNP core proteins constitute the “Sm” antigens, and the U1-70K protein constitutes the principal “RNP” antigen recognized by autoimmune sera from patients with systemic lupus erythematosus, mixed connective tissue disease, and other rheumatic diseases (reviewed in Tan, 1989).

Isolation of cDNAs encoding the 52-kDa human U1-70K protein (Theissen et al., 1986; Spritz et al., 1987) permitted definition of the highly unusual structure of this protein and mapping of its gene to chromosome 19. U1-70K is a sequence-specific RNA-binding protein that specifically binds the loop I region of U1 snRNA (Surowy et al., 1989). The U1-70K polypeptide shares with a number of other nuclear RNA-binding proteins a particular 100-amino-acid segment, the “RNP sequence domain” (reviewed in Dreyfuss et al., 1988), which constitutes the binding site for U1 snRNA (Query et al., 1989; Surowy et al., 1989). We previously isolated a number of different U1-70K cDNA species and suggested that the corresponding different mRNAs are produced by alternative splicing of a single U1-70K pre-mRNA (Spritz et al., 1987). Surprisingly, several of these U1-70K cDNAs included, within the RNP consensus domain, an extra alternative exon containing an in-frame translational termination codon. The truncated U1-70K protein encoded by the corresponding mRNAs would not be able to bind U1 snRNA.

Here, we describe the isolation and characterization of the human U1-70K gene and its precise localization to the distal portion of segment 19q13.3. The gene is more than 44 kb in size and consists of 11 exons. This general structure of the U1-70K gene has been conserved through vertebrate evolution and is completely consistent with the alternatively spliced mRNAs that we predicted on the basis of observed cDNAs (Spritz et al., 1987). Furthermore, the alternative included/excluded exon has been completely conserved, along with the downstream intron sequences, suggesting that this segment may have an important function, perhaps in regulating the production of functional U1-70K protein.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. 504772.

¹ Abbreviations used: bp, base pair(s); cDNA, DNA complementary to mRNA; IVS, intervening sequence; kb, kilobase pair(s); kDa, kilodalton; nt, nucleotide(s); pre-mRNA, mRNA precursor; RNP, ribonucleoprotein; snRNA, small nuclear RNA; snRNP, small nuclear ribonucleoprotein.

0888-7543/90 $3.00. Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

MATERIALS AND METHODS

Isolation of U1-70K Genomic Clones

Preliminary genomic mapping was carried out by Southern blot hybridization (Southern, 1985) using as probe the almost complete U1-70K cDNA FL1.7 (Spritz et al., 1987). U1-70K genomic recombinants were isolated by screening two normal human genomic libraries with this probe. The first consisted of DNA partially digested with EcoRI, size-selected in the range 15-20 kb, and cloned in phage λ Charon 32. The second consisted of sorted human chromosome 19 DNA partially digested with MboI and cloned in λ Charon 40. Both phage libraries were propagated in Escherichia coli K802 recA.

Mapping and DNA Sequence Analyses of U1-70K Clones

Recombinant phage were mapped by restriction enzyme digestion and Southern blot hybridization using as probes all or parts of the FL1.7 U1-70K cDNA (Spritz et al., 1987). Specific U1-70K gene fragments were then subcloned in M13mp18, M13mp19, or various plasmid vectors for detailed mapping and DNA sequence analyses. DNA sequences were determined by the method of Sanger et al. (1977) using either single-stranded or double-stranded DNA templates.
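Restriction mapping of the kind described above rests on a simple calculation: once the positions of a recognition site are known, the fragment sizes of a complete digest follow directly. The Python sketch below illustrates this for EcoRI (recognition site GAATTC, cut between G and A) on a short toy sequence. The sequence and the helper names `find_sites` and `digest` are hypothetical and are not taken from the U1-70K clones, and the sketch models only complete digests, not the partial digests used to build the genomic libraries.

```python
# Sketch: fragment sizes from a complete single-enzyme digest of a linear DNA.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts G^AATTC, one base into the site


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Return top-strand fragment lengths, in order, for a complete digest."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "ATGC" * 5 + "GAATTC" + "CCGG" * 10 + "GAATTC" + "TTAA" * 3  # hypothetical insert
print(find_sites(toy, ECORI_SITE))  # [20, 66]
print(digest(toy))                  # [21, 46, 17]
```

The fragment lengths always sum to the length of the input, which is a quick sanity check when a map is assembled from several digests.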

Primer Extension Analysis

To define the 5' terminus of human U1-70K mRNA, a 5'-end-radiolabeled 21-mer oligonucleotide (5'-CTGATTCCGTCGCCCGCTAAG-3') complementary to the 5' portion of U1-70K mRNA was annealed to 6 μg of polyadenylated HeLa cell mRNA and extended exactly as described (Kingston, 1987); the radiolabeled primer extension products were then electrophoresed on a 6% sequencing gel and autoradiographed.
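The primer-extension arithmetic can be sketched in a few lines of Python. The sketch assumes the coordinates reported in the Results (primer complementary to nts 535-555, extension product of about 198 nt) and that reverse transcriptase copies the RNA all the way to its 5' end; `reverse_complement` and `cap_site` are illustrative helpers, not part of any published pipeline. Under those assumptions the product places the cap site at nt 555 - 198 + 1 = 358, and a 311-nt exon 1 counted from that site would end at nt 668.

```python
# Sketch: primer-extension bookkeeping for the U1-70K cap-site mapping.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cap_site(primer_5prime_coord: int, product_length: int) -> int:
    """Sense-strand coordinate of the mRNA 5' end.

    The primer's 5' end pairs with `primer_5prime_coord`, the most downstream
    sense-strand base it covers. Assuming reverse transcriptase copies the RNA
    to its 5' end, the product spans cap..primer_5prime_coord inclusive.
    """
    return primer_5prime_coord - product_length + 1


PRIMER = "CTGATTCCGTCGCCCGCTAAG"      # 5'->3', from Materials and Methods
PRIMER_START, PRIMER_END = 535, 555  # sense-strand span it is complementary to
assert len(PRIMER) == PRIMER_END - PRIMER_START + 1  # 21 nt

sense_target = reverse_complement(PRIMER)  # stretch of exon 1 the primer anneals to
cap = cap_site(PRIMER_END, 198)            # ~198-nt extension product
exon1_end = cap + 311 - 1                  # if exon 1 (311 nt) starts at the cap site

print(sense_target)  # CTTAGCGGGCGACGGAATCAG
print(cap)           # 358
print(exon1_end)     # 668
```

Because the extension product is sized on a sequencing gel, the resulting cap coordinate carries the same uncertainty of a few nucleotides as the size estimate itself.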

Gene Mapping

The human U1-70K gene was mapped by in situ hybridization as previously described (Mohrenweiser et al., 1989), using as probe biotinylated λhU1-70K.5, a recombinant phage containing the 3'-terminal 15.8-kb EcoRI fragment (Fig. 1).

RESULTS

Isolation of Human U1-70K Genomic Clones

The human U1-70K gene was first mapped by Southern blot hybridization analysis of genomic DNA (Southern, 1985). The virtually complete U1-70K cDNA FL1.7 (Spritz et al., 1987) hybridized to four EcoRI fragments of human genomic DNA, approximately 16, 13, 10, and 7 kb in size (data not shown). Hybridizations with probes consisting of specific fragments of the FL1.7 cDNA showed that the 16-kb EcoRI fragment contains the 3' half of the U1-70K gene and the 13-kb fragment contains the 5' part of the gene. To clone the human U1-70K gene, we first screened an EcoRI partial digest library size-selected in the range 15-20 kb. From this library we isolated clones that contained the 3'-terminal 15.8-kb EcoRI fragment but no clones that contained the other EcoRI fragments. Therefore, we next screened a library of sorted chromosome 19 DNA partially digested with MboI. The library was repeatedly screened with different specific fragments of the FL1.7 cDNA to ensure isolation of all of the exons. As shown in Fig. 1, screening of the two libraries yielded a total of seven independent recombinants. The recombinant phage were mapped by Southern blot hybridization analyses (Southern, 1975) using various fragments of FL1.7 as probes, and the inserts were found to assort into three nonoverlapping clusters spanning a total of 65 kb of genomic DNA. Two gaps in the U1-70K gene, within the 2nd and 6th introns, are not covered by clones. The minimum size of the human U1-70K gene is thus 44 kb.

Mapping of the Human U1-70K Gene

Using somatic cell hybrids, we previously mapped the human U1-70K gene to chromosome 19 (Spritz et al., 1987). To localize the gene more precisely, we carried out in situ hybridization analysis using as probe λhU1-70K.5, a U1-70K genomic phage clone containing the 3' end of the gene. A total of 10 chromosome spreads were scored. As shown in Fig. 2, 80% of all scorable signal was associated with chromosome 19. For the 20 chromosomes 19 examined, in 15 cases signal localized to q13.3 on both chromatids. In four cases signal localized to only one chromatid: one at p13.2, one at q13.3, and two at q13.4. Two chromosomes 19 did not yield scorable signal. All background signals involved only one chromatid of the pair, and each non-19 signal was observed only once. Thus, more than 70% of all scorable signal localized at 19q13.3.

[Fig. 1: composite map of the seven λ clones (λhU1-70K.2, .5, .7, .8, .9, .10, and .15) with exons and EcoRI sites; 1-kb scale bar.]

FIG. 1. Structure of the human U1-70K gene. A composite map and the extents of the various λ human U1-70K clones are shown. Vertical bars indicate the exons (numbered). Discontinuities in the map indicate gaps not represented by λ clones. E, EcoRI restriction sites. A 1-kb size standard is indicated.

[Fig. 2: idiogram of chromosome 19, bands p13.3 to q13.4, with the positions of hybridization signals plotted.]

FIG. 2. Idiogram of chromosome 19 illustrating the location of fluorescent hybridization signals obtained following in situ hybridization with λhU1-70K.5 recombinant phage. Details of the analysis are given in the text.

Nucleotide Sequence of the Human U1-70K Gene

DNA sequence analysis (Fig. 3) showed that the human U1-70K gene consists of 11 exons. Exon 1 (311 nt) includes only 5'-untranslated sequence and extends about 100 bases beyond the 5' end of the most extensive U1-70K cDNA (Spritz et al., 1987). Exon 2 (157 nt), which contains the translational initiation codon, exon 3 (63 nt), exon 4 (55 nt), exon 5 (65 nt), exon 6 (63 nt), exon 7 (82 nt), exon 8 (60 or 72 nt), exon 9 (102 nt), and exon 10 (88 nt) are all relatively short, whereas exon 11 (approximately 790 or 817 nt) is considerably longer, representing almost half the length of U1-70K mRNA. The size range of the 10 intervening sequences is also very broad, from 82 bp (IVS10) to more than 22 kb (IVSB). Within the U1-70K coding region, the RNP sequence domain (Dreyfuss et al., 1988), which is necessary for binding U1 snRNA (Query et al., 1989; Surowy et al., 1989), is divided among four exons (5, 6, 7, and 9); the most highly conserved segment, the “RNP consensus sequence,” occurs in exon 7. As discussed below, exon 8, an alternatively spliced included/excluded exon present in about one-third of U1-70K cDNAs (Spritz et al., 1987), contains an in-frame translational stop codon. Thus, inclusion of exon 8 in spliced U1-70K mRNA would result in termination of translation within the RNP sequence domain, producing a truncated U1-70K polypeptide that could not bind U1 snRNA. The two prominent Arg-Asp/Arg-Ser domains in the carboxyl half of the U1-70K protein, the precise functions of which are unknown, are both encoded within exon 11.
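The exon bookkeeping in this section can be checked with a short Python sketch. It takes the exon sizes exactly as listed in the text (both reported sizes for exons 8 and 11), enumerates the possible spliced lengths excluding the poly(A) tail, and translates a short stretch read from the Fig. 3 annotation at the start of exon 8 (CCACCACGCAGCTGGCTTGCAGCTGA, whose first two bases complete a codon begun upstream) to show the in-frame TGA. That fragment is read off the figure annotation and should be treated as illustrative; `spliced_lengths` and `translate` are hypothetical helper names.

```python
# Sketch: exon-length bookkeeping and exon 8 reading frame for human U1-70K.
# Constitutive exon sizes (nt) from the text; exons 8 and 11 have two reported sizes.
CONSTITUTIVE = {1: 311, 2: 157, 3: 63, 4: 55, 5: 65, 6: 63, 7: 82, 9: 102, 10: 88}
EXON8_SIZES = (60, 72)     # used only when exon 8 is included
EXON11_SIZES = (790, 817)  # approximate sizes from the text


def spliced_lengths():
    """Return (exon 8 size or 0 if skipped, exon 11 size, total nt) for every combination."""
    base = sum(CONSTITUTIVE.values())
    rows = []
    for exon11 in EXON11_SIZES:
        for exon8 in (0,) + EXON8_SIZES:
            rows.append((exon8, exon11, base + exon8 + exon11))
    return rows


# Standard genetic code, indexed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of `seq` starting at offset `frame` ('*' = stop)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(frame, len(seq) - 2, 3))


for exon8, exon11, total in spliced_lengths():
    share = exon11 / total
    print(f"exon 8 = {exon8:>2} nt, exon 11 = {exon11} nt -> {total} nt (exon 11 = {share:.0%})")

# Start of exon 8 as annotated in Fig. 3 (read from the figure; illustrative only).
EXON8_START = "CCACCACGCAGCTGGCTTGCAGCTGA"
peptide = translate(EXON8_START, frame=2)  # first 2 nt finish the upstream codon
print(peptide)  # TTQLACS*
```

The translation reproduces the ...Thr-Thr-Gln-Leu-Ala-Cys-Ser-TER annotation, and exon 11 accounts for roughly 43-45% of each spliced length, in line with "almost half".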

5' End of Human U1-70K mRNA

To locate the 5' terminus of human U1-70K mRNA, we carried out primer extension analysis of poly(A)+ mRNA from HeLa cells using a 21-nt primer complementary to nts 535-555 within the first exon (CTGATTCCGTCGCCCGCTAAG). As shown in Fig. 4, we detected a single extension product of approximately 198 nt. This suggests that the 5' end of human U1-70K mRNA is at approximately nt 358 in the sequence shown in Fig. 3. Analysis of sequences upstream of nt 358 reveals a number of features suggestive of a functional promoter. First, there are two potential binding sites for the transcription factor SP1 (reviewed in Kadonaga et al., 1986), at nts 207 and 271. Second, the entire region from bases 2 to 303 in Fig. 3, except the segment between the two SP1-binding sites, shows homology to the promoter region of the Xenopus U1-70K gene (Etzerodt et al., 1988), which contains one potential SP1-binding site. Neither the human nor the Xenopus U1-70K gene contains CCAAT or TATA promoter motifs. Third, the region from bases 1 to 668 of the human U1-70K gene

[Fig. 3: annotated nucleotide sequence of the human U1-70K gene, nts 1-7074, with gaps.]

FIG. 3. Nucleotide sequence of the human U1-70K gene. Nucleotide positions are numbered. Uppercase letters indicate exon sequences and lowercase letters intervening and flanking sequences. INI and TER denote the translational initiation and termination codons, respectively. Dashed and dotted underlines denote alternative exon segments; exon 8 is nts 3400-3471/3484. Dotted lines indicate gaps in the sequence and their approximate sizes. The rightward arrow indicates the approximate location of the cap site. Potential SP1-binding sites are marked, potential polyadenylation signals are overlined, and upward arrows indicate the known multiple sites of mRNA polyadenylation. Underlining and arrow denote sequence complementary to the oligonucleotide used for primer extension.
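Fig. 3 marks potential SP1-binding sites, which the text places at nts 207 and 271. The paper does not say how these were identified. As an illustration only, the Python sketch below flags candidate GC boxes by exact match to the hexamer core GGGCGG on either strand. Using this core alone is a simplifying assumption (published SP1 consensus sequences are longer and degenerate), `scan_gc_boxes` is a hypothetical helper, and the toy promoter sequence is invented for the example.

```python
import re

# Simplified GC-box core used here as a stand-in for an SP1 site (assumption):
# GGGCGG on the top strand; its reverse complement CCGCCC marks a bottom-strand site.
GC_BOX = "GGGCGG"
GC_BOX_RC = "CCGCCC"


def scan_gc_boxes(seq: str) -> list[tuple[int, str]]:
    """Return sorted (1-based start, strand) pairs for every core match, overlaps included."""
    seq = seq.upper()
    hits = []
    for motif, strand in ((GC_BOX, "+"), (GC_BOX_RC, "-")):
        # A zero-width lookahead lets re.finditer report overlapping matches.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start() + 1, strand))
    return sorted(hits)


toy_promoter = "AAAAGGGCGGAAAATTTTCCGCCCTTTT"  # hypothetical sequence
print(scan_gc_boxes(toy_promoter))  # [(5, '+'), (19, '-')]
```

Positions are reported 1-based to match the numbering convention of Fig. 3; an exact-match scan like this over-calls weak sites and misses degenerate ones, so it is only a first filter.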
