The EMBO Journal vol.10 no.9 pp.2627-2634, 1991

Cloning of a yeast U1 snRNP 70K protein homologue: functional conservation of an RNA-binding domain between humans and yeast

V.Smith and B.G.Barrell

Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK

Communicated by B.G.Barrell

We have cloned and sequenced a gene encoding a yeast homologue of the U1 snRNP 70K protein. The gene, SNP1, encodes a protein which has 30% amino acid identity with the human 70K protein and has a predicted molecular weight of 34 kDa. The yeast and human sequences are more closely related to each other than to other (non-U1) RNA-binding proteins, but diverge considerably in their C-terminal portions. In particular, SNP1 lacks the charged carboxy terminus of the human 70K protein. A yeast strain, aα115, was constructed in which one allele of the SNP1 gene contained a 554 bp deletion. Tetrad analysis of aα115 showed that the SNP1 gene is essential for the viability of yeast cells. The complete human 70K gene did not complement snp1, but the lethal snp1 mutation was rescued by plasmids bearing a chimera in which over half the yeast gene was replaced with the homologous region of the human 70K gene, including the RNA-binding domain. These results suggest that SNP1 encodes a functional homologue of the U1 snRNP 70K protein.

Key words: 70K protein/U1 snRNP/Saccharomyces cerevisiae

Introduction

The splicing of eukaryotic pre-mRNA involves cleavage at the 5' exon/intron boundary to generate a linear exon and an intron-exon species in a 'lariat' configuration. These intermediates are then converted to a spliced mRNA and a lariat intron structure (reviewed in Maniatis and Reed, 1987; Sharp, 1987; Guthrie and Patterson, 1988; Beggs et al., 1989). The reactions take place in the spliceosome, a complex consisting of a number of components, including small nuclear ribonucleoprotein particles (snRNPs). Each of the different snRNPs, U1, U2, U5 and U4/U6, consists of at least one small nuclear RNA (snRNA) and several proteins. An early event in the splicing pathway is recognition of the 5' splice site by the U1 snRNP (Mount et al., 1983; Ruby and Abelson, 1988; Seraphin et al., 1988; Seraphin and Rosbash, 1989). The U1 snRNA binds to the 5' splice site by base pairing interactions with the 5' end of the intron (Kramer et al., 1984; Zhuang and Weiner, 1986), in a process which is dependent on the presence of the U1 snRNP proteins (Mount et al., 1983). Included in the U1 snRNP

© Oxford University Press

is a set of seven core or Sm proteins, which are common to the other snRNPs (Hinterberger et al., 1983), and three unique proteins: A, C and 70K. The human 70K protein has been shown to bind directly to the first stem-loop at the 5' end of the U1 snRNA (Patton and Pederson, 1988; Query et al., 1989a; Surowy et al., 1989). Sequence alignment of a variety of RNA-binding proteins has revealed a conserved RNA-binding domain (Query et al., 1989b; Bandziulis et al., 1989) which is present in the human 70K protein. The functional importance of this domain for sequence-specific binding of the human 70K protein to the U1 snRNA has been demonstrated by in vitro studies (Query et al., 1989b; Surowy et al., 1989). Comparisons of sequences derived from humans, Xenopus laevis (Etzerodt et al., 1988), mice (Hornig et al., 1989) and Drosophila melanogaster (Mancebo et al., 1990) show that the 70K protein is highly conserved. While the precise role of the 70K protein in splicing is unknown, its unique presence in the U1 snRNP suggests that it is likely to be involved in some function specifically performed by the U1 complex, such as 5' splice site recognition or subsequent interaction with other snRNP complexes.

Most of the evidence concerning the properties and function of the 70K protein has come from in vitro studies in mammalian systems. A more tractable organism for in vivo analysis is yeast. The yeast Saccharomyces cerevisiae shares the basic splicing pathway with higher eukaryotes (reviewed in Beggs et al., 1989), but with much more uniformity in intron size and in intron/exon boundary and branch point sequences (Langford and Gallwitz, 1983; Langford et al., 1984; Teem et al., 1984; Thompson-Jager and Domdey, 1987). Homologues have been found for the snRNAs in S. cerevisiae (Ares, 1986; Riedel et al., 1986; Kretzner et al., 1987; Siliciano et al., 1987) and human autoimmune anti-Sm antisera will precipitate snRNAs from S. cerevisiae splicing extracts (Tollervey and Mattaj, 1987), indicating that the Sm proteins are conserved between humans and yeast. To date, however, the only snRNP proteins which have been identified in yeast by genetic analysis are a component of the U5 snRNP (Lossky et al., 1987) and a component of the U4/U6 snRNP (Bjorn et al., 1989; Banroques and Abelson, 1989).

Despite overall differences between the U1 snRNAs of humans (164 nucleotides) and S. cerevisiae (568 nucleotides), the sequence and secondary structure of stem-loop 1, the site of binding of the human 70K protein, are highly conserved (Kretzner et al., 1990). One might therefore expect that a homologue of the 70K protein exists in yeast, and that this homologue should possess those sequences conserved in RNA-binding proteins. In this paper we present the sequence of a homologue of the 70K protein in S. cerevisiae and provide evidence for the essential nature of the function it performs.


Results

Sequence and chromosomal localization of the 70K homologue gene SNP1

The 70K homologue gene, SNP1, was identified in the course of sequence analysis of chromosome IX of S. cerevisiae from a library of mapped lambda clones supplied by M.V.Olson. The sequence of the 900 nucleotide DNA sense strand of SNP1 and predicted protein coding sequence is shown in Figure 1. Physical map data from M.V.Olson (unpublished data) allow SNP1 to be placed within 40 kb of SUP17 on chromosome IX (Mortimer et al., 1991). The SNP1 gene was shown to be single copy by digestion with restriction enzymes and hybridization (Figure 2) to a 500 bp DNA probe, DP1, from the 5' end of the coding sequence (Figure 3). The sizes of the bands produced by each enzyme were the same as those predicted from the 21 kb of contiguous sequence which includes the SNP1 gene.

Comparison of the predicted protein sequence of the U1 snRNP 70K genes from humans, X. laevis and S. cerevisiae

The predicted protein sequences for the human and X. laevis 70K proteins and the S. cerevisiae homologue are aligned in Figure 4. The human 70K protein is 437 amino acids in length and has an actual molecular weight of 52 kDa (Theissen et al., 1986; Query et al., 1989b). The 70K protein from X. laevis consists of 471 amino acids, with a predicted molecular weight of 57 kDa (Etzerodt et al., 1988). The yeast homologue is somewhat smaller, being composed of 300 amino acids with a predicted molecular weight of 34 kDa. Overall, the human and X. laevis proteins have 70% amino acid identity, the human and S. cerevisiae sequences 30% amino acid identity and the X. laevis and S. cerevisiae sequences 29% amino acid identity. Screening of the SWISS PROT (Bairoch, 1990) and PIR (George et al., 1986) databases with the program FASTA (Pearson and Lipman, 1988) and alignment with the program AMPS (Barton and Sternberg, 1987) indicate that the yeast protein is more closely related to the human and X. laevis 70K proteins than to other RNA-binding proteins, such as polyadenylate-binding protein and helix destabilizing protein.

The human 70K protein and many other RNA-binding proteins share an RNA recognition motif (RRM) which spans 90 amino acids (positions 101-194 in Figure 4) (Query et al., 1989b; Bandziulis et al., 1989). Included in the RRM are the previously identified highly conserved RNP octamer (RNP1) (Adam et al., 1986; Sachs et al., 1986, 1987) and the six amino acid segment, RNP2 (Dreyfuss et al., 1988). The RNP2 consensus is less well conserved than the RNP octamer, but its location relative to the RNP octamer and the general character of its amino acid composition are nonetheless well preserved (Bandziulis et al., 1989). The overall homology between the yeast and vertebrate proteins across the RRM is quite high. The RNP octamer consensus is completely conserved in all three sequences and another

Fig. 1. Sequence of the DNA sense strand and predicted protein coding sequence of the 70K homologue gene SNP1.

substantial region of homology between positions 109 and 119 includes RNP2. Other positions in the RRM which are significantly conserved across 32 RNA-binding proteins are indicated in Figure 4 using the convention of Bandziulis et al. (1989). These conserved amino acids are present in the vertebrate and yeast sequences, with the exception in the yeast protein of an aspartate residue instead of glycine at position 175. However, this substitution is also present in other RNA-associated proteins (Bandziulis et al., 1989). The conservation of these well defined motifs and the overall homology in the RRM indicate that the yeast

Fig. 2. Southern blot of DNA from the yeast strain AB972 probed with the 500 bp DNA probe DP1 (Figure 3). The positions of the molecular weight markers (HindIII digest of lambda DNA; 23.1, 9.4, 6.6, 4.4, 2.3, 2.0 and 0.56 kb) are indicated on the left. Lane 1: XhoI digest, predicted band 1.76 kb. Lane 2: HindIII digest, predicted band 9.54 kb. Lane 3: StyI digest, predicted band 0.55 kb. Lane 4: PstI digest, predicted band >4.8 kb; observed size ~9 kb.

homologue possesses the features of an RNA-binding protein. However, the similarity between the yeast and vertebrate sequences extends beyond the RRM and is particularly strong in the glycine-rich region between positions 188 and 215. The minimal set of amino acids from the human 70K protein which is necessary and sufficient for selective precipitation of the U1 snRNA from total HeLa cell RNA includes the RRM and the extra residues encompassed by positions 95-213 in Figure 4 (Query et al., 1989b). In addition, this minimal RNA-binding domain will selectively bind, in vitro, U1 snRNA which is truncated to its first stem-loop (Query et al., 1989a). The high degree of homology between the yeast and human proteins across the glycine-rich region necessary for specific RNA binding by the human 70K protein suggests that the yeast homologue possesses additional features that are characteristic of proteins which bind to stem-loop 1 of the U1 snRNA.

Carboxy terminal to the U1 snRNA-binding region, the yeast and vertebrate sequences diverge. The carboxy terminus of the yeast protein is much shorter, consisting of only 95 amino acids after the U1 snRNA-binding domain, whereas the corresponding regions of the human and X. laevis proteins consist of 238 and 269 residues, respectively. The human and X. laevis proteins also have less homology in their carboxy termini than in the preceding sequences, the X. laevis protein being some 31 amino acids longer and much less glycine-rich in the regions encompassed by positions 327-357 and 401-422 of the human 70K protein. However, both proteins have two prominent, arginine-rich charged regions at positions 246-321 and 367-413 (Figure 4). In contrast, the yeast carboxy terminus is only moderately charged, with 20.7% R+K and 8% D+E compared with the carboxy terminus of the human protein (31.1% R+K and 26.5% D+E) and X. laevis protein (35.0% R+K and 32.3% D+E).
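The composition figures quoted above (R+K, D+E and, below, S+T) are simple residue frequencies over a chosen segment. As an illustrative sketch only, and not the procedure used in this work, such percentages could be computed as follows. The function name `residue_percent` and the 20-residue peptide are hypothetical; the peptide is not taken from SNP1 or from any 70K protein.

```python
from collections import Counter


def residue_percent(segment: str, residues: str) -> float:
    """Percentage of positions in `segment` occupied by any residue in `residues`."""
    segment = segment.upper()
    if not segment:
        raise ValueError("segment must not be empty")
    counts = Counter(segment)
    return 100.0 * sum(counts[r] for r in residues.upper()) / len(segment)


# Hypothetical peptide used purely for illustration (not a real 70K sequence).
example_tail = "RSKDRERSTSPKAGEDRRSS"

if __name__ == "__main__":
    for label, group in (("R+K", "RK"), ("D+E", "DE"), ("S+T", "ST")):
        print(f"{label}: {residue_percent(example_tail, group):.1f}%")
```

Applied to the carboxy-terminal segment of each aligned protein, the same function would give values comparable to those in the text, provided the segment boundaries are chosen as in Figure 4.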
All three proteins have potential phosphorylation sites (Kishimoto et al., 1985; Woodgett et al., 1986; Marin et al., 1986; Kuenzel et al., 1987), but the serine/threonine content of the carboxy terminus of the yeast protein is somewhat higher than in the human and X. laevis proteins (21% S+T compared with 9.2% and 7.8%).

The 70K homologue gene SNP1 is essential in yeast

To investigate whether the 70K homologue performs an essential role in yeast, the SNP1 gene was disrupted. A plasmid was constructed in which a 554 bp StyI fragment

Fig. 3. Schematic diagram of the SNP1 chromosomal disruption, showing the location of the restriction sites, DNA probes (DP1, DP2 and DP3) and oligonucleotide primers (D520, D501, D301 and D320) used in this study. The oligonucleotide primers are not drawn to scale.

Fig. 4. Alignment of the predicted protein sequences of the 70K proteins from humans, X. laevis and S. cerevisiae. Amino acids are grouped as follows: [A,G], [L,I,V], [F,Y], [H,R,K], [D,E] (after Taylor, 1986). The RNA recognition motif (RRM) is encompassed by positions 101-194. Conserved features of the RRM are indicated under the aligned sequences using the convention of Bandziulis et al. (1989), which states that a conserved position must have no more than two different amino acids in each position in at least 20 out of the 32 RNA-associated proteins aligned. The conserved features of the two RNP domains are RNP1 (the RNP octamer): K/R G F/Y G/A F V/I X F/Y; and RNP2: L/I F/Y V/I G/K N/G L.

was deleted from SNP1 and replaced with a DNA fragment encoding the URA3 gene (Figure 3). The disrupted gene was then excised from the plasmid by digestion with the restriction enzymes SpeI and SnaBI to produce a linear fragment containing the disrupted snp1 gene flanked by 696 bp of 5' sequence and 709 bp of 3' sequence. This was used to transform the ura3 diploid strain aα100 to URA prototrophy. Screening by PCR with the primers D520 and D320 (Figure 3) identified a clone, named aα115, in which homologous recombination in the SNP1 gene had taken place. Southern blot analysis with the DNA probes DP2 (Figure 3) and DP3 (Figures 3 and 5) confirmed that aα115 had one intact chromosomal copy of SNP1 and one disrupted allele with the expected snp1ΔStyI::URA3 structure. The fragment deleted from SNP1, between nucleotides 157 and 706 in Figure 1 (which translates to the amino acids between positions 20 and 208 in Figure 4), included all of the region with homology to the RRM and all but the last five amino acids of the U1 snRNA-binding domain.

Sporulation of the SNP1-heterozygous diploid aα115 and dissection of 25 tetrads resulted in two viable and two inviable spores in each case (Table I). All of the viable spores produced colonies which were ura auxotrophs, indicating that they did not inherit the disrupted copy of SNP1. Microscopic examination of the 50 inviable spores revealed that they underwent four or five rounds of cell division to produce between 20 and 35 cells in the first 24 h after germination, in all but four cases in which only a single spore was observed. Tetrad analysis of the parent diploid aα100 yielded four viable spores for each of the 10 tetrads dissected.


These results indicate that SNP1 is essential for the viability of yeast cells. To confirm this result, a 2305 bp SpeI-SnaBI restriction fragment containing a complete SNP1 gene with 696 bp of 5' flanking sequence and 709 bp of 3' flanking sequence was inserted into the centromere plasmid pRS315 (Sikorski and Hieter, 1989), which carries a functional LEU2 gene. This plasmid, pRS3D, was transformed into aα115. Two independent diploids to which LEU prototrophy was conferred were selected for tetrad analysis. A similar pattern of spore viability was observed for both diploids. A total of 15 tetrads were dissected for this LEU2 strain, named aα120, and the results are summarized in Table I. The copy number and segregation of the plasmid varied between the individual tetrads, but on average two spores in each tetrad received at least one copy of pRS3D. Fifteen of the 45 viable spores produced colonies which were of URA LEU phenotype, suggesting that the fragment including the intact SNP1 gene on pRS3D was sufficient to compensate for the chromosomal snp1ΔStyI::URA3 disruption. Analysis by PCR with the primers D501 and D301 (Figures 3 and 6) confirmed that these haploid colonies contained an intact and a disrupted SNP1 gene. No viable spores of URA leu phenotype were observed, confirming that disruption of SNP1 is lethal.

The amino terminal half of the SNP1 gene is functionally exchangeable with the homologous region of the human 70K gene

The functional significance of the yeast 70K homologue was addressed by substituting sequences in SNP1 with sequences from the human 70K gene. The 554 bp StyI fragment in SNP1 was removed from pRS3D and replaced with the corresponding region from the human 70K gene (encoding amino acids at positions 20-208 in Figure 4) to create pRS3RB. The amino acid identity between the yeast and human sequences across this region is 32.8%.
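Identity values such as the 32.8% quoted above depend on how aligned positions are counted. The following is a minimal sketch of one common convention, in which positions carrying a gap in either sequence are excluded from the denominator; the convention actually used by the alignment programs in this study is not stated, so this is an assumption. The function name `percent_identity` and both aligned strings are hypothetical and are not SNP1 or 70K sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
fragment_a = "MKRLF-GDA"
fragment_b = "MRRLFAGEA"

if __name__ == "__main__":
    print(f"{percent_identity(fragment_a, fragment_b):.1f}% identity")
```

Counting gapped columns in the denominator instead would lower the reported identity, which is one reason why values from different programs are not always directly comparable.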
Sequence analysis confirmed that the 538 bp human 70K gene fragment was in frame with the flanking SNP1 sequences and contained no mutations. The strain aα115 was transformed with pRS3RB to produce aα130. Tetrad analysis of two independent diploids with LEU prototrophy resulted in a pattern of spore viability similar to that observed for aα120 (Table I). A total of 17 tetrads were dissected, and 12 of these tetrads produced more than two viable spores. Of the 51 viable spores, 17 produced colonies of URA LEU phenotype, and PCR analysis confirmed that these haploid cells possessed one disrupted snp1 allele and the 538 bp human 70K gene fragment (Figure 6). This indicates that the pRS3RB chimera was compensating for the chromosomal snp1ΔStyI::URA3 disruption, although spores in which pRS3RB was necessary for viability grew initially at 50-75% of the rate of their SNP1 sister spores, an effect which was not observed with pRS3D. Nonetheless, these data indicate that 62% of SNP1 is exchangeable with the corresponding region of the human 70K gene. Moreover, the substituted fragment includes those sequences in the human 70K gene which are known to be necessary for specific binding of the protein to the first stem-loop of the U1 snRNA (Patton and Pederson, 1988; Query et al., 1989a,b; Surowy et al., 1989). The apparent functional exchangeability of this region from the human to the yeast

Fig. 5. Southern blot analysis of the snp1ΔStyI::URA3 disruption using the DP3 DNA probe (Figure 3). Lanes 2, 4, 6 and 8 contain aα100 DNA digested with the restriction enzymes StyI, SpeI, HindIII and EcoRV to give bands of predicted sizes of 2.66 kb, 3.66 kb, 9.54 kb and >8.3 kb (observed size ~9 kb), respectively. Lanes 1, 3, 5 and 7 contain aα115 DNA digested with the same enzymes, to produce one band corresponding to a wild-type SNP1 allele and one band representing the chromosomal snp1ΔStyI::URA3 disruption. The StyI sites in the disrupted snp1 allele are destroyed and URA3 contains a StyI site (Figure 3), so the predicted size of the band produced by the mutated allele is 3.0 kb (lane 1). For the enzymes SpeI and HindIII, the disrupted allele produces a band which is 617 bp larger than the wild-type band: 4.28 kb and 10.71 kb, respectively. URA3 also contains an EcoRV site (Figure 3), resulting in a 5.21 kb band for the EcoRV digestion of the snp1 disruption. Molecular weight markers, a HindIII digest of lambda DNA, are indicated on the left.

gene is strong evidence that the yeast 70K homologue does indeed bind to the yeast U1 snRNA in vivo.

To investigate the differences between the carboxy termini of the yeast and human proteins, a chimera was constructed in which the 554 bp StyI fragment from SNP1 was replaced with the whole of the human 70K gene except for the first 15 amino acids. This was done by digesting pRS3RB with StuI and inserting a StuI-DraI restriction fragment from the human 70K gene. Preservation of the coding frame was confirmed by regeneration of the StuI restriction site. The resulting plasmid, pRS3HC, encodes the first 18 amino acids of the yeast protein in frame with the remainder of the human 70K protein. The amino terminus and RNA-binding domain of this chimera are identical to those of pRS3RB, but the carboxy terminus includes the human 70K charged tail and stop codon, and it is out of frame with the remaining yeast coding sequence. When aα115 was transformed to LEU prototrophy with pRS3HC to produce aα140, dissection of 18 tetrads from two independent diploids resulted in two viable and two inviable spores in each case. All 36 viable spores were ura auxotrophs, indicating that none had inherited a disrupted snp1 allele. The inviable spores did undergo four or five rounds of cell division to produce between 20 and 30 cells in the first 24 h after germination. Over two-thirds of the viable spores were of LEU phenotype, which demonstrates that pRS3HC was not somehow lost in the sporulation process. This suggests that replacement of the last 98 amino acids of the yeast protein with the carboxy terminus of the

Table I. Details of the tetrad analysis of the yeast strains described in this paper and phenotypes of the haploid colonies produced by viable spores

Strain   Viable spores   Number of   Phenotype of viable spores
         per tetrad      tetrads     ura leu   URA LEU   ura LEU   URA leu

aα115    1                0           0         0         0         0
         2               25          50         0         0         0
         3                0           0         0         0         0
         4                0           0         0         0         0

aα120    1                0           0         0         0         0
         2                4           7         0         1         0
         3                7           8         7         6         0
         4                4           0         8         8         0

aα130    1                0           0         0         0         0
         2                5           8         0         2         0
         3                7           6         7         8         0
         4                5           2        10         8         0

aα140    1                0           0         0         0         0
         2               18          10         0        26         0
         3                0           0         0         0         0
         4                0           0         0         0         0

human protein is detrimental to the viability of yeast spores which lack a wild-type SNP1 gene.
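The spore totals quoted in the text follow directly from the tetrad distributions in Table I. The short sketch below, added as an illustrative consistency check and not part of the original analysis, recomputes them. The dictionary keys are the numeric parts of the strain names, the plasmid annotations are taken from the text, and the names `TETRADS_BY_STRAIN` and `tally` are introduced here for illustration.

```python
# Number of tetrads giving 1, 2, 3 and 4 viable spores (in that order),
# transcribed from Table I for each diploid strain.
TETRADS_BY_STRAIN = {
    "115": (0, 25, 0, 0),  # snp1 disruption, no plasmid
    "120": (0, 4, 7, 4),   # disruption plus pRS3D (intact SNP1)
    "130": (0, 5, 7, 5),   # disruption plus pRS3RB (yeast/human chimera)
    "140": (0, 18, 0, 0),  # disruption plus pRS3HC (human carboxy terminus)
}


def tally(distribution):
    """Return (tetrads dissected, viable spores, inviable spores) for one strain."""
    tetrads = sum(distribution)
    viable = sum(
        spores_per_tetrad * n_tetrads
        for spores_per_tetrad, n_tetrads in enumerate(distribution, start=1)
    )
    # Each tetrad contains four spores, so the remainder are inviable.
    return tetrads, viable, 4 * tetrads - viable


if __name__ == "__main__":
    for strain, distribution in TETRADS_BY_STRAIN.items():
        tetrads, viable, inviable = tally(distribution)
        print(f"strain {strain}: {tetrads} tetrads, {viable} viable, {inviable} inviable")
```

The totals agree with the text: 25 tetrads and 50 viable spores for the 115 strain, 15 and 45 for 120, 17 and 51 for 130, and 18 and 36 for 140.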


Discussion

The human 70K protein is one of the best characterized of the snRNP proteins as a result of the availability of specific antisera from patients with autoimmune diseases (White et al., 1982; Pettersson et al., 1984). Antibody protection assays and RNA footprinting experiments have indicated that the 70K protein binds directly to the first stem-loop of the U1 snRNA (Patton and Pederson, 1988; Query et al., 1989a; Surowy et al., 1989). Identification of homologues of the 70K protein in other vertebrates reveals a high degree of conservation: the human 70K protein has 94% amino acid identity with a 379 amino acid fragment of the mouse protein (Hornig et al., 1989) and 70% amino acid identity with the full-length X. laevis homologue (Etzerodt et al., 1988). We have identified a homologue of the U1 snRNP 70K protein in the yeast S. cerevisiae. This protein, which shares 30% amino acid identity with the human 70K protein, is encoded by a 900 bp single copy gene, located on chromosome IX, that we have called SNP1.

Many diverse nuclear and cytoplasmic RNA-associated proteins share a homologous sequence encompassing approximately 90 amino acids which is thought to form an RNA-binding domain (Query et al., 1989b; Bandziulis et al., 1989). The recent solution of the structure of the human U1 snRNP A protein (Nagai et al., 1990), which binds to the second stem-loop of the U1 snRNA, has revealed a clustering of residues which are critical to U1 snRNA binding on one face of the molecule, around the RNP1 and RNP2 consensus. From the homology between the vertebrate and yeast 70K protein sequences across the RRM and U1 snRNA-binding domain, we conclude that the yeast homologue described in this paper possesses not only the features of an RNA-binding protein, but more specifically the primary sequence characteristics of a protein which binds to stem-loop 1 of the U1 snRNA. Nearly two-thirds of the SNP1 gene was successfully substituted in vivo with the


Fig. 6. PCR analysis of viable haploid colonies. In lanes 1-5, the primers D501 and D301 (Figure 3) have been used. These primers produce a 941 bp band for wild-type SNP1 alleles and a 1558 bp band for snp1ΔStyI::URA3 alleles. Lane 1: diploid aα100. Lane 2: diploid aα115. Lane 3: ura leu haploid. Lane 4: URA LEU haploid (pRS3D). Lane 5: URA LEU haploid (pRS3RB). Lane 6: PCR of the haploid strain in lane 5 with D501 and a primer from the 3' end of the inserted human 70K gene fragment (HR3) to produce a 612 bp band. Lane 7: PCR of the haploid strain in lane 5 with D301 and a primer from the 5' end of the inserted human 70K gene fragment (HR5) to produce an 853 bp band. The molecular weight markers on the left of the gel are a HaeIII digest of phiX174 DNA. The sizes of the upper four bands are 1353 bp, 1078 bp, 872 bp and 603 bp, respectively.

homologous sequence sharing 32.8% amino acid identity from the human 70K gene, including sequences encoding all of the RRM and all but the last five amino acids of the U1 RNA-binding domain. This exchange of the yeast sequence for a sequence-specific human U1 snRNA stem-loop 1 binding domain is compelling evidence to support the functional identity of this protein as the yeast homologue of the U1 snRNP 70K protein. Preliminary experiments to investigate the RNA-binding properties of


this protein have been carried out, and binding to stem-loop 1 of the yeast U1 snRNA was observed with an unpurified soluble extract from Escherichia coli of the expression product of the predicted RNA-binding domain of the yeast protein (our unpublished results). However, binding was not observed when the experiment was repeated with purified protein, and the reasons for this are being investigated.

The reduced growth rate of viable spores bearing a disrupted snp1 allele and the human/yeast chimera, as compared with their wild-type SNP1 sister spores, may be the result of substitution of the less highly conserved 80 amino acid region amino-terminal to the RRM. This region may be involved in different protein or RNA interactions in the yeast and human U1 snRNPs. Alternatively, this difference in growth rate may reflect differences between key amino acid positions in the RRM of the yeast and human proteins. The tractability of the RRM from both systems to genetic assay will permit the investigation of the features which distinguish the yeast 70K protein homologue.

The carboxy terminus of the yeast 70K homologue differs substantially from the vertebrate 70K proteins. The yeast protein is much shorter and lacks the extensive charged regions found in the human and X. laevis proteins. There is no evidence from sequence analysis to suggest that the SNP1 gene is spliced to another exon, and the recovery of spores bearing a disrupted snp1 allele with a 2305 bp fragment including the SNP1 gene indicates that this fragment encodes all the protein sequence necessary to compensate for the lethal snp1 mutation. Moreover, substitution of the short yeast carboxy terminus with the full human carboxy sequence is detrimental to the viability of dividing yeast cells, despite the presence of the RRM and U1 snRNA-binding domain.
The differences in the length and secondary structure of the yeast and human U1 snRNAs beyond the first 50 nucleotides suggest that the carboxy terminus of the 70K proteins from each system may be interacting with quite different snRNP proteins or RNA structures. The yeast and human U1 snRNAs are predicted to form similar stem-loop structures within the first 50 nucleotides, in which 9 out of the 10 nucleotides in the loop are conserved (Kretzner et al., 1987; Siliciano et al., 1987). The one position in the loop which is not identical varies phylogenetically and is the only position which can be changed in the human U1 snRNA without greatly decreasing or abolishing binding of the 70K protein in vitro (Surowy et al., 1989). However, Liao et al. (1990) observed that deletion of 9 of the 10 nucleotides of loop 1 of the yeast U1 snRNA did not produce any growth phenotype, although splicing was reduced 4-fold and lethality was observed when this deletion was combined with an otherwise nonlethal deletion of 95 nucleotides involving stem-loops VII and VIII and part of stems V, VI, IX and X. As we have found SNP1 to be essential, if the gene product is indeed the yeast 70K homologue, it is possible that other protein-RNA or protein-protein interactions are sufficient to retain the 70K protein in the yeast U1 snRNP.

Studies in yeast have revealed that the base pairing interaction between the 5' splice site and the U1 snRNA is not on its own sufficient for correct recognition and cleavage of the 5' splice site and progression to the next step in the splicing pathway (Siliciano and Guthrie, 1988; Seraphin et al., 1988). The human 70K protein is unique to the U1 snRNP and it is possible that this protein is involved in some part of the process of specific 5' splice site recognition. Investigation of the human and X. laevis 70K proteins is restricted to in vitro experiments, and the genes encoding these proteins have complex patterns of splicing and exon distribution (Etzerodt et al., 1988; Spritz et al., 1987, 1990). Identification of a 70K homologue in yeast permits the detailed investigation of the role of this protein in splicing using a combination of in vitro mutagenesis and in vivo expression of modified SNP1 genes. In addition, the functional exchange of sequences encoding over 180 amino acid residues between the human 70K protein and yeast homologue, including the RRM and U1 snRNA-binding domain, allows the investigation of the human 70K protein in yeast, which may contribute to our understanding of the differences between the splicing specificities of yeast and mammalian systems.
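When modified SNP1 genes are designed, it may be useful to check that the RNP1 and RNP2 consensus sequences given in the legend to Figure 4 (RNP1: K/R G F/Y G/A F V/I X F/Y; RNP2: L/I F/Y V/I G/K N/G L) are still present. The sketch below expresses the two consensus sequences as regular expressions. It is an illustration under stated assumptions and not a method used in this work: the function name `find_rnp_motifs` and the test peptide are hypothetical, the peptide is not a real 70K sequence, and a consensus match alone says nothing about RNA binding.

```python
import re

# Consensus sequences from the Figure 4 legend; "." stands for X (any residue).
RNP1 = re.compile(r"[KR]G[FY][GA]F[VI].[FY]")  # the RNP octamer
RNP2 = re.compile(r"[LI][FY][VI][GK][NG]L")    # the six amino acid segment


def find_rnp_motifs(protein: str) -> dict:
    """Return {motif name: (0-based start, matched residues)} for the first hit of each motif."""
    sequence = protein.upper()
    hits = {}
    for name, pattern in (("RNP2", RNP2), ("RNP1", RNP1)):
        match = pattern.search(sequence)
        if match:
            hits[name] = (match.start(), match.group())
    return hits


# Hypothetical peptide constructed to contain both motifs (illustration only).
demo_peptide = "MSTLFVGNLAAKGFGFVEFDD"

if __name__ == "__main__":
    for name, (start, residues) in find_rnp_motifs(demo_peptide).items():
        print(f"{name}: {residues} at offset {start}")
```

A mutated sequence that returns an empty dictionary, or only one of the two motifs, would be an obvious candidate for closer inspection before in vivo testing.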

Materials and methods

DNA sequencing
We have been provided with a mapped library of lambda clones of chromosome IX of S. cerevisiae (strain AB972) by M.Olson (Olson et al., 1986). Sonicated fragments of clone 5610 were subcloned into M13mp18 and sequenced by the dideoxynucleotide chain termination method using the Klenow fragment of DNA polymerase I (Boehringer) (Sanger et al., 1977; Bankier et al., 1987; Smith et al., 1990). The sequence was determined completely on both strands. Compilation of the sequence was carried out with the program SAP (previously DBAUTO and DBUTIL) (Staden, 1982) and open reading frames identified using the program DIANA (J.Crooke, T.Horsnell and B.G.Barrell, unpublished). Protein comparisons were done with the FASTA program (Pearson and Lipman, 1988) against the PIR (release 25, 1990) (George et al., 1986) and SWISS-PROT (release 14, 1990) (Bairoch, 1990) databases. The alignment was performed using the programs AMPS (Barton and Sternberg, 1987) and HOMED (Stockwell, 1988).

Southern blot analysis
Total genomic DNA was isolated from yeast cells (Struhl et al., 1979), digested with restriction enzymes and size fractionated on 0.8% agarose gels. The DNA was transferred by capillary blotting (Southern, 1975) to Hybond N membrane (Amersham) following the manufacturer's instructions. All DNA probes were prepared by the polymerase chain reaction (PCR) (Saiki et al., 1988) and labelled with dUTP-linked digoxigenin (Boehringer) by random priming (Feinberg and Vogelstein, 1983). Hybridizations were carried out at 65°C in 5× SSC, 0.5% blocking reagent (Boehringer), 0.1% N-laurylsarcosine and 0.02% sodium dodecyl sulphate. Colorimetric detection was performed with anti-digoxigenin polyclonal antibody coupled to alkaline phosphatase, and the substrates 5-bromo-4-chloro-3-indolyl-phosphate and nitroblue tetrazolium salt, using a non-radioactive DNA labelling and detection kit (Boehringer) in accordance with the manufacturer's recommended conditions.

Genetic analysis
The yeast strains used for genetic analysis of SNP1 were a657 (MATa ura3 trp1 leu2 ade2-1 can1-100 his4) and a700 (MATa ura3 trp1 leu2 ade2-1 can1-100 his3), kindly supplied by A.Newman. These were mated on His- plates to produce the His+ diploid aα100. All yeast strains were maintained on selective media containing 2% glucose. To disrupt SNP1, aα100 was transformed with 1 µg of linear DNA by the lithium acetate procedure (Ito et al., 1983). For the rescue of disrupted SNP1 alleles, aα115 was transformed to LEU prototrophy with 0.5 µg of pRS3D, pRS3B or pRS3HC. After growth on selective medium for 48 h, yeast strains were streaked on sporulation plates (1.5% potassium acetate, 0.25% yeast extract, 0.1% glucose) and incubated at 30°C for 4 days. At this stage, ~60% of the cells had sporulated.
The cells were then transferred into 10% glusulase (Du Pont) and incubated at 30°C for 5-12 min, with removal of aliquots at regular intervals to check the extent of ascus digestion. Tetrads were dissected under phase microscopy on YEPD plates using a Leitz micromanipulator. The dissected spores were incubated at 30°C and their growth was observed over the next 4 days.
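The logic behind scoring dissected tetrads can be sketched in a few lines of Python. This is only an illustration, assuming simple Mendelian segregation of a single locus in a diploid heterozygous for the disruption; the function names and the allele label "snp1" for the disrupted copy are hypothetical and are not taken from the methods above. Under that assumption each tetrad from the heterozygous diploid is expected to give two viable and two inviable spores if the gene is essential.

```python
from collections import Counter


def tetrad(diploid):
    """Return the four spore genotypes from one meiosis at a single locus.

    `diploid` is a pair of alleles, e.g. ("SNP1", "snp1"). Each allele is
    replicated once before meiosis, so it ends up in exactly two spores.
    """
    first, second = diploid
    return [first, first, second, second]


def viable_spores(spores, lethal_allele="snp1"):
    """Count spores that do not carry the (assumed) lethal allele."""
    return sum(1 for allele in spores if allele != lethal_allele)


# Heterozygous diploid: one intact and one disrupted allele (illustrative labels).
spores = tetrad(("SNP1", "snp1"))
print(Counter(spores))
print("viable spores per tetrad:", viable_spores(spores))
```

A wild-type diploid, `tetrad(("SNP1", "SNP1"))`, gives four viable spores under the same assumptions, which is the contrast that the dissection is designed to reveal.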

Viable haploid cells were streaked on selective media for phenotypic analysis. To confirm the presence of chromosomal disruptions and intact SNP1 and human 70K gene sequences, PCR analysis was used. A trace amount of a yeast colony was transferred into 5 µl of sterile water and a PCR reaction mix was added to give the following conditions in a final volume of 20 µl: 50 mM KCl, 10 mM Tris-HCl pH 8.3, 1.5 mM MgCl2, 250 µM each dNTP, 0.5 µM each oligonucleotide primer and 0.5 U Taq polymerase (Cetus). Amplification was carried out for 40 cycles of thermal treatment (92°C for 1 min, 50°C for 1 min, 72°C for 2 min). The sequences of the oligonucleotide primers used in this study are: D501: 5'-AAT CGG ATC CTC AAC CTT GAT GAA T; D301: 5'-GTT TAA GCT TTA CTT ATA TTG AAT A; D520: 5'-ATA GCA ACG ATA CCA CAC AAT; D320: 5'-TCT GAT TTG TCA CTC GTT TCT; HR5: 5'-GGA CCC TAT TCC ATA CCT GCC; HR3: 5'-GCC TCC TCC TAG CCG CCG GGG.
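As a rough check on the primers listed above, the following Python sketch tabulates their length, GC content and an approximate melting temperature, and works out the amount of each primer in a 20 µl reaction at 0.5 µM. The Wallace rule (2°C per A/T, 4°C per G/C) is an assumption introduced here for illustration; it is not stated in the methods, it is only a crude estimate for oligonucleotides of this length, and it takes no account of salt or primer concentration. The function names are likewise hypothetical.

```python
# Primer sequences as printed in the text (5' to 3'); spaces are removed below.
PRIMERS_SPACED = {
    "D501": "AAT CGG ATC CTC AAC CTT GAT GAA T",
    "D301": "GTT TAA GCT TTA CTT ATA TTG AAT A",
    "D520": "ATA GCA ACG ATA CCA CAC AAT",
    "D320": "TCT GAT TTG TCA CTC GTT TCT",
    "HR5": "GGA CCC TAT TCC ATA CCT GCC",
    "HR3": "GCC TCC TCC TAG CCG CCG GGG",
}
PRIMERS = {name: seq.replace(" ", "") for name, seq in PRIMERS_SPACED.items()}


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate Tm in °C by the Wallace rule (an assumption, see lead-in)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primer_pmol(conc_um, volume_ul):
    """Amount of primer in pmol: µM multiplied by µl gives pmol."""
    return conc_um * volume_ul


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Wallace Tm ~{wallace_tm(seq)} °C")

print("primer per 20 µl reaction:", primer_pmol(0.5, 20), "pmol")
```

Keeping the sequences in the spaced form used in the text and stripping the spaces in code makes it easier to compare the dictionary against the printed list.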

Acknowledgements
We thank Maynard Olson and Linda Riles for lambda clones and the physical map of chromosome IX, Rex Bentley for the human 70K cDNA, Andrew Newman, Gos Micklem and John Kilmartin for advice on tetrad analysis, and Mark Chee and Andrew Newman for critical reading of this manuscript. We are also grateful to Kiyoshi Nagai and Chris Oubridge for assistance with protein expression and purification and helpful discussion. V.S. is supported by a scholarship from the Association of Commonwealth Universities.

References
Adam,S.A., Nakagawa,T.Y., Swanson,M.S., Woodruff,T. and Dreyfuss,G. (1986) Mol. Cell. Biol., 6, 2932-2943.
Ares,M. (1986) Cell, 47, 49-59.
Bairoch,A. (1990) Swiss-Prot protein sequence data bank release 14 compiled by Amos Bairoch, Departement de Biochimie Medicale, Centre Medical Universitaire, 1211 Geneve 4, Switzerland.
Bandziulis,R.J., Swanson,M.S. and Dreyfuss,G. (1989) Genes Dev., 3, 431-437.
Bankier,A.T., Weston,K.M. and Barrell,B.G. (1987) Methods Enzymol., 155, 51-93.
Banroques,J. and Abelson,J.N. (1989) Mol. Cell. Biol., 9, 3710-3719.
Barton,G.J. and Sternberg,M.J.E. (1987) J. Mol. Biol., 198, 327-337.
Beggs,J., Lossky,M. and Anderson,G.J. (1989) Biochem. Soc. Symp., 55, 69-75.
Bjorn,S.P., Soltyk,A., Beggs,J.D. and Friesen,J.D. (1989) Mol. Cell. Biol., 9, 3698-3709.
Dreyfuss,G., Swanson,M.S. and Pinol-Roma,S. (1988) Trends Biochem. Sci., 13, 86-91.
Etzerodt,M., Vignali,R., Ciliberto,G., Scherly,D., Mattaj,I.W. and Philipson,L. (1988) EMBO J., 7, 4311-4321.
Feinberg,A.P. and Vogelstein,B. (1983) Anal. Biochem., 132, 6-13.
George,D.G., Barker,W.C. and Hunt,L.T. (1986) Nucleic Acids Res., 14, 11-15.
Guthrie,C. and Patterson,B. (1988) Annu. Rev. Genet., 22, 387-419.
Hinterberger,M., Pettersson,I. and Steitz,J.A. (1983) J. Biol. Chem., 258, 2604-2613.
Hornig,H., Fischer,U., Costas,M., Rauh,A. and Luhrmann,R. (1989) Eur. J. Biochem., 182, 45-50.
Ito,H., Fukuda,Y., Murata,K. and Kimura,A. (1983) J. Bacteriol., 153, 163-168.
Kishimoto,A., Nishiyama,K., Nakanishi,H., Uratsuji,Y., Nomura,H., Takeyama,Y. and Nishizuka,Y. (1985) J. Biol. Chem., 260, 12492-12499.
Kramer,A., Keller,W., Appel,B. and Luhrmann,R. (1984) Cell, 38, 299-307.
Kretzner,L., Rymond,B.C. and Rosbash,M. (1987) Cell, 50, 593-602.
Kretzner,L., Krol,A. and Rosbash,M. (1990) Proc. Natl. Acad. Sci. USA, 87, 851-855.
Kuenzel,E.A., Mulligan,J.A., Sommercorn,J. and Krebs,E.G. (1987) J. Biol. Chem., 262, 9136-9140.
Langford,C.J. and Gallwitz,D. (1983) Cell, 33, 519-527.
Langford,C.J., Klinz,F.-J., Donath,C. and Gallwitz,D. (1984) Cell, 36, 645-653.
Liao,X., Kretzner,L., Seraphin,B. and Rosbash,M. (1990) Genes Dev., 4, 1766-1774.
Lossky,M., Anderson,G.J., Jackson,S.P. and Beggs,J. (1987) Cell, 51, 1019-1026.

Mancebo,R., Lo,P.C.H. and Mount,S.M. (1990) Mol. Cell. Biol., 10, 2492-2502.
Maniatis,T. and Reed,R. (1987) Nature, 325, 673-678.
Marin,O., Meggio,F., Marchiori,F., Borin,G. and Pinna,L.A. (1986) Eur. J. Biochem., 160, 239-244.
Mortimer,R.K., Schild,D., Contopoulou,C.R. and Kans,J.A. (1991) Methods Enzymol., 194, 827-863.
Mount,S.M., Pettersson,I., Hinterberger,M., Karmas,A. and Steitz,J. (1983) Cell, 33, 509-518.
Nagai,K., Oubridge,C., Jessen,T., Li,J. and Evans,P.R. (1990) Nature, 348, 515-520.
Olson,M.V., Dutchik,J.E., Graham,M.Y., Brodeur,G.M., Helms,C., Frank,M., MacCollin,M., Scheinman,R. and Frank,T. (1986) Proc. Natl. Acad. Sci. USA, 83, 7826-7830.
Patton,J.R. and Pederson,T. (1988) Proc. Natl. Acad. Sci. USA, 85, 747-751.
Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl. Acad. Sci. USA, 85, 2444-2448.
Pettersson,I., Hinterberger,M., Mimori,T., Gottlieb,E. and Steitz,J.A. (1984) J. Biol. Chem., 259, 5907-5914.
Query,C.C., Bentley,R.C. and Keene,J.D. (1989a) Mol. Cell. Biol., 9, 4872-4881.
Query,C.C., Bentley,R.C. and Keene,J.D. (1989b) Cell, 57, 89-101.
Riedel,N., Wise,J.A., Swerdlow,H., Mak,A. and Guthrie,C. (1986) Proc. Natl. Acad. Sci. USA, 83, 8097-8101.
Ruby,S.J. and Abelson,J. (1988) Science, 242, 1028-1035.
Sachs,A.B., Bond,W.M. and Kornberg,R.D. (1986) Cell, 45, 827-835.
Sachs,A.B., Davis,R.W. and Kornberg,R. (1987) Mol. Cell. Biol., 7, 3268-3276.
Saiki,R.K., Gelfand,D.H., Stoffel,S., Scharf,S.J., Higuchi,R., Horn,G.T., Mullis,K.B. and Erlich,H.A. (1988) Science, 239, 487-491.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Seraphin,B. and Rosbash,M. (1989) Cell, 59, 349-358.
Seraphin,B., Kretzner,L. and Rosbash,M. (1988) EMBO J., 7, 2533-2538.
Sharp,P. (1987) Science, 235, 766-771.
Sikorski,R.S. and Hieter,P. (1989) Genetics, 122, 19-27.
Siliciano,P. and Guthrie,C. (1988) Genes Dev., 2, 1258-1267.
Siliciano,P.G., Haltiner Jones,M. and Guthrie,C. (1987) Science, 237, 1484-1487.
Smith,V., Brown,C.M., Bankier,A.T. and Barrell,B.G. (1990) DNA Sequence, 1, 73-78.
Southern,E.M. (1975) J. Mol. Biol., 98, 503-517.
Spritz,R.A., Strunk,K., Surowy,C.S., Hoch,S.O., Barton,D.E. and Francke,U. (1987) Nucleic Acids Res., 15, 10373-10391.
Spritz,R.A., Strunk,K., Surowy,C.S. and Mohrenweiser,H.W. (1990) Genomics, 8, 371-379.
Staden,R. (1982) Nucleic Acids Res., 10, 4731-4751.
Stockwell,P.A. (1988) Trends Biochem. Sci., 13, 322-324.
Struhl,K., Stinchcomb,D.T., Scherer,S. and Davis,R.W. (1979) Proc. Natl. Acad. Sci. USA, 76, 1035-1039.
Surowy,C.S., van Santen,V.L., Scheib-Wixted,S.M. and Spritz,R.A. (1989) Mol. Cell. Biol., 9, 4179-4186.
Taylor,W.R. (1986) J. Mol. Biol., 188, 233-258.
Teem,J.L. et al. (1984) Nucleic Acids Res., 12, 8295-8312.
Theissen,H., Etzerodt,M., Reuter,R., Schneider,C., Lottspeich,F., Argos,P., Luhrmann,R. and Philipson,L. (1986) EMBO J., 5, 3209-3217.
Thompson-Jager,S. and Domdey,H. (1987) Mol. Cell. Biol., 7, 4010-4016.
Tollervey,D. and Mattaj,I.W. (1987) EMBO J., 6, 469-476.
White,P.J., Billings,P.B. and Hoch,S.O. (1982) J. Immunol., 6, 2751-2756.
Woodgett,J.R., Gould,K.L. and Hunter,T. (1986) Eur. J. Biochem., 161, 177-184.
Zhuang,Y. and Weiner,A.M. (1986) Cell, 46, 827-835.

Received on January 7, 1991; revised on May 16, 1991

Note added in proof
This sequence has been deposited with the EMBL database and is available under accession no. X59986.
