Cell, Vol. 66, 383-394, July 26, 1991, Copyright © 1991 by Cell Press

Functional Expression of Cloned Human Splicing Factor SF2: Homology to RNA-Binding Proteins, U1 70K, and Drosophila Splicing Regulators

Adrian R. Krainer, Akila Mayeda, Diane Kozak, and Georgia Binns

Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724-2208

Summary

SF2 is a protein factor essential for constitutive pre-mRNA splicing in HeLa cell extracts and also activates proximal alternative 5’ splice sites in a concentration-dependent manner. This latter property suggests a role for SF2 in preventing exon skipping, ensuring the accuracy of splicing, and regulating alternative splicing. Human SF2 cDNAs have been isolated and overexpressed in bacteria. Recombinant SF2 is active in splicing and stimulates proximal 5’ splice sites. SF2 has a C-terminal region rich in arginine-serine dipeptides, similar to the RS domains of the U1 snRNP 70K polypeptide and the Drosophila alternative splicing regulators transformer, transformer-2, and suppressor-of-white-apricot. Like transformer-2 and 70K, SF2 contains an RNP-type RNA recognition motif.

Introduction

The mechanisms responsible for the high degree of specificity of metazoan pre-mRNA splicing are poorly understood, especially in the processing of very large introns and multi-intron pre-mRNAs. Essential aspects of the specificity of splice site selection include the accurate recognition of degenerate signals, the complete exclusion of cryptic splice sites in the presence of authentic sites, and the avoidance of exon skipping (reviewed in Aebi and Weissmann, 1987; Krainer and Maniatis, 1988). On the other hand, the specific recognition of splice sites and the matching of exons can adjust to gene-specific variations in response to tissue-specific or developmentally controlled signals. Thus, a large number of genes express multiple protein isoforms by using alternative 5’ and/or 3’ splice sites, often in a regulated manner (reviewed in Smith et al., 1989). A number of proteins that directly or indirectly regulate specific alternative splicing pathways in Drosophila have already been identified genetically and biochemically (reviewed in Maniatis, 1991; Baker, 1989; Bingham et al., 1988).
The genes encoding these proteins are suppressor-of-white-apricot (su(wa)), Sex-lethal (Sxl), transformer (tra), and transformer-2 (tra-2). The su(wa) gene product, which affects the eye color phenotype that results from a copia retrotransposon inserted at the white locus, negatively regulates the splicing of the first two out of seven introns in its own pre-mRNA (reviewed in Bingham et al., 1988). The remaining genes are involved in the control of somatic sex determination (reviewed in Baker, 1989). The Sxl gene product regulates the splicing of both its own pre-mRNA and that of tra by repressing a male-specific default alternative 3’ splice site in female flies. The tra and tra-2 gene products in turn regulate alternative splicing of the doublesex (dsx) pre-mRNA by activating female-specific alternative 3’ splice and polyadenylation sites in female flies.

Recently, an activity from HeLa cells termed SF2, which is required for the first cleavage-ligation step of splicing, was purified to apparent homogeneity using a biochemical complementation assay (Krainer et al., 1990a). Purified, active SF2 consists of three polypeptides of 32-33 kd. SF2 is required for the formation or stabilization of the earliest detectable prespliceosome complexes, binds RNA with little, if any, sequence specificity in the absence of other components, and possesses an intermolecular RNA-annealing activity. In addition to its general role in the splicing reaction, SF2 has been shown to influence splice site selection in vitro (Krainer et al., 1990b). High concentrations of SF2 result in preferential use of proximal over distal 5’ splice sites with a variety of pre-mRNAs containing alternative 5’ splice sites. No significant effects have been observed with 3’ splice sites. A protein called ASF, isolated from 293 cells, was shown to have comparable effects on 5’ splice selection with SV40 tumor antigen pre-mRNAs (Ge and Manley, 1990). Higher amounts of ASF activity are detected in 293 cell extracts than in HeLa cell extracts, and this correlates with the preferential use in vivo and in vitro of the t antigen 5’ splice site in 293 cells, and the T antigen 5’ splice site in HeLa cells. SF2 and ASF are now known to be identical (Ge and Manley, unpublished data; A. R. K., unpublished data; see also Ge et al., 1991 [this issue]). The properties of SF2/ASF suggest that changes in the concentration or activity of this factor could serve to regulate alternative splicing in vivo, e.g., in a tissue-specific or developmentally regulated manner.
Quantitative changes in the activity of SF2/ASF may be mediated by posttranslational modifications. This factor may also play an important role in ensuring accurate splice site selection and preventing exon skipping in constitutively spliced pre-mRNAs. To study the structure and function of SF2/ASF, cDNAs encoding this factor have been isolated and characterized.

Results

Characterization of HeLa Cell SF2 Polypeptides

The most abundant polypeptides present in purified, active SF2 preparations from HeLa cells have been analyzed in detail. A doublet of polypeptides with relative electrophoretic mobility of 33 kd is consistently detected by Coomassie blue (Figure 1) or silver staining (Krainer et al., 1990a) after the final purification step by Phenyl-Superose chromatography. These two polypeptides will be referred to collectively as p33, and individually as p33-A (top band) and p33-B (bottom band). In addition, a slightly smaller polypeptide is also found in the active fractions, and will be referred to as p32. This polypeptide peaks in an adjacent

Figure 1. Polypeptide Composition of Purified HeLa Cell SF2

SF2 was purified from HeLa cells as described (Krainer et al., 1990a), and fractions were analyzed by SDS-PAGE and Coomassie blue staining. Lane 1, inactive fraction containing pure p32; lane 2, active fraction containing p33-A, p33-B, and p32. The relative mobilities and sizes of the molecular weight markers (lane M) are indicated.

fraction that is free of p33-A and p33-B and is by itself inactive in the complementation assay for splicing (Krainer et al., 1990a). The presence of p32 in the active fractions may be fortuitous, or this polypeptide may be necessary but not sufficient for SF2 activity. The overall amino acid composition of the most highly purified active fraction was determined, as was the individual composition of the p33-A and p33-B bands following SDS-PAGE and electroblotting to a polyvinylidene difluoride membrane. The amino acid composition of the p32 band was determined from the preceding fraction, in which it is virtually homogeneous. p33-A and p33-B have a similar amino acid composition, characterized by a very high content of arginine, glycine, and serine (data not shown). In contrast, p32 proved to be acidic and low in arginine content. The same two fractions were subjected to automated sequencing and yielded a single sequence, indicating either that all three polypeptides shared the same N-terminus or that one or both of the p33 polypeptides have a blocked N-terminus (data not shown; see below and Experimental Procedures). Preparations of pure p32 were used to extend the N-terminal sequence to 41 amino acids (see Experimental Procedures). A fraction containing highly purified p33-A and p33-B polypeptides from an SF2 preparation that had lower than usual amounts of p32 was digested in solution with Staphylococcus aureus V8 protease, the resulting peptides were separated by reverse-phase HPLC, and several peaks were sequenced (data not shown; see Experimental Procedures).

cDNA Cloning of SF2 Polypeptides

Degenerate oligonucleotides were designed based on the most reliable and least degenerate portions near the ends of each of the peptide sequences (see Experimental Procedures). The C-terminal antisense oligonucleotides derived from each peptide sequence were used to prime cDNA synthesis on HeLa cell poly(A)+ mRNA. Polymerase chain reaction (PCR; Saiki et al., 1988) products were generated after amplification in the presence of the same antisense oligonucleotides, together with appropriate N-terminal sense oligonucleotides from either the same or a different peptide. The most abundant, discrete amplification products were subcloned and sequenced. In the case of the p32 N-terminal 41 amino acid sequence, the length of the expected PCR product was known, and the unique DNA sequence bridging the degenerate primers matched the experimentally determined amino acid sequence (data not shown; see Experimental Procedures). In the case of the V8 peptides from the mixture of p33-A and p33-B, the corresponding degenerate oligonucleotides were tested in each of the two possible relative positions for each pair of peptides. One particular combination generated a discrete PCR product from HeLa poly(A)+ mRNA. Its sequence matched the experimentally determined amino acid sequence immediately adjacent to the primers, and provided additional sequence between the two peptides (data not shown; see Experimental Procedures).

p32 cDNA

Oligonucleotides corresponding to the unique nucleotide sequence bridging each pair of degenerate primers were end labeled and used to screen a λ1149 HeLa cell cDNA library. A single clone, cDNA32, corresponding to the p32 polypeptide was obtained from a screen of 3 × 10⁵ recombinants, and the cDNA was sequenced (Figure 2). The sequence has no significant homologies to current entries in the protein and DNA data bases. cDNA32 is 1,271 bp long and encodes a highly acidic, 209 amino acid protein of 23,784 daltons.
The theoretical isoelectric point predicted from the amino acid sequence is 4.04. The predicted polypeptide sequence initiates with a leucine CTG codon, which corresponds to the N-terminus of the purified HeLa cell protein. The absence of upstream ATG codons, or of consensus 3’ splice sites, suggests that the mature N-terminus does not arise by proteolytic cleavage of a precursor. Instead, synthesis of p32 may initiate at the CTG codon, an unusual but not unprecedented finding (Prats et al., 1989; Saris et al., 1991). The presumptive 5’ untranslated region is 207 nucleotides (nt) long, lacks termination or initiation codons, and has the sequence characteristics of CpG islands, which are usually found at the 5’ ends of housekeeping genes (Gardiner-Garden and Frommer, 1987). The 3’ untranslated region is 392 nt, not including the poly(A) tail. The sequence AATAATAAT, which begins 27 nt upstream of the apparent site of poly(A) addition, is the most similar to the conserved AATAAA pre-mRNA 3’ end processing signal (reviewed in Proudfoot, 1991).
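The comparison of AATAATAAT with the conserved AATAAA signal can be illustrated by scanning the end of a 3’ untranslated region for the hexamer with the fewest mismatches to AATAAA. The following Python sketch is only an illustration under stated assumptions: the function names, the default window size, and the toy sequence are hypothetical and are not taken from the cDNA32 sequence or from the authors' analysis.

```python
CANONICAL = "AATAAA"  # conserved pre-mRNA 3' end processing signal

def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_polya_signal(utr, window=40, signal=CANONICAL):
    """Scan the last `window` nt of `utr` (sequence ending at the poly(A) site).

    Returns (nt_upstream_of_site, hexamer, mismatches) for the first hexamer
    with the fewest mismatches to `signal`, or None if the region is too short.
    """
    utr = utr.upper()
    start = max(0, len(utr) - window)
    best = None
    for i in range(start, len(utr) - len(signal) + 1):
        hexamer = utr[i:i + len(signal)]
        mismatches = hamming(hexamer, signal)
        upstream = len(utr) - i  # distance from hexamer start to the poly(A) site
        if best is None or mismatches < best[2]:
            best = (upstream, hexamer, mismatches)
    return best

# Toy 3' end (hypothetical): AATAATAAT placed so that it begins 27 nt upstream.
toy = "GCGCGC" + "AATAATAAT" + "C" * 18
print(best_polya_signal(toy))
```

In this toy case the best hit is a one-mismatch hexamer (AATAAT) rather than a perfect AATAAA, which is the kind of near match the text describes.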

Structure 365

and Function

of SF2

Figure 2. Nucleotide and Amino Acid Sequences of p32 cDNA

The nucleotide sequence of the sense strand is shown. The protein sequence of p32 is shown in the one-letter code below the corresponding codons. The nucleotide positions are shown on the right, the amino acid positions on the left. The boxed nucleotides indicate the proposed CTG initiation codon (+1) and the amber codon (+628). The EcoRI linkers are shown in lowercase.

p33 cDNAs

An end-labeled oligonucleotide corresponding to the unique nucleic acid sequence determined for the p33 polypeptides was used to probe the same cDNA library. Two independent clones, designated cDNA33-1 (1,155 bp) and cDNA33-2 (2.79 kb), were obtained from a screen of 3 × 10⁵ recombinants. They contain the same apparently complete coding sequence, but different lengths of 5’ and 3’ untranslated sequences (Figure 3). They encode a basic 248 amino acid protein of 27,744 daltons with the expected high content of arginine, glycine, and serine residues, and

no sequence relation to the acidic p32 polypeptide. The coding sequence appears to be full length; it contains the expected V8 peptides (see Experimental Procedures) and does not share N-terminal sequence with p32. Since SF2 preparations containing more p33 than p32 polypeptides yielded the same N-terminal sequence, we conclude that the HeLa cell p33 polypeptides have blocked N-termini, most likely N-acetylserine (Brown and Roberts, 1976). Comparisons of the p33 coding sequence to the protein and nucleic acid data bases reveal similarities to proteins with RNP-type RNA recognition motifs (RRMs; Query et

Figure 3. Nucleotide and Amino Acid Sequences of p33 cDNAs

The nucleotide sequence of the sense strand is shown. The protein sequence of p33 is shown in the one-letter code below the corresponding codons. The nucleotide positions are shown on the right, the amino acid positions on the left. The boxed nucleotides indicate the ATG initiation codon (+1), the ochre codon (+745), and the putative polyadenylation signal. The underlined G (-69) and T (+1065) residues mark the first and last nucleotides of genomic sequence in cDNA33-1; the remaining 5’ and 3’ untranslated sequences are from cDNA33-2. The EcoRI linkers are shown in lowercase. The RNP-1 and RNP-2 elements are indicated in reverse type. The lowercase boxed letters designate an octapeptide that also occurs in the U1 snRNP 70K polypeptide. The shaded boxes indicate the RS or SR dipeptides. The boxed G residues show the glycine-rich region. The dots denote a gap of approximately 1.1 kb of undetermined sequence.

Figure 4. Analysis of HeLa Cell p33 mRNA

(A) Northern blot. HeLa cell total poly(A)+ RNA (5 µg) was fractionated by agarose-formaldehyde gel electrophoresis, blotted, and probed with an antisense p33 RNA probe containing the entire cDNA33-1 sequence. The relative mobilities and sizes of the RNA molecular weight markers are indicated. (B) Primer extension. RNA was analyzed by reverse transcription with a 5’ end-labeled oligonucleotide primer complementary to p33 sequences from +4 to -21. The products were analyzed by PAGE-urea and autoradiography. Lane 1, DNA molecular weight markers (5’ end-labeled HpaII digest of pBR322); lane 2, reverse transcripts from 2 ng of control T3 RNA transcript synthesized from pGEMEXcDNA33-2+; lane 3, reverse transcripts from 2.5 µg of HeLa cell total poly(A)+ RNA. The relative mobilities and sizes of the molecular weight markers are indicated.

Figure 5. Expression of Cloned p32 and p33 cDNAs in E. coli

The p32 and p33 cDNAs were subcloned into T7 promoter vectors designed for overproduction of authentic proteins in E. coli containing an inducible T7 RNA polymerase gene (Studier et al., 1990; see Experimental Procedures). Thirty microliters of each bacterial culture was analyzed by SDS-PAGE and Coomassie blue staining. (A) Expression of p33. Lane 1, uninduced cells; lane 2, induced cells. (B) Expression of p32. Lane 1, uninduced cells; lane 2, induced cells; lane 3, cleared extract prepared from induced cells. The relative mobilities and sizes of the molecular weight markers are indicated (lanes M). The arrows show the overproduced p32 and p33 polypeptides.

al., 1989), glycine-rich domains, and/or arginine- plus serine-rich domains (RS domains) (see Figure 3 and Discussion). The theoretical isoelectric point predicted from the amino acid sequence is 10.77. The first methionine codon has an excellent match to the optimal initiation context consensus (Kozak, 1987). The presumptive 5’ untranslated region lacks any termination or initiation codons and is rich in CpG dinucleotides. Northern blot analysis with an antisense RNA probe containing the entire cDNA33-1 sequence reveals a predominant 2.8-2.9 kb polyadenylated SF2 mRNA in HeLa cells (Figure 4A). The minor band of 1.8-1.9 kb is due to cross-hybridization with residual 18S rRNA (data not shown). Primer extension analysis with an antisense oligonucleotide primer beginning at position +4 of the p33 nucleotide sequence (Figure 3) indicates that the total length of the

mRNA 5’ untranslated region is approximately 180 nt, of which 57 nt are missing from cDNA33-2 and 92 nt are missing from cDNA33-1 (Figure 4B). Longer primer extension products of 460 and 640 nt may be due to readthrough transcripts of unknown significance. cDNA33-1 has a 3’ untranslated region of 317 nt, whereas that of cDNA33-2 is about 1,900 nt and has not been completely sequenced. We believe that cDNA33-1 arose by oligo(dT) priming in a GA-rich region of the 3’ untranslated region, which marks the point of divergence between the two cDNAs (Figure 3). Thus, no AATAAA signal is found upstream of the short poly(A) tail of cDNA33-1. In the case of cDNA33-2, the poly(A) tail appears to have been lost upon preparation of the ends for EcoRI linker addition, resulting in linker addition within an AT-rich region. It is likely that this region is very near the polyadenylation site, because an AATAAA signal is found just upstream. The expected length of SF2 mRNA based on cDNA33-2 is 2.8 kb plus the poly(A) tail, which is consistent with the Northern blot analysis (Figure 4A).

Bacterial Expression of p32 and p33 cDNAs

To study further the structure and function of p32 and p33, the respective cDNA coding sequences were subcloned into several related T7 vectors designed for inducible expression of the authentic protein in E. coli (Studier et al., 1990; see Experimental Procedures). The p33 protein appears to be toxic to the host, and despite the use of vectors and hosts that minimize basal expression (see Experimental Procedures), only relatively low level expression of p33


was obtained, ranging in several experiments from 1%-5% of total bacterial protein (Figure 5A and data not shown). The recombinant protein is soluble (data not shown) and can be detected as a medium abundance polypeptide after induction (Figure 5A). It migrates with an apparent molecular weight of 31.6 kd (Figure 5A and data not shown), although its true molecular weight is only 27.7 kd (Figure 3). The authentic HeLa cell protein migrates even more slowly, with an apparent molecular weight in the range of 33 kd (Figure 1). Following mutagenesis of the leucine CTG codon to ATG, p32 was also overproduced in soluble form using the T7 expression system (Figure 5B). The recombinant product has the correct electrophoretic mobility compared with the authentic protein, i.e., 32 kd, despite the fact that its true molecular weight is only 23.8 kd (Figure 2). It represents approximately 50% of the total bacterial protein in induced cells. We reasoned that the low level of expression and/or the reduced apparent molecular weight of recombinant p33 may be due to instability of the C-terminal RS domain or to inefficient translation due to stalling of the ribosomes within this domain, which includes several arginine and serine codons that are rarely used in E. coli (Wada et al., 1990). Previous work showed that, at least in some cases, the effects of codon usage on overexpression may be limited to the N-terminus of the expressed protein (Chen and Inouye, 1990). However, the RS domain is extensive, and other proteins containing similar domains have not been successfully expressed in bacteria. Therefore, the C-terminal portion of the cloned cDNA was replaced by synthetic DNA encoding the same protein sequence but containing only the preferred E. coli codons (see Experimental Procedures). We did not observe a significant improvement in p33 expression, nor a reduction in electrophoretic mobility. Nevertheless, the synthetic fragment was employed in subsequent work.
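The idea of re-encoding the RS domain with preferred codons while keeping the protein sequence unchanged can be sketched as follows. This is a minimal Python illustration, not the design actually used: the preferred-codon choices (CGT for arginine, TCT for serine), the set of codons treated as rare, the toy RSRSRS peptide, and its "native-like" encoding are all assumptions made for the example.

```python
# Standard genetic code entries for arginine and serine only.
CODON_TO_AA = {
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
}

# Illustrative assumptions, not the table used for the synthetic fragment.
PREFERRED = {"R": "CGT", "S": "TCT"}
RARE_ARG = {"AGA", "AGG", "CGA", "CGG"}

def codons(dna):
    """Split a DNA string into complete triplets."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]

def translate(dna):
    """Translate an Arg/Ser-only coding sequence with the table above."""
    return "".join(CODON_TO_AA[c] for c in codons(dna))

def recode(dna, preferred=PREFERRED):
    """Replace every codon by the preferred synonymous codon."""
    return "".join(preferred[CODON_TO_AA[c]] for c in codons(dna))

def count_rare(dna, rare=RARE_ARG):
    """Count codons that belong to the assumed rare set."""
    return sum(c in rare for c in codons(dna))

# Hypothetical encoding of a toy RSRSRS stretch containing rare Arg codons.
native_like = "AGAAGCAGGTCCCGATCT"
synthetic = recode(native_like)
print(translate(native_like), count_rare(native_like))
print(translate(synthetic), count_rare(synthetic))
```

Both sequences translate to the same peptide; only the codon choice differs, which is the property the synthetic fragment described above is meant to preserve.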
It is possible that if optimization of other parameters results in improved expression, the presence of rare codons may become a limiting factor. Functional Assays of Recombinant SF2 Polypeptides The bacterial extracts contain significant ribonuclease activities, and probably other inhibitors that interfere with the splicing assays. However, because of its abundance, recombinant p32 could be assayed directly in splicing reactions after extensive dilution. The largest amount of bacterial extract that did not interfere with splicing or cause RNA degradation corresponded to more p32 than is normally present in a standard splicing reaction with HeLa cell extract. Variable amounts of recombinant p32 had no effect on splicing or splice site selection by itself, or in the presence of purified HeLa cell or recombinant p33 (data not shown). Since recombinant p33 was produced in relatively low amounts, it had to be extensively purified before its activity could be determined (see Experimental Procedures). Bacterial nucleic acids were efficiently removed by cesium chloride density gradient ultracentrifugation, but after dialysis to remove the cesium chloride, p33 was recovered

quantitatively as insoluble material (data not shown). Although this resulted in a significant enrichment, such that p33 was now the predominant protein in the pellet, it could not be resolubilized by high salt, and the suspension could not be assayed because of the continued presence of ribonuclease. The observed irreversible precipitation of p33 suggests that RNA may be required for its solubility. Consistent with this possibility, a small amount of hnRNA copurifies with HeLa SF2 (Krainer et al., 1990a). To assay recombinant p33, the insoluble material was dissolved in buffer containing SDS and fractionated by preparative SDS-PAGE (Hager and Burgess, 1980). The protein was visualized by cold potassium chloride staining, eluted by diffusion, denatured with guanidinium hydrochloride, and allowed to renature (Hager and Burgess, 1980). The resulting material was assayed for splicing activity in the complementation assay with S100 extract (Figure 6A; Krainer and Maniatis, 1985; Krainer et al., 1990a), as well as for stimulation of proximal 5’ splice site selection in the presence of S100 extract (Figure 6B; Krainer et al., 1990b). In the presence of constant amounts of S100 extract, which provides all essential splicing factors except SF2, the efficiency of splicing with wild-type human β-globin pre-mRNA increased in proportion to the concentration of recombinant p33 (Figure 6A). The same was true for thalassemic β-globin pre-mRNA in which the mutant 5’ splice site is inactive and three cryptic 5’ splice sites become available for alternative splicing to the authentic 3’ splice site (Figure 6B; Krainer et al., 1990b). In this case a gradual switch from the first upstream cryptic 5’ splice site (cr.2) to the downstream cryptic 5’ splice site (cr.3) was seen with increasing amounts of recombinant p33.
The predominant 5’ splice site used with the highest amounts of recombinant p33 tested was the proximal cr.3 5’ splice site (Figure 6B, lane 7), whereas the distal cr.2 5’ splice site was favored in nuclear extract over cr.1 (lane 1). The use of the distal site in nuclear extract was previously shown to be solely due to a low effective concentration of SF2 activity (Krainer et al., 1990b). In summary, the renatured recombinant p33 polypeptide was active in both splicing complementation and proximal 5’ splice site stimulation assays, demonstrating that a single polypeptide is sufficient for both activities. Hence, the p33 polypeptide (27.7 kd) encoded by cDNA33-1 and cDNA33-2 (Figure 3) will be referred to as SF2, keeping in mind that a role for p32 has not been ruled out (see Discussion).

In Vitro Translation of SF2

As mentioned above, recombinant SF2 migrates more rapidly on SDS-PAGE than either p33-A or p33-B isolated from HeLa cells. This difference in mobility could be due to the recombinant protein having incorrect N- or C-termini as a result of proteolysis or inappropriate translation, or to a lack of one or more posttranslational modifications. Alternatively, the cDNAs we isolated may encode a small and possibly rare isoform of SF2 that may be similar but not identical to biochemically isolated HeLa SF2. The N-terminus of recombinant SF2 was determined to be an unmodified serine, corresponding to position 2 in the sequence (data not shown). The N-terminus of HeLa SF2 is blocked and is almost certainly the same serine, modified by acetylation (Brown and Roberts, 1976). Although the remaining explanations have not been fully addressed yet, we favor the idea that the difference in mobility is due to posttranslational modification. Further modifications may also account for the appearance of HeLa SF2 as a doublet, although protein isoforms generated by alternative splicing or gene duplication may also contribute to this heterogeneity. To address the above questions, transcripts derived from the cloned SF2 cDNAs were translated in vitro (Figure 7). The electrophoretic mobilities of the in vitro translated polypeptides were slower than that of recombinant SF2, and were indistinguishable from those of HeLa p33-A and p33-B (Figures 1, 5A, and 7, and data not shown). Furthermore, the in vitro translated products often consist of closely spaced doublets, which is especially apparent in the wheat germ extract translation (Figure 7, lane 4). The mRNA used in this experiment contains the T7 gene 10 ribosome-binding site and lacks human 5’ untranslated sequences. It was chosen because it should be identical to the mRNA generated in E. coli except for the presence of a 5’ cap, and also because we found that this mRNA was translated much more efficiently than mRNAs containing the natural 5’ untranslated region. When the latter mRNAs were used, the same qualitative pattern was observed and the doublet appearance was more obvious in reticulocyte extracts (data not shown).

Figure 6. Functional Assays of Recombinant SF2 (p33)

(A) Splicing complementation assay. SP6 pre-mRNA containing the first two exons and first intron of wild-type human β-globin was spliced in vitro with HeLa cell nuclear extract (NE) or with HeLa cell cytoplasmic S100 and increasing amounts of purified recombinant p33 (rSF2). The products were analyzed by PAGE-urea and autoradiography. Lane 1, 10 µl of nuclear extract; lanes 2-7, 7 µl of S100 plus 0, 8, 16, 32, 64, and 128 ng of p33, respectively. Lane M, DNA molecular weight markers. The structures and relative mobilities of the pre-mRNA, intermediates, and products are indicated at left. (B) Splice site selection assay. SP6 pre-mRNA derived from a thalassemic human β-globin allele was spliced in vitro under conditions identical to those of (A). The mutant 5’ splice site is inactive, and the mRNA products are generated by activation of one of three cryptic 5’ splice sites. The relative mobilities and sizes of the corresponding mRNAs are indicated (cr.1-cr.3).

Discussion

Human SF2 has been demonstrated biochemically to be necessary for splicing a variety of pre-mRNA substrates (Krainer et al., 1990a). In addition, the selection of alternative 5’ splice sites in vitro can be modulated by the concentration of SF2 in a polar manner, such that with a large number of natural or model pre-mRNAs, proximal 5’ splice sites are favored at high concentrations of SF2 (Krainer et al., 1990b; A. M. and A. R. K., unpublished data; Helfman and A. R. K., unpublished data). This latter property was also observed with human ASF (Ge and Manley, 1990), a factor now known to be identical to SF2 (Ge and Manley, unpublished data; A. R. K., unpublished data; see also Ge et al., 1991). The mechanisms responsible for these biochemical effects on splicing are still unknown. For example, it is not known whether SF2/ASF directly activates proximal 5’ splice sites, or whether these sites are indirectly activated as a result of repression of the cis-competing distal 5’ splice sites. The molecular basis for the polarity of alternative 5’ splice site selection is also unknown, and this property may be intrinsic to SF2/ASF or it may reflect other aspects of the splicing machinery. In the latter case, the concentration dependence on SF2/ASF may simply be due to the different affinities of this factor for each alternative 5’ splice site. In this regard, it is known that different constitutively spliced pre-mRNA substrates require different amounts of SF2 for maximal splicing efficiency in vitro (Krainer et al., 1990a). What is known so far is that SF2 is a general RNA-binding protein required for early spliceosome assembly. It also appears to change RNA higher order structure and thus can promote annealing of complementary RNAs in an ATP-independent manner (Krainer et al., 1990a).

The Recombinant SF2 Polypeptide p33 Is Active in Splicing and Splice Site Selection

Active HeLa SF2 reproducibly copurifies with two extremely basic polypeptides termed p33-A and p33-B (Figure 1). The same active fractions always contain detectable,


Figure 7. In Vitro Translation of p33 cDNA

Capped T7-SF2 mRNA was generated from linearized pT7A.A~SF2 plasmid by in vitro transcription with T7 RNA polymerase. The RNA was translated in rabbit reticulocyte lysate (lanes 1 and 2) or wheat germ extract (lanes 3 and 4) in the presence of [35S]methionine. The proteins were analyzed by SDS-PAGE; the gel was stained to locate the molecular weight markers, and the labeled translation products were visualized by fluorography. Lanes 1 and 3, no RNA; lanes 2 and 4, T7-SF2 mRNA. The relative mobilities and sizes of the molecular weight markers are indicated.

albeit variable, amounts of a smaller acidic polypeptide termed p32. One or both of the p33 polypeptides seemed most likely to be SF2, because UV cross-linking experiments show that they can bind RNA (Krainer et al., 1990a), whereas purified p32 does not cross-link to RNA. Independent studies on ASF have yielded a very similar polypeptide profile, using a different purification protocol (Ge and Manley, 1990). Purified HeLa p32 by itself cannot complement an S100 extract for splicing and does not affect splice site selection when added to nuclear extract or to S100 extract and p33. The same is true for recombinant p32, which should differ from HeLa cell p32 only in having a methionine rather than a leucine residue at the N-terminus. However, the fact that some p32 copurifies with p33-A and p33-B through four chromatographic steps, including anion exchange, suggests a possible interaction between these polypeptides, although its physiological relevance is uncertain. Whereas active HeLa SF2 devoid of p32 could not be obtained biochemically, we have now shown that recombinant p33 is sufficient for complementation (Figure 6). The biochemical complementation test shows that p32 is not limiting, but does not exclude a role for this polypeptide in splicing. Thus, p32 may be present in the S100 extract and may physically associate with recombinant p33 to form a putative active protein complex. Gel filtration and glycerol gradient analysis of purified mammalian SF2/ASF gave an apparent native molecular weight in the range of 30-40

kd, and substantially larger with cruder material (Ge and Manley, 1990; Krainer et al., 1990a). The so-called p33 polypeptides are actually 27.7 kd according to the primary sequence, not taking into account posttranslational modifications. The p32 polypeptide is only 23.8 kd according to the primary sequence. Therefore, it is not certain at present whether active SF2 consists of a monomer of just one of the p33 polypeptides. Further studies are necessary to determine whether p32 plays any role in splicing, or whether its copurification with SF2 is merely coincidental or represents an artifact of isolation. The specific activity of recombinant SF2 is significantly lower than that of purified HeLa SF2. The most likely explanation for this is that the renaturation step after preparative SDS-PAGE was incomplete, since the conditions have not been optimized for this particular protein. Another plausible explanation is the lack of posttranslational modifications in the recombinant protein. In this regard, it is noteworthy that bacterial SF2 migrates more rapidly than either band in the doublet observed with the mammalian protein (Figures 1 and 5). Since the HeLa cell protein has a blocked N-terminus, it probably lacks the methionine and carries an acetylated serine (Brown and Roberts, 1976). The bacterial protein contains unmodified serine at its N-terminus, which reflects the loss of formylmethionine from the primary translation product (Adams and Capecchi, 1966; Webster et al., 1966). Additional modifications are likely responsible for the differences in electrophoretic mobility, and preliminary results are consistent with the presence of one or more phosphorylated residues in HeLa cell SF2 (our unpublished data). Posttranslational modification may also explain the slight electrophoretic mobility difference between HeLa cell p33-A and p33-B.
Thus, the p33 cDNAs generate doublets of the expected electrophoretic mobility upon in vitro transcription followed by translation in reticulocyte or wheat germ extracts (Figure 7). Alternatively, these two polypeptides may represent two different isoforms generated by gene duplication or alternative splicing. Because of the limited quantities of purified HeLa p33-A and p33-B available, we have not determined whether all the V8 protease fragments are common to both polypeptides. We also note that other HeLa cell polypeptides besides p33-A and p33-B may have SF2 activity. One potential candidate is a 55 kd polypeptide previously described in biochemical fractionation and immunoprecipitation experiments (Krainer et al., 199Oa). The range of recombinant SF2 concentration that results in detectable complementation is 0.6 to 4.8 pmol in a 25 ~1 reaction with 20 fmol of pre-mRNA substrate. This represents a 30- to 240-fold stoichiometric excess of protein. This seemingly large excess may be accounted for in part by several technical aspects of the experiment. First, as mentioned above, only a small proportion of the renatured SF2 is likely to be active. Second, a significant fraction of the purified, active SF2 may be complexed by nonspecific RNAs and proteins in the crude SlOO extract. Third, it is possible that during the course of the splicing reaction only a fraction of the recombinant SF2 acquires posttranslational modifications that may be important for

Figure 8. Distribution of Arginine/Serine Dipeptides in RS Domain Proteins

[Schematic not reproduced. Proteins shown: U1 snRNP 70K, tra, tra-2, su(wa), and SF2.]

The proteins shown are drawn to scale, with the length in amino acids indicated above each protein. Each thin vertical line denotes a single RS or SR dipeptide. The thicker lines indicate clustering of dipeptides, and are drawn to scale. The stippled boxes indicate RRMs (see also Figure 9). Except for SF2, the other proteins have additional R-rich or R- and S-rich regions not made up exclusively of RS or SR dipeptides (see Mancebo et al., 1990). Some of the proteins have additional alternatively spliced isoforms. References are given in the text.

function. Fourth, most preparations of HeLa cell SF2 have higher specific activities than recombinant SF2. Therefore, it is not possible to determine at present whether SF2 acts stoichiometrically or catalytically. It is likely that, at least in modulating 5’ splice site selection, SF2 binds stoichiometrically to the pre-mRNA, and possibly in excess.

Conserved Sequence Motifs in the p33 Polypeptides

Several clusters of alternating arginine and serine residues are found near the C-terminus of the p33 sequence (Figure 3). The last 45 amino acids are composed of 40% arginine and 40% serine, mostly as RS or SR dipeptides, including a continuous stretch of 16 alternating R and S residues. These clusters constitute regions of significant similarity to the Drosophila proteins tra (Boggs et al., 1987), tra-2 (Amrein et al., 1988; Goralski et al., 1989), and su(wa) (Chou et al., 1987), all of which are involved in the regulation of specific alternative splicing pathways (reviewed in Maniatis, 1991; Baker, 1989; Bingham et al., 1988). Similar RS domains are present in a phylogenetically conserved region of the 70K protein of the U1 snRNP (Theissen et al., 1986; Spritz et al., 1987; Etzerodt et al., 1988; Mancebo et al., 1990) and in the E2 proteins of several isolates of papillomaviruses (Fuchs et al., 1986; Zachow et al., 1987). No other proteins in the existing data bases have RS domains.

A unique feature of the arginine-rich region of SF2 is the occurrence of arginines almost exclusively as RS dipeptides; only three RR dipeptides occur in the entire protein (Figures 3 and 8). In contrast, other arginine-rich proteins, including those with RS domains, contain multiple uninterrupted stretches of arginine residues, and in some cases arginine interspersed with aspartic or glutamic acid (Mancebo et al., 1990).
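The dipeptide bookkeeping described above (RS or SR dipeptides, RR dipeptides, and the longest stretch of alternating R and S residues) can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and the test string is a made-up toy sequence, not the SF2 sequence, which is given in Figure 3.

```python
def rs_dipeptide_positions(seq):
    """1-based start positions of every RS or SR dipeptide (overlaps counted)."""
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] in ("RS", "SR")]


def count_dipeptide(seq, dipeptide):
    """Number of occurrences of one dipeptide, e.g. 'RR' (overlaps counted)."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dipeptide)


def longest_alternating_rs_run(seq):
    """Return (1-based start, length) of the longest strictly alternating R/S run."""
    best = (0, 0)
    run_start, run_len = 0, 0
    for i, aa in enumerate(seq):
        if aa in ("R", "S"):
            if run_len > 0 and seq[i - 1] != aa:
                run_len += 1                 # alternation continues
            else:
                run_start, run_len = i, 1    # new run (start, or after RR/SS)
        else:
            run_len = 0                      # any other residue ends the run
        if run_len > best[1]:
            best = (run_start + 1, run_len)
    return best


# Hypothetical toy sequence for illustration only (NOT the SF2 sequence).
toy = "MGGRSRSRSRSPRRSG"
print(rs_dipeptide_positions(toy))
print(count_dipeptide(toy, "RR"))
print(longest_alternating_rs_run(toy))
```

Applied to a real translated sequence, the same three functions would give the quantities plotted in Figure 8 and discussed in the text.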
The p33 cDNA clones also contain a region with strong homology to the 80 amino acid RNP-type recognition motif (RRM), including the two characteristic internal consensus elements RNP-1 and RNP-2 (Figure 9). The RRM is common to a large family of RNA-binding proteins, including several hnRNP and snRNP proteins, the Drosophila alternative splicing regulators Sxl (Bell et al., 1988) and tra-2 (Amrein et al., 1988; Goralski et al., 1989), and many other prokaryotic and eukaryotic proteins (reviewed in Query et al., 1989; Bandziulis et al., 1989). The presence of two prolines in the RNP-1 consensus octamer of SF2 is unusual, since arginine or lysine followed by glycine is the norm (Figure 9). These prolines are located in the predicted variable loop, according to the three-dimensional structure recently solved for an RRM fragment from the U1 snRNP A protein (Nagai et al., 1990; Hoffman et al., 1991), and as such they would not perturb the expected β strand that involves the rest of the RNP-1 residues. On the other hand, these unusual residues may influence the RNA-binding properties of SF2.

This putative RNA-binding domain near the N-terminus is followed by a stretch of multiple consecutive glycines, which separates it from the C-terminal RS domain. This is the region of maximal theoretical flexibility, and thus it may act as a hinge region between N-terminal and C-terminal domains. The U1 snRNP 70K protein has a similar organization of RRM, glycine hinge, and RS domains, despite its larger size and lack of strong global homology (Query et al., 1989; Etzerodt et al., 1988; Spritz et al., 1987; Mancebo et al., 1990). This protein has additional glycine-rich domains as well as arginine-rich domains containing glutamic or aspartic acid, but lacking serine. In addition, the octapeptide EFEDPRDA occurs in SF2 and in the human and X. laevis 70K proteins; this octapeptide does not occur in any other proteins or open reading frames in the current data bases. The corresponding sequence in the D. melanogaster 70K protein is DFEDPKDT. The function of this sequence is unknown; it overlaps the C-terminus of the RNP-1 motif of human SF2, whereas it precedes the RRM motif of the 70K proteins and actually falls outside the experimentally determined U1 RNA-binding domain of human 70K (Query et al., 1989).
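The relationship between the two octapeptides quoted above can be made explicit with a position-by-position comparison. The following Python sketch uses only the two sequences given in the text; the helper name is illustrative.

```python
def mismatches(a, b):
    """List (1-based position, residue in a, residue in b) where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


sf2_and_vertebrate_70k = "EFEDPRDA"   # SF2, human and X. laevis 70K
drosophila_70k = "DFEDPKDT"           # D. melanogaster 70K

diffs = mismatches(sf2_and_vertebrate_70k, drosophila_70k)
identical = len(sf2_and_vertebrate_70k) - len(diffs)
print(diffs)
print(f"{identical} of {len(sf2_and_vertebrate_70k)} positions identical")
```

Five of the eight positions are identical, and two of the three differences (E to D at position 1, R to K at position 6) preserve the charge of the side chain.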
The identification of multiple sequence elements shared by SF2 and the U1 snRNP 70K protein, together with the fact that both SF2/ASF (Krainer et al., 1990b; Ge and Manley, 1990) and U1 snRNP (Zhuang and Weiner, 1986; Seraphin et al., 1988;

Figure 9. Sequence Alignment of Selected RNA Recognition Motifs

[Alignment not reproduced. Rows, in order: human hnRNP A1 #1 and #2; human hnRNP A2/B1 #1 and #2; human hnRNP C1/C2; human U1 snRNP 70K; human U1 snRNP A #1 and #2; human U2 snRNP B'' #1 and #2; human poly(A) bp #1 to #4; fly Sxl #1 and #2; fly tra-2; fly hdp #1 and #2; human SF2. The RNP-2 and RNP-1 elements are marked above the alignment.]

The 20 RRMs shown are from 11 human and Drosophila proteins with a role in mRNA processing. The RNP-1 and RNP-2 elements are shown in reverse type, and eight other highly conserved positions are shown in stippled boxes (Query et al., 1989; Bandziulis et al., 1989). The alignment was anchored on these ten blocks of sequence. Other less conserved positions are not highlighted, and slightly different alignments are possible, depending on the subclasses of RRM sequences shown and on the introduction of different or additional gaps. Some of the RRMs shown are from proteins that have high overall sequence homology. The numbers on the left and right indicate the location of each RRM in the respective amino acid sequences. Some of the proteins have additional alternatively spliced isoforms. Poly(A) bp = polyadenylate-binding protein (Grange et al., 1987); hdp = helix-destabilizing protein. Other references are given in the text and in Query et al. (1989).

Siliciano and Guthrie, 1988; Yuo and Weiner, 1989) are involved in 5’ splice site selection, suggests that these regions of homology interact with each other and/or are involved in similar protein-protein or protein-RNA interactions.

The conserved sequence motifs identified in the p33 cDNAs can account in theory for some of the observed biochemical properties of SF2. For example, the RNA-binding properties of SF2 (Krainer et al., 1990a) are fully consistent with the presence of an RRM. Furthermore, this element is probably responsible for the ATP-independent RNA-annealing activity of SF2 (Krainer et al., 1990a), since a homologous element is found in proteins with comparable activities, e.g., hnRNP A1 (Pontius and Berg, 1990; S. Munroe, submitted).

The RS domain of SF2 is especially intriguing because homologous domains are found in three of four genetically identified regulators of alternative splicing in Drosophila (Figure 8). This has led to the proposal that the RS domain may be diagnostic of splicing regulators, although its precise function remains unknown (Bingham et al., 1988; Mancebo et al., 1990). The presence of an RS domain in SF2 is consistent with this idea, as the RS domain may turn out to be responsible for the specific effects of SF2 on 5’ splice site selection. Although the presence of numerous arginines may be expected to confer RNA-binding properties, the RS domain has not been shown to be an RNA-binding domain, and several of the RNA-binding proteins in which it has been identified also contain separate RNA recognition motifs of the RNP type (Figures 8 and 9). We note that the arginine-rich motif identified as an RNA-binding element in human immunodeficiency virus regulatory proteins and in bacterial transcription antiterminators (Lazinski et al., 1989) differs from the RS motif in primary sequence.
Further experiments are necessary to determine whether the RS domain is responsible for protein-protein or protein-RNA interactions, as well as to dissect the contributions of the conserved sequence elements to splicing and to alternative 5’ splice site selection.

Experimental Procedures

Buffers

Buffer A is 20 mM HEPES-Na+ (pH 8.0), 5% (vol/vol) glycerol, 0.1 M KCl, 0.2 mM EDTA, 1 mM DTT. Buffer B is 0.1% (wt/vol) TFA. Buffer C is 70% (vol/vol) acetonitrile, 0.085% (wt/vol) trifluoroacetic acid.

Oligonucleotides

Oligonucleotides were obtained from the Cold Spring Harbor Oligonucleotide Facility or were purchased from Bio-Synthesis. Long oligonucleotides were gel purified. Desalting was carried out by reverse-phase or gel filtration chromatography. Oligonucleotide sequences are as follows (r = a+g; y = t+c; d = a+g+t):

oligo#1: cttggatcccayacxgayggxgayaarg
oligo#2: ctcaagcttgtyttrtgyttytgdatytt
oligo#3: cttggatccgaggcxggxgatgtxtgtta
oligo#4: ctcaagcttttxcgxacxgcgtaxgtcat
oligo#5: gagcatatgtcgggaggtggtgt
oligo#6: cgcggatccggtaagataaccac
oligo#7: cgcggatcctgtaaactcgaggg
oligo#8: gagcatatgcacaccgacggaga
oligo#9: tatgctgatgtttaccgagatggcactggtgtcgtggagtttgtacggaaagaagatatgac
oligo#10: tttctttcctccttaatttcatcactcaggaaatcaacaaaagc
oligo#11: catgggccccgtagcccgagctatggccgtagccgcagccgttctcgctctcgtagccgctctcgcagccgtagcaacagccgttctcgtagctat
oligo#12: cttggatccttaggtacgagaacggctgcggctgtgacgcgggctataacgcggagagccacggdgcgacgcgggctatagctacgagaacggctgt
oligo#13: acatggcggtgacgaaaagcgcgga

Pre-mRNA Substrates

The wild-type and thalassemic human β-globin pre-mRNAs were synthesized from the BamHI-linearized plasmids SP64-HβΔ6 and SP64-HβΔ6IVS1-1A, respectively (Krainer et al., 1984).
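The degenerate-base notation used in the oligonucleotide list above (r, y, d) determines how many distinct sequences each primer pool contains. The Python sketch below computes that degeneracy and, for small pools, enumerates the members. Note one assumption: the symbol x appears in several oligonucleotides but is not defined in the legend, so the code leaves its meaning as a parameter. By default x is counted as a single undetermined species; passing "acgt" instead treats it as an equimolar mix of all four bases. Function names are illustrative.

```python
from itertools import product

# Degenerate codes exactly as defined in the legend: r = a+g; y = t+c; d = a+g+t.
CODES = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "tc", "d": "agt"}


def _table(x_bases):
    # 'x' is not defined in the source; its expansion is an explicit assumption.
    return dict(CODES, x=x_bases)


def degeneracy(oligo, x_bases="x"):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    table = _table(x_bases)
    n = 1
    for base in oligo.lower():
        n *= len(table[base])
    return n


def expand(oligo, x_bases="x"):
    """Enumerate every member of the pool (only sensible for small pools)."""
    table = _table(x_bases)
    return ["".join(p) for p in product(*(table[b] for b in oligo.lower()))]


oligo1 = "cttggatcccayacxgayggxgayaarg"
oligo2 = "ctcaagcttgtyttrtgyttytgdatytt"

print(degeneracy(oligo1))            # x counted as one species
print(degeneracy(oligo1, "acgt"))    # x assumed to be a four-base mix
print(degeneracy(oligo2))
print(len(set(expand(oligo2))))
```

With x counted once, oligo#1 is a 16-fold degenerate pool and oligo#2 a 96-fold degenerate pool; if x were a four-base mix, oligo#1 would be 256-fold degenerate.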

HeLa Cell Extracts and Purified Fractions

SF2 was purified from HeLa cells as described (Krainer et al., 1990a) using step elution for the Mono Q column and loading of the Phenyl-Superose column at 1.7 M (NH4)2SO4. Nuclear and cytoplasmic S100 extracts were prepared as described (Dignam et al., 1983; Krainer et al., 1984).


Splicing Reactions

Standard conditions were employed for the 25 µl splicing reactions (Krainer et al., 1984), except that the amounts of nuclear extract, S100 extract, and purified HeLa cell or recombinant SF2 were as indicated in the figure legends. The total volume of extract and fractions was adjusted to 15 µl with buffer A. The pre-mRNA substrates were present at 0.8-3 nM. The synthesis of labeled pre-mRNA substrates, RNA extraction, and electrophoretic conditions have been described (Krainer et al., 1984; Krainer and Maniatis, 1985).

Electroblotting

Protein samples were electroblotted onto Immobilon P membranes (Millipore) in 25 mM Tris base, 192 mM glycine, 20% (vol/vol) methanol in a Hoefer apparatus for 2 hr at 0.5 A, as described (Matsudaira, 1987). The samples were visualized by staining with Coomassie blue in water, and destained with water. The bands were excised and the protein was hydrolyzed in situ.

Amino Acid Analysis

Amino acid analysis was performed at the Cold Spring Harbor Protein Chemistry Facility. Following hydrolysis of samples in 6 N HCl (vapor phase) at 105°C in vacuo for 24 hr, amino acids were derivatized with phenyl isothiocyanate, and the phenylthiocarbamyl derivatives were separated by reverse-phase HPLC on a Hewlett-Packard 1090M Analytical ChemStation equipped with a diode-array detector, as previously described (Kligman and Marshak, 1985).

Protease Digests

A Phenyl-Superose fraction containing the p33-A and p33-B polypeptides from a preparation that had low amounts of p32 was used to generate proteolytic fragments. S. aureus V8 protease digestion was carried out in 25 mM NH4HCO3 (pH 7.8), 0.01% (wt/vol) SDS, 1 mM DTT in a total volume of 150 µl in a siliconized Eppendorf tube. The reaction contained 90 µl (5 µg) of SF2 in buffer A and 0.3 µg of V8 protease (Boehringer); incubation was for 24 hr at room temperature.
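The quantities given under Splicing Reactions allow a quick consistency check against the figures quoted in the Discussion: 20 fmol of pre-mRNA in a 25 µl reaction, and 0.6 to 4.8 pmol of recombinant SF2. The short Python sketch below does the unit arithmetic; the function names are illustrative, and only values stated in the text are used.

```python
def concentration_nM(amount_fmol, volume_ul):
    """Concentration in nM. 1 fmol/µl = 1e-15 mol / 1e-6 l = 1e-9 M = 1 nM."""
    return amount_fmol / volume_ul


def molar_excess(protein_fmol, substrate_fmol):
    """Fold molar excess of protein over RNA substrate."""
    return protein_fmol / substrate_fmol


REACTION_UL = 25.0
SUBSTRATE_FMOL = 20.0
SF2_FMOL_RANGE = (600.0, 4800.0)   # 0.6 and 4.8 pmol expressed in fmol

substrate_nM = concentration_nM(SUBSTRATE_FMOL, REACTION_UL)
excess = [molar_excess(p, SUBSTRATE_FMOL) for p in SF2_FMOL_RANGE]
sf2_nM = [concentration_nM(p, REACTION_UL) for p in SF2_FMOL_RANGE]

print(substrate_nM)   # substrate concentration, nM
print(excess)         # fold excess at the low and high ends
print(sf2_nM)         # SF2 concentration, nM
```

The substrate works out to 0.8 nM, the lower end of the stated 0.8-3 nM range, and the protein-to-RNA ratio spans 30- to 240-fold, in agreement with the Discussion.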
Peptide Purification

HPLC separation of peptides was carried out at the Cold Spring Harbor Protein Chemistry Facility. The V8 digest was diluted to 200 µl with buffer B and injected onto an RP-300 (220 mm x 2.1 mm), 7 µm particles, 300 Å pore size C8 column (Brownlee) at 0.1 ml/min in the above-mentioned Hewlett-Packard 1090M instrument. Peptides were eluted with a gradient from 100% buffer B to 100% buffer C in 70 min, and the eluate was monitored at 220 and 280 nm. The peaks of interest were sequenced.

Amino Acid Sequencing and Degenerate Primer Design

Protein sequence analysis was performed at the Cold Spring Harbor Protein Chemistry Facility by automated sequential Edman degradation on an Applied Biosystems 475 automated instrument with an online 120A PTH analyzer. Samples were applied to a Biobrene-coated glass fiber filter (ABI) and dried. All chemical cycles were performed according to the manufacturer's instructions. The experimentally determined N-terminal sequence of p32 corresponds to amino acids 1-41 in Figure 2, with ambiguities at positions 30, 33, 36, and 40. The degenerate oligonucleotides successfully used for PCR were oligo#1, which includes base pairs 4-22 (amino acids 2-8) in the sense strand; and oligo#2, base pairs 64-83 (amino acids 22-28) in the antisense strand. The p33 internal V8 peptides correspond to amino acids 144-161 in Figure 3, with ambiguities at 151 and 155, and to amino acids 167-175, with ambiguity in cycle 1. Oligo#3 maps to base pairs 427-446 (amino acids 143-149) in the sense strand. Oligo#4 maps to base pairs 502-521 (amino acids 168-174) in the antisense strand. For oligo#3 and oligo#4, primer degeneracy was reduced by allowing for potential G-T base pairing (Martin and Castro, 1985).

cDNA Synthesis with Degenerate Primers

First-strand cDNA was generated from 5-10 µg of HeLa cell poly(A)+ mRNA. The pool of antisense degenerate oligonucleotides (oligo#2 or oligo#4; 5 pmol) was hybridized to mRNA, adjusted to standard reverse transcription conditions (Sambrook et al., 1989) in 50 µl, and incubated with 90 U of avian myeloblastosis virus reverse transcriptase (Life Sciences) for 2 hr at 42°C. The RNA was degraded by alkaline treatment, and the cDNA was recovered by phenol extraction and ethanol precipitation and resuspended in 100 µl of water.

PCR Conditions

Amplification reactions were in 100 µl with 1-5 µl of first-strand cDNA, 100 pmol of each degenerate oligonucleotide pool, 1-2 mM MgCl2, 0.2 mM dXTPs, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 10 µg/ml gelatin, and 2.5-5 U of AmpliTaq polymerase (Cetus). Amplifications of cloned cDNAs were carried out with 1 ng of plasmid DNA under the same conditions as above, except that VENT polymerase and buffer (New England BioLabs) were used. The samples were overlaid with 100 µl of light mineral oil and incubated in an Ericomp Twinblock instrument. Cycling times for denaturation, annealing, and extension were, respectively, 1.5 min, 5 min, and 5 min for the first cycle, and 30 s, 4 min, and 5 min for 30-40 repetitive cycles. Final extension was for 20 min. Denaturation was at 95°C, annealing at 55°C or 60°C, and extension at 72°C.

Library Screening

Recombinant phage from an amplified HeLa cell cDNA library in λ1149 (a generous gift from C. Schneider) were plated at a density of 10⁵ per 15 cm petri dish in E. coli LE392. Plaque lifts were prepared and probed on duplicate filters as described (Sambrook et al., 1989). The probes were 5’ end-labeled oligo#9 and oligo#10. The positive phage were isolated, and their DNA was extracted by standard methods (Sambrook et al., 1989).

Plasmid Constructions

DNA purified from recombinant phage isolates was digested with EcoRI, and the inserts were subcloned in both orientations into the EcoRI site of the vector pGEMEX (Promega) to generate pGEMEXcDNA32+, pGEMEXcDNA32-, pGEMEXcDNA33-1+, pGEMEXcDNA33-1-, pGEMEXcDNA33-2+, and pGEMEXcDNA33-2- (the plus sign indicates that the sense strand is in the same orientation as that of T7 gene 10).
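The thermal cycling schedule given under PCR Conditions can be written out as data, which makes the total programmed hold time easy to compute. The Python sketch below is an illustration under two stated assumptions: the "first cycle" is taken to be in addition to the 30-40 repetitive cycles, and the final extension is taken to run at the 72°C extension temperature (the text gives its duration but not a separate temperature). Ramp times between temperatures are ignored. Class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temps_c: tuple    # temperature(s) stated in the text for this step
    minutes: float


def pcr_program(n_repeat_cycles):
    """Return the full list of steps for a run with the given number of repeat cycles."""
    first = [Step("denaturation", (95,), 1.5),
             Step("annealing", (55, 60), 5.0),
             Step("extension", (72,), 5.0)]
    repeat = [Step("denaturation", (95,), 0.5),
              Step("annealing", (55, 60), 4.0),
              Step("extension", (72,), 5.0)]
    # Assumption: final extension at the same 72 degrees C as the cycle extension.
    final = [Step("final extension", (72,), 20.0)]
    return first + repeat * n_repeat_cycles + final


def hold_minutes(n_repeat_cycles):
    """Sum of programmed hold times, excluding temperature ramps."""
    return sum(step.minutes for step in pcr_program(n_repeat_cycles))


for n in (30, 40):
    print(n, "cycles:", hold_minutes(n), "min")
```

Under these assumptions the programmed holds add up to 316.5 min for 30 repeat cycles and 411.5 min for 40, that is, roughly 5 to 7 hr per run before ramping time.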
Discrete PCR-amplified DNA fragments generated with degenerate oligonucleotides (oligo#1-oligo#4) were purified on 8% polyacrylamide gels, digested with HindIII and BamHI, and subcloned into the corresponding sites of pGEMEX. For bacterial expression, unique NdeI and BamHI sites were introduced into the inserts by PCR amplification of the recombinant plasmid with VENT polymerase and oligo#5 plus oligo#6 or oligo#7 plus oligo#8. The NdeI sites contain the initiating methionine codon. In the case of the p32 polypeptide, the initiating leucine CTG codon was thus mutated to ATG. The amplified fragments were purified on 1% agarose gels and subcloned into the vectors pET-9c, pET-11c, and/or pT7A.A, to generate pET9c-SF2, pET11c-SF2, pT7A.A-SF2, and pT7A.A-32. pT7A.A is a derivative of pT7 (Johnson et al., 1987) containing a phagemid origin and NdeI-BamHI cloning sites (a gift from J. Kuret). pET-9c and pET-11c (Novagen) are more recent vectors designed to maximize the expression of toxic proteins (Studier et al., 1990). To alter the codon usage, a synthetic DNA fragment was prepared by annealing oligo#11 and oligo#12, filling in with Sequenase II (USBC), and digesting with BamHI and ApaI. The resulting fragment was used to replace the corresponding fragment in pET9c-SF2, to give pET9c-SF2RS. All subcloning steps were carried out in the strain DH5α. The expression plasmids were subsequently transformed into the strains BL21(DE3) and HMS174(DE3). The strains also contained the pACYC-derived plasmid pNK627 (a gift from D. Roberts), which bears lacIq and a tetracycline resistance gene, or pLysS (Novagen; Studier et al., 1990), which carries the T7 lysozyme and chloramphenicol resistance genes. Transformants were selected with ampicillin, tetracycline, kanamycin, and/or chloramphenicol, as appropriate, and grown as described (Studier et al., 1990).
The data shown were obtained with plasmids pT7A.A-32 and pNK627 in BL21(DE3), and with plasmids pET9c-SF2RS and pLysS in BL21(DE3), which gave the highest levels of expression.

Nucleotide Sequencing

Nucleotide sequencing was carried out with Sequenase II kits (USBC) using double-stranded DNA templates prepared by alkaline lysis and cesium chloride density gradient fractionation. Multiple synthetic oligonucleotides (18-mers or longer) were used to sequence both DNA strands. Plasmid constructs were verified by sequencing relevant regions from double-stranded boiling miniprep DNA (Sambrook et al., 1989).

Isolation of Poly(A)+ RNA

Polyadenylated RNA was obtained from HeLa cells grown in spinner culture by oligo(dT)-cellulose chromatography using standard procedures (Sambrook et al., 1989) or using a Fast Track kit (Invitrogen).

Northern Blotting

Agarose-formaldehyde electrophoresis and aqueous hybridization and washing were carried out as described (Sambrook et al., 1989). The gel was stained with ethidium bromide to localize the markers, and the RNA was transferred to 0.2 µm Optibind nitrocellulose (Schleicher and Schuell) in a vacuum blotting apparatus (Pharmacia) in 20x SSC at 50-55 cm H2O. The hybridization probe was an [α-32P]UTP-labeled T3 transcript (Sambrook et al., 1989) made from HindIII-digested pGEMEXcDNA33-1-, with a specific activity of 2 x 10⁹ cpm/µg.

Primer Extensions

Hybridization was as described (Sambrook et al., 1989) with 10⁵ cpm of 5’ 32P-labeled antisense oligonucleotide (oligo#13) corresponding to positions +4 to -21 in the p33 sequence (Figure 3), and 5 µg of HeLa poly(A)+ RNA, or 20 ng of T3 RNA transcript synthesized from pGEMEXcDNA33-2+ digested with HindIII. Primer extension was carried out with Superscript reverse transcriptase and buffer (BRL) for 1 hr at 45°C. The cDNA was deproteinized and ethanol precipitated, and the indicated portions were analyzed by PAGE-urea.

In Vitro Translation

Capped mRNAs were synthesized from BamHI-linearized pT7A.A-SF2 plasmid with T7 RNA polymerase in the presence of 7-methyl-GpppG (New England Biolabs). In some experiments, capped T3 mRNA was generated by transcription of HindIII-linearized pGEMEXcDNA33-2+. One-half microgram of RNA was incubated with rabbit reticulocyte or wheat germ extracts in 25 µl reactions in the presence of [35S]methionine and cysteine (ICN) under conditions suggested by the manufacturer (Promega). One microliter was analyzed by SDS-PAGE; the gels were stained with Coomassie blue and destained to locate the molecular weight markers, prior to fluorography with Entensify (DuPont).
The corners of the dried gel were marked with phosphorescent ink (Duncan) to permit accurate alignment with the X-ray film and recording of the marker positions.

Purification of Recombinant Proteins

Bacterial Extracts

Bacterial cultures were induced for 2-3 hr with 0.4 mM IPTG when their ODsoO reached 1. The cells were harvested, washed in 0.1 M NaCl, 10 mM Tris-HCl (pH 8.0), and in some cases frozen at -70°C as a cell pellet. The pellet from each liter of culture was resuspended in 30 ml of buffer A. With frozen cells bearing the pLysS plasmid, lysis already occurred at this stage. Complete lysis and DNA shearing was achieved by sonication on ice in 10 s bursts at 1 min intervals. The lysate was centrifuged at 12,000 rpm for 20 min in an SS34 rotor (Sorvall), and the pellet was discarded. The cleared lysates containing soluble recombinant p32 or p33 were frozen in liquid N2 and stored at -70°C.

Purification and Renaturation of p33

To purify recombinant p33, 4.1 g of solid cesium chloride was added to each 4.1 ml of cleared lysate that was first adjusted to 5 mM DTT. Each sample was pipetted under 5 ml of 1.3 g/ml cesium chloride in buffer A in a polycarbonate bottle. The samples were centrifuged at 50,000 rpm for 24 hr at 3°C in a Beckman type 75 Ti rotor. The gradients were fractionated from the top in a Buchler Autodensiflow gradient fractionator, and collected in 0.5 ml fractions at 1 ml/min. Each fraction was analyzed for protein and for nucleic acid content. The fractions containing p33 and lacking nucleic acids were pooled and dialyzed overnight against buffer A. The dialysate was spun at 10,000 rpm for 15 min in an SS34 rotor. The pellet, which contained all the p33, was resuspended in 2 ml of buffer A per liter of original culture. The insoluble material was dissolved in SDS-containing sample loading buffer, purified by preparative SDS-PAGE, denatured in 6 M guanidinium hydrochloride, and renatured in the presence of carrier acetylated BSA (BRL), essentially as described (Hager and Burgess, 1980). Residual guanidinium was removed from the renatured protein by ultrafiltration in Centricon units (Amicon), as described by the manufacturer. The final sample in buffer A was frozen in liquid N2 and stored at -70°C. The overall yield was approximately 100 µg of purified protein per liter of induced culture.

Sequence Analysis

Protein sequences were compared to entries in the PIR (9/90) and Swissprot (8/90) data bases, and to six-frame translations of the GenBank (12/90) and EMBL (11/90) data bases, using the programs FASTDB and FASTA in the IntelliGenetics Suite software. Theoretical isoelectric points and regions of maximal flexibility were calculated with the programs CHARGPRO and FLEXPRO, respectively, in the PC/GENE software. Amino acid compositions and protein molecular weights were derived from the translated sequences with the PC/GENE program AACOMP.

Acknowledgments

We are grateful to P. Bingham, D. Frendewey, D. Helfman, D. Horowitz, J. Keene, D. Marshak, J. Pflugrath, J. Posfai, D. Spector, and R. J. Roberts for many valuable discussions, and to J. Duffy and P. Renna for artwork. We also thank C. Schneider for the generous gift of a cDNA library, and H. Ge, J. Manley, and S. Munroe for communicating results prior to publication. A. M. was supported by a L. I. B. A. postdoctoral fellowship. This work was supported by National Institutes of Health grants GM42699 and CA13106 to A. R. K. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC Section 1734 solely to indicate this fact.

Received May 28, 1991; revised June 24, 1991.

References

Adams, J. M., and Capecchi, M. R. (1966). N-formylmethionyl-sRNA as the initiator of protein synthesis. Proc. Natl. Acad. Sci. USA 55, 147-155.

Aebi, M., and Weissmann, C. (1987). Precision and orderliness in splicing. Trends Genet. 3, 102-107.

Amrein, H., Gorman, M., and Nöthiger, R. (1988). The sex-determining gene tra-2 of Drosophila encodes a putative RNA binding protein. Cell 55, 1025-1035.

Baker, B. S. (1989). Sex in flies: the splice of life. Nature 340, 521-524.

Bandziulis, R. J., Swanson, M. S., and Dreyfuss, G. (1989). RNA-binding proteins as developmental regulators. Genes Dev. 3, 431-437.

Bell, L. R., Maine, E. M., Schedl, P., and Cline, T. W. (1988). Sex-lethal, a Drosophila sex determination switch gene, exhibits sex-specific RNA splicing and sequence similarity to RNA binding proteins. Cell 55, 1037-1046.

Bingham, P. M., Chou, T. B., Mims, I., and Zachar, Z. (1988). On/off regulation of gene expression at the level of splicing. Trends Genet. 4, 134-138.

Boggs, R. T., Gregor, P., Idriss, S., Belote, J. M., and McKeown, M. (1987). Regulation of sexual differentiation in D. melanogaster via alternative splicing of RNA from the transformer gene. Cell 50, 739-747.

Brown, J. L., and Roberts, W. K. (1976). Evidence that approximately eighty per cent of the soluble proteins from Ehrlich ascites cells are Nα-acetylated. J. Biol. Chem. 251, 1009-1014.

Chen, G. F., and Inouye, M. (1990). Suppression of the negative effect of minor arginine codons on gene expression; preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucl. Acids Res. 18, 1465-1473.

Chou, T. B., Zachar, Z., and Bingham, P. M. (1987). Developmental expression of a regulatory gene is programmed at the level of splicing. EMBO J. 6, 4095-4104.

Dignam, J. D., Lebovitz, R. M., and Roeder, R. G. (1983). Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucl. Acids Res. 11, 1475-1489.

Etzerodt, M., Vignali, R., Ciliberto, G., Scherly, D., Mattaj, I. W., and Philipson, L. (1988). Structure and expression of a Xenopus gene encoding an snRNP protein (U1 70K). EMBO J. 7, 4311-4321.

Fuchs, P. G., Iftner, T., Weninger, J., and Pfister, H. (1986). Epidermodysplasia verruciformis-associated human papillomavirus 8: genomic sequence and comparative analysis. J. Virol. 58, 626-634.

Gardiner-Garden, M., and Frommer, M. (1987). CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261-282.

Ge, H., and Manley, J. L. (1990). A protein factor, ASF, controls alternative splicing of SV40 early pre-mRNA in vitro. Cell 62, 25-34.

Ge, H., Zuo, P., and Manley, J. L. (1991). Primary structure of the human splicing factor ASF reveals similarities with Drosophila regulators. Cell 66, this issue.

Goralski, T. J., Edström, J.-E., and Baker, B. S. (1989). The sex determination locus transformer-2 of Drosophila encodes a polypeptide with similarity to RNA binding proteins. Cell 56, 1011-1018.

Grange, T., Martin de Sa, C., Oddos, J., and Pictet, R. (1987). Human mRNA polyadenylate binding protein: evolutionary conservation of a nucleic acid binding motif. Nucl. Acids Res. 15, 4771-4787.

Hager, D. A., and Burgess, R. R. (1980). Elution of proteins from sodium dodecyl sulfate-polyacrylamide gels, removal of sodium dodecyl sulfate, and renaturation of enzymatic activity: results with sigma subunit of Escherichia coli RNA polymerase, wheat germ DNA topoisomerase, and other enzymes. Anal. Biochem. 109, 76-86.

Hoffman, D. W., Query, C. C., Golden, B. L., White, S. W., and Keene, J. D. (1991). RNA-binding domain of the A protein component of the U1 small nuclear ribonucleoprotein analyzed by NMR spectroscopy is structurally similar to ribosomal proteins. Proc. Natl. Acad. Sci. USA 88, 2495-2499.

Johnson, K. E., Cameron, S., Toda, T., Wigler, M., and Zoller, M. J. (1987). Expression in Escherichia coli of BCY1, the regulatory subunit of cyclic AMP-dependent protein kinase from Saccharomyces cerevisiae. J. Biol. Chem. 262, 8636-8642.

Kligman, D., and Marshak, D. R. (1985). Isolation and characterization of a neurite extension factor from bovine brain. Proc. Natl. Acad. Sci. USA 82, 7136-7139.

Kozak, M. (1987). An analysis of 5’-noncoding sequences from 699 vertebrate messenger RNAs. Nucl. Acids Res. 15, 8125-8148.

Krainer, A. R., and Maniatis, T. (1985). Multiple factors including the small nuclear ribonucleoproteins U1 and U2 are necessary for pre-mRNA splicing in vitro. Cell 42, 725-736.

Krainer, A. R., and Maniatis, T. (1988). RNA Splicing. In Frontiers in Molecular Biology: Transcription and Splicing, B. D. Hames and D. M. Glover, eds. (Oxford: IRL Press), pp. 131-206.

Krainer, A. R., Maniatis, T., Ruskin, B., and Green, M. R. (1984). Normal and mutant human β-globin pre-mRNAs are faithfully and efficiently spliced in vitro. Cell 36, 993-1005.

Krainer, A. R., Conway, G. C., and Kozak, D. (1990a). Purification and characterization of SF2, a human pre-mRNA splicing factor. Genes Dev. 4, 1158-1171.

Krainer, A. R., Conway, G. C., and Kozak, D. (1990b). The essential pre-mRNA splicing factor SF2 influences 5’ splice site selection by activating proximal sites. Cell 62, 35-42.

Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-685.

Lazinski, D., Grzadzielska, E., and Das, A. (1989). Sequence-specific recognition of RNA hairpins by bacteriophage antiterminators requires a conserved arginine-rich motif. Cell 59, 207-218.

Mancebo, R., Lo, P. C. H., and Mount, S. M. (1990). Structure and expression of the Drosophila melanogaster gene for the U1 small nuclear ribonucleoprotein particle 70K protein. Mol. Cell. Biol. 10, 2492-2502.

Maniatis, T. (1991). Science 251, 33-34.

Matsudaira, P. (1987). Sequence from picomole quantities of proteins electroblotted onto polyvinylidene difluoride membranes. J. Biol. Chem. 262, 10035-10040.

Nagai, K., Oubridge, C., Jessen, T. H., Li, J., and Evans, P. R. (1990). Crystal structure of the RNA-binding domain of the U1 small nuclear ribonucleoprotein A. Nature 348, 515-520.

Pontius, B. W., and Berg, P. (1990). Renaturation of complementary DNA strands mediated by purified mammalian heterogeneous nuclear ribonucleoprotein A1 protein: implications for a mechanism for rapid molecular assembly. Proc. Natl. Acad. Sci. USA 87, 8403-8407.

Prats, H., Kaghad, M., Prats, A. C., Klagsburn, M., Lelias, J. M., Liauzon, P., Chalon, P., Tauber, J. P., Amalric, F., Smith, J. A., and Caput, D. (1989). High molecular mass forms of basic fibroblast growth factor are initiated by alternative CUG codons. Proc. Natl. Acad. Sci. USA 86, 1836-1840.

Proudfoot, N. (1991). Poly(A) signals. Cell 64, 671-674.

Query, C. C., Bentley, R. C., and Keene, J. D. (1989). A common RNA recognition motif identified within a defined U1 RNA binding domain of the 70K U1 snRNP protein. Cell 57, 89-101.

Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491.

Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press).

Saris, C. J., Domen, J., and Berns, A. (1991). The pim-1 oncogene encodes two related protein-serine/threonine kinases by alternative initiation at AUG and CUG. EMBO J. 10, 655-664.

Seraphin, B., Kretzner, L., and Rosbash, M. (1988). A U1 snRNA:pre-mRNA base pairing interaction is required early in yeast spliceosome assembly but does not uniquely define the 5’ cleavage site. EMBO J. 7, 2533-2538.

Siliciano, P. G., and Guthrie, C. (1988). 5’ splice site selection in yeast: genetic alterations in base-pairing with U1 reveal additional requirements. Genes Dev. 2, 1258-1267.

Smith, C. W. J., Patton, J. G., and Nadal-Ginard, B. (1989). Alternative splicing in the control of gene expression. Annu. Rev. Genet. 23, 527-577.

Spritz, R. A., Strunk, K., Surowy, C. S., Hoch, S. O., Barton, D. E., and Francke, U. (1987). The human U1-70K snRNP protein: cDNA cloning, chromosomal localization, expression, alternative splicing, and RNA-binding. Nucl. Acids Res. 15, 10373-10391.

Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990). Use of T7 RNA polymerase to direct expression of cloned genes. Meth. Enzymol. 185, 60-89.

Theissen, H., Etzerodt, M., Reuter, R., Schneider, Argos, P., Lührmann, R., and Philipson, L. (1986). man cDNA for the U1 RNA-associated 70K protein. 3217.

Mechanisms

of alternative

Martin, F. H., and Castro, M. M. (1985). inosine: implications for probe design. 8938.

pre-mRNA

Base pairing Nucl. Acids

C., Lottspeich, F., Cloning of the huEMBO J. 5,3209-

Wada, K., Aota, S., Tsuchiya, R., Ishibashi, F., Gojobori, T., and Ikemura, T. (1990). Codon usage tabulated from the GenBank genetic sequence data. Nucl. Acids Res. 78 (Suppl.), 2367-2411. Webster, R. E., Engelhardt, D. L., and Zinder, N. D. (1966). In vitro protein synthesis: chain initiation. Proc. Natl. Acad. Sci. USA 55, 155161. Yuo, C.-Y., and Weiner, A. M. (1989). protein particle with altered specificity an adenovirus ElA mRNA precursor.

A Ul small nuclear ribonucleoinduces alternative splicing of Mol. Cell. Biol. 9, 3429-3437.

Zachow, K. R., Ostrow, R. S., and Faras, A. J. (1987). Nucleotide sequence and genome organization of human papillomavirus type 5. Virology 758, 251-254. Zhuang, Y., and Weiner, in Ul snRNA suppresses

A. M. (1986). A compensatory base change a 5’ splice site mutation. Cell 46, 827-835.

splicing.

involving deoxyRes. 73, 8927-

The accession number for the SF2 p32 sequence is M89039; that for SF2 p33 is M69040.
