Cell, Vol. 7, 279-288, February 1976, Copyright © 1976 by MIT

Enzymatic in Vitro Synthesis of Globin Genes

Argiris Efstratiadis, Fotis C. Kafatos, Allan M. Maxam, and Tom Maniatis*
Biological Laboratories
Harvard University
Cambridge, Massachusetts 02138

*Also Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724.

Summary

Full-length, single-stranded rabbit globin cDNA, synthesized by AMV reverse transcriptase, apparently contains a small double-stranded sequence (hairpin) at the 3' terminus. This cDNA can serve as template-primer for E. coli DNA polymerase I, which synthesizes a strand complementary to the cDNA and covalently bound to it. The loop connecting the two strands can be cut by S1 nuclease. Reassociation, hybridization, and restriction endonuclease studies, as well as electrophoretic analyses, indicate that the sequential actions of reverse transcriptase, DNA polymerase I, and S1 nuclease generate full-length, double-stranded synthetic globin genes.

Introduction

Following the initial synthesis of reverse transcripts (cDNA) of globin mRNA (Verma et al., 1972; Kacian et al., 1972; Ross et al., 1972), reverse transcription has proved exceedingly useful for a variety of studies, despite the variable length and quality of the products. Recently we reported the synthesis of complete reverse transcripts of globin mRNA (Efstratiadis et al., 1975). We now describe the use of this cDNA as an intermediate for the enzymatic, in vitro synthesis of double-stranded, full-length globin genes (defined as sequences coding for mRNA).

Other investigators have shown that cDNA can contain double-stranded regions (for example, Fujinaga et al., 1970; for review see Green and Gerard, 1974). Moreover, a significant proportion of this double-stranded material rapidly regains resistance to single-strand specific nucleases after denaturation (Taylor et al., 1972; Green and Gerard, 1974). The latter property reveals the existence of intramolecular base-paired regions (hairpin sequences) which could be internal or terminal. Leis and Hurwitz (1972) have postulated that a 3' terminal hairpin structure is generated in the course of reverse transcription, and that the short arm of this cDNA hairpin is further extended by reverse transcriptase. Such a mechanism could form double-stranded provirus DNA sequences in vivo.

We have shown that our full-length, single-stranded globin cDNA contains a small fraction of hairpin sequences (Efstratiadis et al., 1975). We reasoned that if these hairpins are at the 3' terminus of the cDNA, they might serve as an ideal primer for the synthesis of a complete second strand by E. coli DNA polymerase I (pol I); a similar activity of pol I has been demonstrated using synthetic hairpin templates (Kleppe et al., 1971). Such a reaction would convert the complete reverse transcript into a double-stranded, full-length gene, with the two strands covalently linked at one end (“hairpin gene”). This link could be severed specifically by S1 endonuclease under appropriate conditions, thus generating an “open” globin gene. The results presented below demonstrate that this objective has been accomplished.

Results

The Size of Hairpins in Full-Length cDNA

Under our conditions of reverse transcription, synthesis of a second DNA strand is extremely limited. The zero-time S1 resistance, which corresponds to hairpin structures, is only 2-8% (Efstratiadis et al., 1975). This observation does not distinguish between long hairpins in a small fraction of cDNA molecules and short hairpins in a large fraction of the molecules. To decide between the two alternatives, total cDNA prepared in the absence of actinomycin D was treated with S1 nuclease under mild conditions (high salt, 37°C, 45 min). The product was displayed on a denaturing 15% polyacrylamide-98% formamide gel, together with markers and undigested cDNA. The S1-resistant fragments were very small, in the range of approximately 13-22 NT (Figure 1).

Pol I Synthesizes a Second Strand (sDNA) Covalently Bound to cDNA

Total 32P-cDNA was purified free of mRNA and used as the template in a DNA polymerase I reaction, in the presence of 3H-dCTP, without exogenous primer. The conditions of the reaction were chosen to suppress all nucleolytic activities of pol I (Kleppe et al., 1971).
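As a rough check on the hairpin-size reasoning in the preceding subsection, the 2-8% zero-time S1 resistance can be converted into nucleotides per molecule. The sketch below assumes, purely for illustration, that every cDNA molecule is about 650 NT long and carries a single hairpin; the function name is hypothetical and not part of the original work.

```python
# Rough arithmetic behind the hairpin-size argument (numbers taken from the text).
FULL_LENGTH_NT = 650                   # approximate length of full-length cDNA
S1_RESISTANT_FRACTION = (0.02, 0.08)   # zero-time S1 resistance, 2-8%


def resistant_nt_per_molecule(fraction, length_nt=FULL_LENGTH_NT):
    """S1-resistant nucleotides per molecule, assuming every cDNA carries one hairpin."""
    return fraction * length_nt


# Lower and upper bounds for the resistant material per molecule.
low, high = (resistant_nt_per_molecule(f) for f in S1_RESISTANT_FRACTION)
print(f"{low:.0f}-{high:.0f} NT per molecule")
```

Under these assumptions the resistant material would amount to roughly 13-52 NT per molecule. The 13-22 NT fragments reported for Figure 1 fall in this range, which fits short hairpins present on a large fraction of the molecules.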
As seen in Figure 2, nucleotide incorporation did occur and approached a plateau at 6 hr. During the reaction, no loss of TCA-precipitable 32P radioactivity was observed, indicating that the cDNA was not subject to 5' exonucleolysis.

The products of polymerase reactions using 32P-cDNA and 3H-labeled or unlabeled dNTPs were analyzed by electrophoresis on denaturing formamide-polyacrylamide gels, together with the original cDNA and 32P-λ-DNA markers. Figure 3a shows typical results. As previously reported (Efstratiadis et al., 1975), the original cDNA is resolved into a number of discrete bands, the largest being 650 ± 50 NT. The complete reverse transcript is the 650 NT band; for convenience, we will designate this size “single-length”. After the pol I reaction, 32P-labeled material of even higher molecular weight appears, including a band at approximately 1200 NT (“double-length”). Since the only 32P in this reaction is in the cDNA and since the products are analyzed under denaturing conditions, we conclude that the double-length molecules are generated by a pol I-dependent extension of the cDNA.

When the double-length product was recovered from a formamide gel, denatured, reassociated at an estimated Cot of only 3 x 10^-6, and chromatographed on HAP (see Experimental Procedures), 83% of the material was bound, indicating the presence of extensive double-stranded hairpin regions. Under similar conditions, only 11% of the cDNA bound to HAP. We conclude that the cDNA and the second strand produced by pol I (sDNA) are covalently attached.

The Loop of the Hairpin Can Be Cut by S1 Nuclease

When 32P-labeled cDNA is treated with S1 nuclease, 92-98% is rendered acid-soluble and electrophoresis on polyacrylamide gels reveals no residual single-length material (see Figures 1 and 4; Efstratiadis et al., 1975, Figure 6). By contrast, when the pol I-treated cDNA is similarly exposed to S1 and analyzed under denaturing conditions, a prominent band of single-length material is observed (Figure 3a). Thus sDNA protects cDNA sequences from S1 digestion, and in such a way that strands equal to the mRNA length are spared, evidently because they are part of a duplex structure. Bands of smaller

Figure 1. Autoradiogram of 32P-cDNA Electrophoresed on a 15% Polyacrylamide Gel in 98% Formamide

Slot 1: 32P-DNA markers, from top to bottom 230, 175, 77, and 27 nucleotides (NT). The 230 NT marker is a λ-DNA Hind II and III fragment; the rest are Hae III (175 NT) and Alu I (77 and 27 NT) lac operon restriction fragments (W. Gilbert, J. Gralla, and A. Maxam, manuscript in preparation). Slot 2: S1 nuclease digestion products of an aliquot of 32P-cDNA synthesized in the presence of all four dNTPs at 200 µM (final SA 5 Ci/mmole each) and purified by gel filtration through Sephadex G-150. Slot 3: Aliquot of undigested 32P-cDNA.

Figure 2. Time Course of Second-Strand (sDNA) Synthesis

(Plot not reproduced; horizontal axis: Time, hr.)

32P-cDNA (1400 pmoles, synthesized in the presence of all four dNTPs at a concentration of 630 µM, final SA 1.8 Ci/mmole each) was used as the template in a standard E. coli DNA polymerase I reaction mixture (see Experimental Procedures). All four dNTPs were used at a concentration of 330 µM. The radioactive dNTP was 3H-dCTP (SA 25 Ci/mmole). At the times shown, 0.5 µl aliquots were precipitated with TCA and counted.


size are also observed, but no material exceeding 650 NT.

We also examined single- and double-stranded globin DNA on nondenaturing polyacrylamide gels. In these gels, single-stranded DNA migrates more slowly than the corresponding double-stranded DNA (Maniatis, Jeffrey, and van deSande, 1975). Thus full-length cDNA should show mobility characteristic of 650 NT single-stranded DNA; the double-length hairpin gene produced by pol I and the single-length open gene produced by S1 treatment should migrate more rapidly, each as a 650 nucleotide pair duplex. Figure 3b shows that bands with the expected mobilities are observed in the respective samples. It is particularly noteworthy that the S1 product is devoid of 650 NT single-stranded material, but includes a prominent component which behaves as a 650 nucleotide pair duplex; additional components correspond to shorter duplex lengths.

To demonstrate further the origin of the “open” 650 nucleotide pair duplex, the pol I and S1 reactions were carried out with purified globin DNAs of defined size (full-length cDNA and double-length cDNA-sDNA, respectively). Full-length 32P-cDNA was purified through two cycles of electrophoresis, first on a formamide gel and then on an aqueous gel, and then used in the pol I reaction (with unlabeled dNTPs). The products of the reaction were displayed on an aqueous gel, and the material which migrated as a 650 nucleotide pair duplex was recovered and treated with S1 nuclease. The products of the synthetic reactions, the material selected at each step, and the final S1 product were analyzed together on a formamide gel (Figure 4). Clearly, the single-length cDNA is converted by pol I to double length, and that in turn is converted to single length by S1 nuclease.

In the experiment of Figure 4, the double-length material (hairpin gene) was contaminated with single-stranded DNA of shorter length; this was expected (Maniatis et al., 1975) since the hairpin gene was recovered from an aqueous polyacrylamide gel. The single-stranded nature of the contaminant was verified by its disappearance upon S1 treatment (Figure 4). In a repeat of this experiment, the hairpin gene was purified from a formamide gel. In this case, the DNA was uniform in size and was quantitatively converted to single-length size by S1 treatment (Figure 5a).

Figure 3a. Autoradiogram of 32P-Globin DNA Electrophoresed on a 5% Polyacrylamide Gel in 98% Formamide

Slot 1: 32P-DNA markers. From top to bottom 1190, 1090, and 580 NT (λ-DNA Hind II and III fragments). Slot 2: 32P-cDNA synthesized in the presence of all four dNTPs at 430 µM (final SA 3 Ci/mmole each). Slot 3: Products of a pol I reaction, in which 970 pmoles of the cDNA shown in slot 2 were used as template. All four dNTPs were present at 110 µM (3H-dCTP, SA 25 Ci/mmole, was included). Slot 4: As slot 3, but after treatment with S1 nuclease. Dots indicate the positions of full-length cDNA (slot 2), double-length hairpin gene (slot 3) and open gene (slot 4).

Figure 3b. Autoradiogram of 32P-Globin DNA Electrophoresed on a 4% Polyacrylamide Gel under Native Conditions

Slot 1: 32P-cDNA synthesized in the presence of all four dNTPs at 420 µM (final SA 1.5 Ci/mmole each). This slot is from a different gel than the rest and is shown here for comparison. Slot 2: Full-length 32P-cDNA recovered from a formamide gel. The cDNA was synthesized in the presence of all four dNTPs at 425 µM (final SA 2.4 Ci/mmole). Slot 3: Products of a pol I reaction, in which 3000 pmoles of total 32P-cDNA (synthesized as for slot 2) were used as template. The dNTPs were unlabeled and at 1 mM. Slot 4: As slot 3, but after treatment with S1 nuclease. Dots indicate the position of full-length cDNA (upper, slots 1, 2, and 3) and of double-stranded, full-length gene (lower, slots 3 and 4). For optimal display, different exposures have been used for certain slots in this and subsequent figures.


Figure 4. Autoradiogram of 32P-Globin DNA Electrophoresed on a 5% Polyacrylamide Gel in 98% Formamide

Slot 1: 32P-cDNA (synthesized as described in Figure 3b, slot 2). Slot 2: Full-length 32P-cDNA (same material as in Figure 3b, slot 2). Slot 3: Products of DNA polymerase I reaction, in which the full-length 32P-cDNA of slot 2 (250 pmoles) was used as template. The dNTPs were unlabeled and at 1 mM. Slot 4: Double-length hairpin gene extracted from a 4% aqueous gel following electrophoresis of an aliquot of the material of slot 3. The lower band is a single-stranded contaminant due to the method of purification. In aqueous gels, double-stranded DNA overlaps with single-stranded DNA of shorter length (Maniatis et al., 1975). Slot 5: As slot 4, but after digestion with S1 nuclease. Note that the single-stranded contaminant visible in slot 4 has been completely digested. Slot 6: As slot 2, but after digestion with S1 nuclease. Note the absence of labeled material, indicating that the cDNA has been completely digested.

In the experiments discussed thus far, the 32P label was carried exclusively by the cDNA. In additional experiments, 32P-dNTPs were also used in the pol I reaction. Figure 5b shows typical results, from an experiment in which full-length cDNA was the pol I template. The products included a prominent double-length band plus material ranging from 650 to 1200 NT. No unexplained pol I products were evident; all products had the mobilities expected for partially or fully elongated cDNA.

Additional Evidence That the Open Gene Is Generated by S1 Scission of a Hairpin Structure: Restriction Endonuclease Studies

If in fact we have synthesized a double-stranded gene which exists in either a hairpin or an open form (before and after S1 treatment, respectively), the material should be susceptible to certain restriction endonucleases. Moreover, the loop of the hairpin should be identifiable when the restriction fragments of the two forms are fractionated in parallel.

The full-length hairpin and open genes were purified by electrophoresis in an aqueous, nondenaturing gel (Figure 3b). After recovery from the gel, aliquots were treated with restriction endonucleases,

Figure 5. Autoradiogram of 32P-Globin DNA Electrophoresed on a 5% Polyacrylamide Gel in 98% Formamide

(a) Slot 1: Double-length DNA extracted from the gel of Figure 3a, slot 3, and purified by DEAE-cellulose chromatography. Slot 2: As slot 1, but after treatment with S1 nuclease.

(b) Slot 1: 32P-DNA markers. From top to bottom, 900, 580, and 340 NT (λ-DNA Hind II and III fragments). Slot 2: Full-length globin cDNA (indicated by a dot), extracted from a preparative formamide gel and purified by poly(A)-Sepharose chromatography. The cDNA was synthesized in the presence of all four dNTPs at 430 µM (final SA 4.5 Ci/mmole each). Slot 3: Products of a pol I reaction, in which 130 pmoles of the full-length cDNA shown in slot 2 were used as template. All four dNTPs were 32P-labeled and present at 10 µM (SA 100 Ci/mmole each). The dot indicates the double-length hairpin gene.

and the products were analyzed by electrophoresis on formamide gels. Three enzymes were effective: Alu I, Hae III, and Hinf. Figure 6 shows the restriction patterns. When treated with the same enzyme, the hairpin and open genes invariably had at least one band in common; most importantly, they differed in one and only one band. In each case, the band characteristic of the hairpin gene was replaced with a band of half that length in the open gene. As Table 1 shows, the radioactivity in these two bands was comparable. Considering all the restriction fragments, the percentage of radioactivity and the percentage of single-stranded length were approximately proportional, with one important exception: the band characteristic of the hairpin gene had double the length predicted on the basis of radioactivity.

These results are exactly consistent with the predictions. The band characteristic of the hairpin gene consists of the 3' region of cDNA and the 5' region of the sDNA, covalently attached. The

Table 1. Restriction Fragments of Hairpin and Open Globin Genes(a)

Enzyme    Type of Fragment    Row    Hairpin           Open
Alu I     Cleaved Hairpin     NT     660 (2 x 55%)     330 (55%)
                              cpm    1900 (41%)        3000 (53%)
          Distal              NT     280 (47%)         270 (45%)
                              cpm    2700 (59%)        2700 (47%)
Hae III   Cleaved Hairpin     NT     260 (2 x 21%)     130 (21%)
                              cpm    1400 (25%)        1400 (23%)
          Distal              NT     360 (57%)         360 (57%)
                              cpm    2800 (49%)        3100 (51%)
          Distal              NT     140 (22%)         140 (22%)
                              cpm    1500 (26%)        1600 (26%)
Hinf      Cleaved Hairpin     NT     360 (2 x 29%)     180 (29%)
                              cpm    1600 (30%)        2400 (35%)
          Distal              NT     110 (17%)         110 (17%)
                              cpm    950 (18%)         1600 (23%)
          Distal              NT     340 (54%)         340 (54%)
                              cpm    2800 (52%)        2900 (42%)

(a) For identification of the fragments, see Figure 6. 32P-labeled hairpin and open forms of the genes (before and after S1 treatment) are indicated. Two rows correspond to each fragment produced by the indicated restriction enzyme. In each case, the top row shows the denatured length in nucleotides and (within parentheses) the length as percentage of the open gene length; the bottom row shows the radioactivity in cpm and (within parentheses) the radioactivity as percentage of the total in all fragments of that sample. The radioactivity was determined after immersing the gel piece in Aquasol. Only the cDNA strand was 32P-labeled in this experiment.
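The fragment lengths in Table 1 can be checked against the hairpin model with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and function names are assumptions, and the numbers are the NT values transcribed from Table 1, with the loop-containing fragment listed first for each enzyme.

```python
# Fragment lengths (NT) transcribed from Table 1.
# For each enzyme, the first entry is the fragment that contains the hairpin loop.
TABLE1_NT = {
    "Alu I":   {"hairpin": [660, 280],      "open": [330, 270]},
    "Hae III": {"hairpin": [260, 360, 140], "open": [130, 360, 140]},
    "Hinf":    {"hairpin": [360, 110, 340], "open": [180, 110, 340]},
}


def loop_distance(enzyme):
    """Distance (NT) from the loop to the nearest site: half the hairpin-specific band."""
    return TABLE1_NT[enzyme]["hairpin"][0] / 2


def open_gene_length(enzyme):
    """Sum of the open-gene fragments, an estimate of the gene length in NT."""
    return sum(TABLE1_NT[enzyme]["open"])


for enzyme in TABLE1_NT:
    # The halved hairpin band should match the loop-proximal band of the open gene.
    assert loop_distance(enzyme) == TABLE1_NT[enzyme]["open"][0]
    print(enzyme, loop_distance(enzyme), open_gene_length(enzyme))
```

With these values, halving the hairpin-specific band gives 330, 130, and 180 NT for Alu I, Hae III, and Hinf, and the open-gene fragments sum to 600-630 NT.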

Figure 6. Autoradiogram of Restriction Fragments Electrophoresed on a 5% Polyacrylamide Gel in 96% Formamide

(Gel panels: Alu I, Hae III, Hinf.)

Hairpin (H) and Open (O) forms of globin genes were extracted from the gel of Figure 3b (slots 3 and 4, respectively). Aliquots of each sample were digested with Alu I, Hae III, or Hinf. In each slot, a dot indicates the band characteristic of the hairpin or open form, respectively. The faint band seen in slot Sl is a single-strand contaminant (see also Figure 4, slot 4).

loop of the hairpin is cleaved by S1 nuclease, and the half-length band characteristic of the open gene is thus generated without a significant loss of radioactivity. The bands which hairpin and open genes have in common are all distal fragments, which do not include the site of the loop. The good agreement of lengths in the hairpin and open gene forms indicates that under our conditions, S1 cleaves only the loop and does not significantly digest the opened gene.

The hairpin loop is a valuable reference point for ordering the restriction sites of the globin genes. According to Table 1, the genes have restriction sites for Hae III, Hinf, and Alu I at distances of approximately 130, 180, and 330 nucleotides, respectively, from the loop. For each of two enzymes, Hae III and Hinf, one additional site also exists.

Characterization of the cDNA and sDNA Strands of the Open Gene

The experiments described above strongly indicate that the sequential actions of reverse transcriptase, pol I, and S1 yield globin genes. The two strands of the open genes were further characterized by reassociation and hybridization experiments.

As in Figure 3a, total 32P-cDNA was synthesized, extended with pol I in the presence of 3H-dCTP, and treated with S1. The resistant material was displayed on an aqueous polyacrylamide gel, and the prominent band which migrated as a 650 nucleotide pair duplex was recovered (see Figure 3b). Because of the S1 treatment, this material was totally free of single-strand contaminants (see Figure 4, slot 5); when comparable material from a separate experiment was denatured and analyzed on a formamide gel, 32P and 3H (cDNA and sDNA, respectively) were coincident, each showing a mobility characteristic of 650 nucleotides.

The double-stranded open gene was used in reassociation and hybridization studies (Table 2). The material contained nearly equal amounts of cDNA and sDNA (reverse transcriptase and pol I products, respectively; 3H/32P molar ratio 0.86 ± 0.08 in three assays). It was fully resistant to S1 digestion (both isotopes). After heating to 100°C for 3 min, both 32P-cDNA and 3H-sDNA failed to bind to HAP, as expected. When the denatured material was reannealed to an equivalent Cot of 0.13, it renatured so that it was fully retained by HAP. However, when a vast excess of globin mRNA was present during reannealing, only the 32P-cDNA was found in duplex molecules; the 3H-sDNA remained single-stranded (Table 2).

Table 2. Reassociation and Competition Hybridization of the Globin Gene Strands Assayed by HAP Chromatography(a)

                                        First Strand (cDNA; 32P), cpm    Second Strand (sDNA; 3H), cpm
                                        Nonbound       Bound             Nonbound       Bound
No Reassociation (Zero Time)            1580 (90%)     170 (10%)         3050 (91%)     300 (9%)
Reassociation(b) (16 Hr)                29 (2%)        1710 (98%)        95 (3%)        3650 (97%)
Competition Hybridization(c) (16 Hr)    59 (3%)        1740 (97%)        3810 (94%)     250 (6%)

(a) 32P-cDNA was used as a template for the synthesis of a 3H-labeled second strand by pol I, as described in the legend to Figure 2. The products of the polymerase reaction were treated with S1 nuclease, and the double-stranded open globin gene was recovered from an aqueous gel and used for reassociation. Samples were denatured and assayed by HAP chromatography (see Experimental Procedures) either directly (zero time) or after reassociation or hybridization for 16 hr at 66°C. Cot or Rot values were converted to their “equivalents” by multiplying by a factor of 4.9 (Britten et al., 1974).
(b) The sample was reassociated to a Cot of 0.13.
(c) The sample included 2 µg globin mRNA and was hybridized to a Rot of 288.
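The percentages in Table 2 follow directly from the cpm values, and recomputing them makes the competition result easy to see. The sketch below is only an illustration: the dictionary layout and the function name are assumptions, and the cpm values are those transcribed from Table 2 as (nonbound, bound) pairs.

```python
# cpm values transcribed from Table 2, as (nonbound, bound) for each strand.
TABLE2_CPM = {
    "zero time":     {"cDNA": (1580, 170), "sDNA": (3050, 300)},
    "reassociation": {"cDNA": (29, 1710),  "sDNA": (95, 3650)},
    "competition":   {"cDNA": (59, 1740),  "sDNA": (3810, 250)},
}


def percent_bound(nonbound, bound):
    """Percentage of the recovered radioactivity retained by HAP (duplex material)."""
    return 100.0 * bound / (nonbound + bound)


for condition, strands in TABLE2_CPM.items():
    for strand, (nonbound, bound) in strands.items():
        print(f"{condition:14s} {strand}: {percent_bound(nonbound, bound):.0f}% bound")
```

In the competition row the cDNA strand stays almost entirely bound while the sDNA strand drops to a few percent, which is the pattern expected if excess mRNA displaces sDNA from the cDNA.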

In summary, the cDNA and sDNA strands are separable after S1 treatment; they are essentially equal in amount and length; they are fully complementary, as shown by their S1 resistance; and they are either complementary (cDNA) or synonymous (sDNA) with globin mRNA, as shown by the HAP binding data. We conclude that they are the two strands of synthetic globin genes.

The Nature of Intermediate Length Polymerase Products

In addition to double-length hairpin genes, pol I yields products intermediate between 650 and 1200 NT, even when full-length cDNA is used as the template (Figures 4 and 5b). When total cDNA is the template, pol I products shorter than 650 NT are also observed (this was shown in an experiment in which 3H-dNTPs were used for reverse transcription and 32P-dNTPs in the pol I reaction; the observation suggests that partial cDNAs may also terminate in a 3' hairpin).

Since the pol I reaction is not accompanied by loss of the cDNA radioactivity (Figure 2), we conclude that the 650-1200 NT intermediates result from incomplete extension, rather than from progressive 5' exonucleolysis of the cDNA. This is verified in Figure 7: the intermediates between 1200 and 650 NT have progressively lower sDNA/cDNA ratios, and the ratios are always lower than that of the full hairpin gene.

As Figure 7 implies, the efficiency of conversion of full-length cDNA to hairpin gene will appear on a gel to be greater if the second strand is 32P-labeled, instead of the first strand (compare, for example, Figures 4 and 5b). Even if the same strand is followed, however, the efficiency appears variable when full-length cDNA is used as the template (possibly because of the purification steps). More consistent results were obtained when the enzymatic reactions were carried out in a sequence uninterrupted by electrophoretic separation steps. Thus for preparative purposes, the synthesis should begin with total cDNA, and the open gene should be purified from an aqueous polyacrylamide gel after the pol I and S1 reactions.

Figure 7. Distribution of Radioactivity in the Intermediate Length Polymerase Products

(Plot not reproduced; horizontal axis: slice number, covering the 650-1300 NT region of the gel.)

32P-cDNA (200 pmoles, synthesized in the presence of all four dNTPs at 350 µM, final SA 5.7 Ci/mmole each) was used as template in a pol I reaction. In this second step, all four dNTPs were used at a concentration of 110 µM; the radioactive dNTPs were 3H-dATP (SA 12.5 Ci/mmole) and 3H-dCTP (25 Ci/mmole). The products were analyzed on a 5% polyacrylamide gel in 98% formamide. The region of the gel corresponding to 650-1300 NT was cut into 1 mm slices, which were incubated for 12 hr at 45°C in 4 ml scintillation fluor containing 3% Protosol (NEN), and counted.

The highest yields have been


obtained in experiments using high dNTP concentrations in both the reverse transcriptase and the pol I reactions (Figure 3b). Yields of gene as high as 4% of the input mRNA have been obtained in this manner.

The 650-1200 NT intermediates from an experiment similar to that shown in Figure 5b were recovered from the gel and treated with S1 nuclease. As expected, no 650 nucleotide pair open gene was generated, only duplex structures of smaller size (comparable to those seen in Figure 3b, slot 4). We interpret these structures as partial genes, extending from near the 5' end of the mRNA to variable distances toward the 3' end. Such overlapping gene fragments would be useful for sequencing studies.

Discussion

The globin gene is the first eucaryotic gene to be synthesized in vitro. It is also the first well defined double-stranded DNA gene to be produced using RNA as the initial template. The sequence of enzymatic reactions described in this paper should prove to be a general method for synthesizing genes, starting from purified mRNA. Since specific mRNAs can be isolated by precipitation of polysomes with antibody directed against the proteins for which they code (Shapiro et al., 1974), specific double-stranded DNA sequences can now be obtained in principle by the reverse progression from protein to DNA. This procedure, coupled with methods for molecular cloning of DNA, should provide specific probes in substantial amounts, and thus should greatly facilitate studies on the organization and expression of the eucaryotic genome. We have recently succeeded in integrating synthetic globin genes into a bacterial plasmid (T. Maniatis et al., manuscript in preparation). Synthesis of genes and their cloning in plasmids are important in four major respects.

Amplification of Specific Probes

DNA sequences can be obtained by this procedure in amounts far in excess of what could be produced by reverse transcriptase alone. This should greatly facilitate all studies in which cDNA is now used (for example, sequencing, production of highly labeled RNA by in vitro transcription, chromatin studies, and so on). Amplification would be particularly important when the mRNA itself is not very abundant.

Purification of Specific Sequences

By its very nature, molecular cloning can be used to purify to homogeneity a particular nucleic acid sequence. This would be especially important for obtaining specific probes out of impure mRNA preparations or out of a family of mRNAs, such as

the histone messages, keratin messages, or the mRNAs for the 50 or more closely related proteins which make up the insect chorion (Kafatos, 1975).

Restriction Mapping

As shown in Figure 6 and Table 1, restriction mapping of synthetic genes can be easily performed. The hairpin form of the gene is particularly useful in this respect. After cloning the gene, specific fragments from various regions of it could be identified by comparison to the restriction map of the synthetic gene. These fragments could be obtained in large amounts and used for sequencing and for studying genome and chromatin organization.

Safety Considerations

The availability of pure defined genes should enhance the acceptability of work with eucaryotic DNA. DNA from the primary clone of a synthetic gene could be recloned repeatedly, to eliminate any possibility that extraneous eucaryotic or animal viral DNA is present (being derived from a minor mRNA contaminant and double transfection in the original cloning). For subsequent work, the amplified specific probe can be used to identify sequences which flank the gene in chromosomal DNA. In a series of restriction digestion and gel purification steps, it would be possible to isolate a restriction fragment, one end of which is within and the other a short distance outside the structural gene. This small fragment could then be recovered from the gel and cloned. It is clear that this experimental design is substantially safer than the alternative “shotgun” approach, in which random clones of chromosomal DNA are generated and the sequence of interest is selected secondarily.

An important question is whether the synthetic gene preparation consists of both α and β genes. After digestion with any one of three restriction enzymes, the major products included only one hairpin terminal fragment, and accounted for only 600-630 base pairs. These observations suggest that this synthetic gene preparation was derived predominantly from only one of the globin mRNA types. 125I labeling of the mRNA preparation used as the original template in the present work, followed by deadenylation (Vournakis, Efstratiadis, and Kafatos, 1975) and electrophoresis on a formamide gel, revealed that α- and β-globin mRNA were both present at an approximate cpm ratio of 1 to 1.6. Moreover, we have previously shown that when cDNA and 125I-globin mRNA are hybridized at a mass ratio of 4:1, the mRNA becomes completely resistant to S1 nuclease digestion (Efstratiadis et al., 1975); this indicates that the cDNA represents in substantial amounts both the α- and β-globin mRNA sequences. Thus predominant synthesis of


one gene may occur during the production of the second strand (sDNA). It is possible that the efficiency of hairpin formation at the 3' end of cDNA, or the hairpin length, is sequence-dependent. Short hairpins are known to be inefficient primers (Kleppe et al., 1971). Our conditions, however, do permit the synthesis of both globin genes, although at apparently different efficiencies. Among different plasmid clones derived from synthetic genes and hybridizing with α- or β-globin mRNA probes, we have identified two different types of globin DNA insertion (T. Maniatis et al., manuscript in preparation). These two types of inserted DNA exhibit different restriction endonuclease digestion patterns. One of the two types shows a restriction digestion pattern consistent with that documented in Figure 6 and hybridizes with β-globin mRNA.

The synthetic genes, of course, include only sequences which are represented in the message; sequences which are adjacent to the gene but either are not transcribed or are eliminated post-transcriptionally would necessarily be absent. We have previously shown that the 5' end of the cDNA includes only a short poly(dT) sequence, comparable to the poly(A) of the mRNA (Efstratiadis et al., 1975). Thus within an experimental uncertainty of no more than 10%, the length of the open gene, which is identical to that of the full-length cDNA and the mRNA itself, suggests that the entire mRNA sequence is present in the synthetic gene. The complete S1 resistance of the gene preparation, the hybridization of the cDNA strand with globin mRNA, the competition between mRNA and the sDNA strand, and finally the susceptibility to restriction enzymes and the simplicity of the resulting patterns all indicate that the enzymatic synthesis generates globin genes of high fidelity.

Experimental Procedures

Purification of Globin mRNA

Rabbit globin mRNA was isolated as described (Nienhuis, Falvey, and Anderson, 1974), with minor modifications. Total RNA was passed over an oligo(dT)-cellulose (Searle) column and further purified by sucrose gradient centrifugation and a second oligo(dT)-cellulose column (Efstratiadis and Kafatos, 1976). The mRNA eluted from oligo(dT)-cellulose at this stage is devoid of any recognizable RNA contamination: a radio-iodinated aliquot (Efstratiadis et al., 1975) migrated as one band in a 6% polyacrylamide gel containing 7 M urea, with the expected mobility of 9S RNA. The same material was resolved into two bands (α- and β-globin mRNA; see Morrison et al., 1974) after prolonged (10 hr) electrophoresis in a 5% polyacrylamide-98% formamide gel. The resolution of the two bands was further improved by deadenylation (treatment with RNAase H; Vournakis et al., 1975).

Purification of Reverse Transcriptase

Avian myeloblastosis virus in chicken plasma was supplied by Dr. M. A. Chirigos (Virus Cancer Program, National Cancer Institute, Bethesda, Maryland) through Dr. J. W. Beard (Life Sciences Inc., Gulfport, Florida). Reverse transcriptase was purified (Marcus,

Modak, and Cavalieri, 1974) by affinity chromatography on poly(rC)-Sepharose (a gift from Dr. S. Marcus). The enzyme (0.05 units per ~1; Kacian and Spiegelman, 1974) was stored in liquid Nz in the elution buffer [lo mM K phosphate (pH 8) 1 mM DTT, 0.2% Nonidet P40, 10% glycerol, and 0.5 M KCI]. Enzymatic Synthesis of cDNA The standard reaction mixture (25 pl) consisted of the following components: 50 mM Tris-HCI (pH 8.3) 10 mM MgCb, 30 mM 2mercaptoethanol, 800 pg/ml BSA (Pentex), labeled and unlabeled dNTPs as indicated in the figure legends, 100 pg/ml mRNA, 100 pg/ml oligo(dT)12--18 (Collaborative Research), and 20 units per ml reverse transcriptase. The enzyme storage buffer contributed 180 mM KCI. The reaction mixture was heated at 80°C for 1 min and rapidly cooled on ice prior to the addition of BSA and enzyme; it was then incubated at 42’C for 3 hr. For preparative purposes, the reaction mixture was scaled up 4-8 fold. Aliquots were precipitated with TCA and counted for determination of yields. When “HdNTPs (NEN) were used as substrates, 40-50% of the input template was reverse-transcribed into cDNA. Yields up to 40% were also achieved in some reactions using fresh 3zP-dNTPs (NEN). In a 25 pl reaction mixture, however, no more than 200 &i 32P could be used: a further 2 fold increase in radioactive material produced complete inhibition of the enzyme despite the presence of protective protein (BSA) (see also Harrison, Hell, and Paul, 1972). To free the transcript from its mRNA template, the reaction mixture was incubated 12-16 hr with 0.3 M NaOH at 37”C, neutralized, extracted with phenol and chloroform, and passed through a Sephadex G-150 column (0.7 x 30 cm) in deionized HzO; the excluded material was made 0.1 M in Na acetate (pH 5), and ethanol-precipitated without addition of carrier RNA. Boiling in alkali should be avoided, since it can lead to some degradation of the cDNA (this has also been observed with the AMV endogenous reaction; E. 
Rothenberg, personal communication). For rapid destruction of the RNA template, a combination of RNAases H, A, and T1 can be used. RNAase H was a gift from Dr. J. G. Stavrianopoulos. Table 3 shows that when these enzymes are added directly to the reverse transcription reaction mixture, they degrade completely both free and DNA-bound RNA. RNAases A and T1 are sufficient for degrading free RNA, but it is advisable to include RNAase H as well, since we have shown (Efstratiadis et al., 1975) that the transcript is partially bound to its template. Table 3 shows that RNAase H is effective under these conditions, even though they are not optimal. 125I-globin mRNA is degraded completely under these conditions. Moreover, reverse transcripts treated mildly with alkali or with RNAases show identical electrophoretic patterns. Samples of a reverse transcription reaction tested for S1 nuclease resistance before and after treatment with RNAases showed 83% and 14% resistance of the cDNA, respectively.

Table 3. Action of RNAases on Free and Hybridized RNA (a) under Conditions of Reverse Transcription

Enzyme                          cpm 3H    cpm 32P
None                            2860      6510
RNAases A and T1 (b)            2740      0
RNAase H (c)                    0         7050
RNAases A, T1, and H (d)        0         0

(a) Synthetic poly(dT)-poly(3H-rA) hybrid and single-stranded 32P-RNA [oligo(dT)-cellulose nonbound silkmoth polysomal RNA] were included in a 25 µl reaction mixture simulating reverse transcription conditions [50 mM Tris-HCl (pH 8.3), 10 mM MgCl2, 140 mM KCl, and 30 mM 2-mercaptoethanol]. After enzyme treatment (37°C, 30 min), aliquots were precipitated with TCA, and the cpm were normalized to the original volume.
(b) 1 µl each of RNAases A and T1 (10 mg/ml each) were added to the reaction mixture.
(c) The reaction mixture was supplemented with 10 µl of 1:100 dilution of RNAase H stock solution (Stavrianopoulos and Chargaff, 1973; Sippel et al., 1974; Vournakis et al., 1975).
(d) Additions as in (b) and (c), combined.

Enzymatic Synthesis of sDNA Complementary to cDNA

The standard reaction mixture (20 µl) consisted of the following components: 0.12 M K phosphate (pH 6.9), 10 mM MgCl2, 10 mM DTT (Kleppe et al., 1971; Olson et al., 1975; Kleid, Agarwal, and Khorana, 1975), labeled and unlabeled dNTPs as indicated in the figure legends, cDNA template, and 15-30 units of E. coli DNA polymerase I. Polymerase I (a gift from Dr. W. R. McClure) was purified as described by Jovin, Englund, and Bertsch (1969) except that a DNA-Sepharose 4B column was used instead of the phosphocellulose column of the original procedure (Arndt-Jovin et al., 1975). The reaction mixture was assembled on ice and incubated at 15°C for 6 hr. The product was extracted with phenol and chloroform, passed over Sephadex G-150, and ethanol-precipitated after addition of 50 µg carrier tRNA.

S1 Treatment

The product of the polymerase reaction was treated with S1 nuclease, purified from α-amylase from Aspergillus oryzae (Sigma) according to Britten, Graham, and Neufeld (1974). The reaction mixture (40 µl) consisted of 0.2 M NaCl, 0.05 M Na acetate (pH 4.5), 1 mM ZnCl2, 50 µg/ml sonicated and heat-denatured phage DNA, the double-stranded globin DNA, and 3 units (Vogt, 1973) of S1 nuclease. The mixture was incubated at 37°C for 45 min, phenol-extracted, added to 1 ml 2 M NH4 acetate, and precipitated with 2.5 vol ethanol.

Polyacrylamide Gel Electrophoresis

Electrophoresis was performed in denaturing polyacrylamide slab gels in 96% formamide (Maniatis et al., 1975). 32P-λ or φX DNA restriction fragments were used as markers in each gel.
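The marker fragments run in each gel allow the length of an unknown band to be estimated from its migration distance. A minimal sketch of one common approach follows, assuming that log10(length) varies roughly linearly with migration distance between adjacent markers. It is an illustration only and not necessarily the procedure used in this work; the function name `estimate_length` and the marker values in `HYPOTHETICAL_MARKERS` are invented for the example and are not measurements from these gels.

```python
import math
from bisect import bisect_left

# Hypothetical (length in nucleotides, migration distance in cm) pairs.
# These are illustrative numbers, NOT data from the paper.
HYPOTHETICAL_MARKERS = [(1000, 1.0), (500, 2.0), (250, 3.0), (125, 4.0)]


def estimate_length(markers, mobility):
    """Estimate fragment length from migration distance.

    Interpolates log10(length) linearly between the two marker bands
    that bracket the unknown band. Extrapolation is refused.
    """
    # Sort by migration distance; shorter fragments migrate farther.
    points = sorted((dist, math.log10(length)) for length, dist in markers)
    distances = [dist for dist, _ in points]
    if not distances[0] <= mobility <= distances[-1]:
        raise ValueError("band lies outside the marker range")
    i = bisect_left(distances, mobility)
    if distances[i] == mobility:
        # The band co-migrates with a marker.
        return 10 ** points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    fraction = (mobility - x0) / (x1 - x0)
    return 10 ** (y0 + fraction * (y1 - y0))


if __name__ == "__main__":
    print(round(estimate_length(HYPOTHETICAL_MARKERS, 2.5)))
```

Interpolating only between bracketing markers avoids assuming a single straight line over the whole gel, where the semi-log relation usually breaks down for the largest fragments.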
Duplex DNA molecules were electrophoresed under native conditions in 4% polyacrylamide slab gels containing 50 mM Tris-borate (pH 8.3), 1 mM EDTA. The same buffer was used in electrophoresis reservoirs.

Bands detected by autoradiography were excised and the DNA extracted (Gilbert and Maxam, 1973; Efstratiadis et al., 1975; Efstratiadis and Kafatos, 1976). DNA electrophoresed under native conditions and extracted from the gel can be ethanol-precipitated directly. A small amount of soluble gel material is eluted with the DNA and precipitates in ethanol, but can then be effectively separated from the redissolved DNA by centrifugation. By contrast, when samples extracted from formamide gels are precipitated with ethanol, gel impurities co-precipitate and trap the DNA almost completely. The impurities from formamide gels can be removed by passing the extract through a very small DEAE-cellulose column (100 µl in a 1 ml disposable syringe). The DNA binds and can then be eluted (but never completely) with 1 M NaCl after extensive washing of the column with 10 mM Tris-HCl (pH 7.5), 1 mM EDTA. The extract of full-length cDNA can also be purified by passage through a poly(A)-Sepharose column (Efstratiadis et al., 1975). Non-denaturing aqueous gels are preferable to the formamide gels for preparative purposes.

Fragmentation of Double-Stranded Globin DNA with Restriction Enzymes

Both the terminally closed (hairpin) and open (S1-treated) forms of globin DNA were digested with site-specific restriction endonucleases. Hae III from Haemophilus aegyptius and Alu I from Arthrobacter luteus were gifts from Dr. W. Gilbert, and Hinf from Haemophilus influenzae serotype f was a gift from Dr. J. Wang. 32P-labeled globin DNA and 5 µg carrier sonicated calf thymus DNA were digested with each enzyme at 37°C in 50 µl of 10 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 1 mM DTT. When digestion was complete, 25 µg carrier tRNA, 0.4 ml 2 M NH4 acetate, and 1 ml 95% ethanol were added, mixed, and chilled at -70°C for 10 min, and the DNA was sedimented at 10,000 x g for 20 min. Pellets were redissolved in 100 µl H2O and lyophilized. Under these conditions, protein is dissociated from DNA and remains in the supernatant; residual NH4 acetate is then removed from the pellet under vacuum, providing protein- and salt-free DNA suitable for electrophoresis on formamide gels (A. Maxam, unpublished observations).

Reassociation and Hybridization of Double-Stranded Globin DNA

Globin DNA, in the presence or absence of globin mRNA, in 0.4 M Na phosphate (pH 6.8) (PB), and in a final volume of 6 µl in sealed capillaries, was denatured for 3 min at 100°C, and reassociated at 66°C. After the end of the incubation, each sample was transferred to 2 ml 0.12 M PB containing 100 µg carrier native calf thymus DNA, sheared to 300 NT. The reassociated or hybridized material was assayed by hydroxyapatite (HAP) chromatography. 1 ml HAP columns were used. The single-stranded fraction was eluted with 0.12 M PB at 60°C, and the double-stranded fraction with the same buffer at 98°C. 2 ml fractions were collected and counted after addition of 5 ml Aquasol (NEN).

Acknowledgments

We thank Drs. J. Beard, M. Chirigos, W. Gilbert, S. Marcus, W. McClure, J. Stavrianopoulos, and J. Wang for materials; Drs. W. Gilbert and M. Ptashne for use of facilities; K. Livak for help in the purification of reverse transcriptase and globin mRNA; M. Koehler for valuable help with the figures; A. Jeffrey for DNA markers; and M. J. Randell and D. Taylor for expert secretarial assistance. This work was supported by grants from the National Science Foundation and the NIH to F.C.K. T.M. was supported by a National Cancer Institute grant to Cold Spring Harbor Laboratory.

Received October 21, 1975; revised December 1, 1975

References

Arndt-Jovin, D. J., Jovin, T. M., Bahr, W., Frischauf, A., and Marquardt, M. (1975). Eur. J. Biochem. 54, 411.
Britten, R. J., Graham, D. E., and Neufeld, B. R. (1974). In Methods in Enzymology, 29, L. Grossman and K. Moldave, eds. (New York: Academic Press), p. 363.
Efstratiadis, A., and Kafatos, F. C. (1976). In Methods in Molecular Biology, 8, J. Last, ed. (New York: Marcel Dekker), in press.
Efstratiadis, A., Maniatis, T., Kafatos, F. C., Jeffrey, A., and Vournakis, J. N. (1975). Cell 4, 367.
Fujinaga, K., Parsons, J. T., Beard, J. W., Beard, D., and Green, M. (1970). Proc. Nat. Acad. Sci. USA 67, 1432.
Gilbert, W., and Maxam, A. (1973). Proc. Nat. Acad. Sci. USA 70, 3581.
Green, M., and Gerard, G. F. (1974). In Progress in Nucleic Acid Research and Molecular Biology, 14, W. E. Cohn, ed. (New York: Academic Press), p. 187.
Harrison, P. R., Hell, A., and Paul, J. (1972). FEBS Letters 24.
Jovin, T. M., Englund, P. T., and Bertsch, L. L. (1969). J. Biol. Chem. 244, 2996.
Kacian, D. L., and Spiegelman, S. (1974). In Methods in Enzymology, 29, L. Grossman and K. Moldave, eds. (New York: Academic Press), p. 150.
Kacian, D. L., Spiegelman, S., Bank, A., Terada, M., Metafora, S., Dow, L., and Marks, P. A. (1972). Nature New Biol. 235, 167.
Kafatos, F. C. (1975). In Advances in Experimental Medicine and Biology, 62, R. H. Meints and E. Davies, eds. (New York: Plenum Press), p. 103.
Kleid, D. G., Agarwal, K. L., and Khorana, H. G. (1975). J. Biol. Chem. 250, 5574.
Kleppe, K., Ohtsuka, E., Kleppe, R., Molineux, I., and Khorana, H. G. (1971). J. Mol. Biol. 56, 341.
Leis, J. P., and Hurwitz, J. (1972). Proc. Nat. Acad. Sci. USA 69, 2331.
Maniatis, T., Jeffrey, A., and van deSande, H. (1975). Biochemistry 14, 3787.
Marcus, S. L., Modak, M. J., and Cavalieri, L. F. (1974). J. Virol. 14, 853.
Morrison, M. R., Brinkley, S. A., Gorski, J., and Lingrel, J. B. (1974). J. Biol. Chem. 249, 5290.
Nienhuis, A. W., Falvey, A. K., and Anderson, W. F. (1974). In Methods in Enzymology, 30, K. Moldave and L. Grossman, eds. (New York: Academic Press), p. 621.
Olson, K., Gabriel, T., Michaelewsky, J., and Harvey, C. (1975). Nucl. Acids Res. 2, 43.
Ross, J., Aviv, H., Scolnick, E., and Leder, P. (1972). Proc. Nat. Acad. Sci. USA 69, 264.
Shapiro, D. J., Taylor, J. M., McKnight, G. S., Palacios, R., Gonzalez, C., Kiely, M. L., and Schimke, R. T. (1974). J. Biol. Chem. 249, 3665.
Sippel, A. E., Stavrianopoulos, J. G., Schutz, G., and Feigelson, P. (1974). Proc. Nat. Acad. Sci. USA 71, 4635.
Stavrianopoulos, J. G., and Chargaff, E. (1973). Proc. Nat. Acad. Sci. USA 70, 1959.
Taylor, J. M., Faras, A. J., Varmus, H. E., Levinson, W. E., and Bishop, J. M. (1972). Biochemistry 11, 2343.
Verma, I. M., Temple, G. F., Fan, H., and Baltimore, D. (1972). Nature New Biol. 235, 163.
Vogt, V. M. (1973). Eur. J. Biochem. 33, 192.
Vournakis, J. N., Efstratiadis, A., and Kafatos, F. C. (1975). Proc. Nat. Acad. Sci. USA 72, 2959.
