Proc. Natl. Acad. Sci. USA Vol. 89, pp. 2056-2060, March 1992 Biochemistry

A positive selection vector for cloning high molecular weight DNA by the bacteriophage P1 system: Improved cloning efficacy (high molecular weight genomic DNA/bacteriophage P1 cloning/sacB gene/positive selection)

JAMES C. PIERCE, BRIAN SAUER, AND NAT STERNBERG The Du Pont Merck Pharmaceutical Company, Wilmington, DE 19880-0328

Communicated by James D. Watson, November 13, 1991

ABSTRACT The bacteriophage P1 cloning system can package and propagate DNA inserts that are up to 95 kilobases. Clones are maintained in Escherichia coli by a low-copy replicon in the P1 cloning vector and can be amplified by inducing a second replicon in the vector with isopropyl β-D-thiogalactopyranoside. To overcome the necessity of screening clones for DNA inserts, we have developed a P1 vector with a positive selection system that is based on the properties of the sacB gene from Bacillus amyloliquefaciens. Expression of that gene kills E. coli cells that are grown in the presence of sucrose. In the new P1 vector (pAd10sacBII) sacB expression is regulated by a synthetic E. coli promoter that also contains a P1 C1 repressor binding site. A unique BamHI cloning site is located between the promoter and the sacB structural gene. Cloning DNA fragments into the BamHI site interrupts sacB expression and permits growth of plasmid-containing cells in the presence of sucrose. We have also bordered the BamHI site with unique rare-cutting restriction sites Not I, Sal I, and Sfi I and with T7 and Sp6 promoter sequences to facilitate characterization and analysis of P1 clones. We describe here the use of Not I digestion to size the cloned DNA fragments and RNA probes to identify the ends of those fragments. The positive selection P1 vector provides a 65- to 75-fold discrimination of P1 clones that contain inserts from those that do not. It therefore permits generation of genomic libraries that are much easier to use for gene isolation and genome mapping than are our previous libraries. Also, the new vector makes it feasible to generate P1 libraries from small amounts of genomic insert DNA, such as from sorted chromosomes.

The bacteriophage P1 cloning system permits in vitro packaging of P1 vectors containing foreign DNA inserts that are as large as 95 kilobase pairs (kbp) (1). That DNA can be faithfully replicated as a low-copy plasmid in Escherichia coli, can be amplified to high-copy number by adding isopropyl β-D-thiogalactopyranoside to the medium, and can be readily isolated as supercoiled circles by standard molecular techniques (1, 2). The cloning efficiency with the P1 system (10^5 clones recovered per μg of vector) is intermediate between those of the other two high molecular weight DNA cloning systems: the λ-cosmid system and the yeast artificial chromosome (YAC) system. P1 clones can be more than twice as large as cosmid clones, but they are significantly smaller than YAC clones. However, the fact that it is difficult to isolate more than several micrograms of YAC DNA from YAC clones, while such amounts of DNA are easily obtainable from P1 clones, suggests that the P1 system may fill an important niche in genome mapping and sequencing strategies. To address this issue a 50,000-member human DNA library consisting of 26 pools of 2000 clones each was constructed (2).

Analysis of the P1 human library revealed that a small number of the initial clones (10-20%) contained P1 vector DNA without inserts and that these clones grew significantly faster than those with inserts. Consequently, when the various pools were amplified to prepare DNA for subsequent analysis, the majority of the vector molecules (as much as 80%) had no inserts. This made it more difficult than expected to isolate unique copy sequences from the library and to use those sequences in genome mapping strategies. Moreover, the problem might be more pronounced if the ligation reaction used to generate a library of cloned inserts was less than optimal, as, for example, if the insert or vector DNA was improperly digested or if the ratio of vector DNA to insert DNA was very high. Under these circumstances, one might expect to generate a P1 library with a much higher initial percentage of clones without inserts than was the case in our original library. Finally, since there was no easy way of preparing probes from the ends of the cloned insert in the original vector, even those clones with inserts were difficult to use for mapping studies. To overcome these problems, we have constructed a positive selection P1 vector (pAd10sacBII) containing the Bacillus amyloliquefaciens sacB gene that greatly minimizes the recovery of clones without inserts and that permits the simple analysis of the ends of the insert DNA by RNA probe techniques. The positive selection permits one to easily evaluate the quality of the vector DNA and the results of ligation reactions before extensive clone analysis is necessary. Moreover, the vector contains unique and rare restriction sites to help size the insert. This cloning system has been used to construct a complete Drosophila library and a complete mouse library (ref. 3; unpublished data).
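The amplification bias described in the Introduction (insert-less clones rising from 10-20% of the initial clones to as much as 80% of amplified vector molecules) can be expressed as a shift in odds. The following is a minimal sketch in Python; the function names are illustrative, and treating amplification as a simple two-population outgrowth is an assumption made here, not a model stated in the text.

```python
def odds(fraction):
    """Convert a population fraction into odds (insert-less : insert-containing)."""
    return fraction / (1.0 - fraction)


def enrichment_factor(initial_fraction, amplified_fraction):
    """Fold increase in the odds of insert-less clones during amplification."""
    return odds(amplified_fraction) / odds(initial_fraction)


def fraction_after_growth(initial_fraction, relative_growth):
    """Fraction of insert-less clones after they outgrow the others
    by `relative_growth`-fold (assumed two-population model)."""
    new_odds = odds(initial_fraction) * relative_growth
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    for start in (0.10, 0.20):
        fold = enrichment_factor(start, 0.80)
        print(f"start {start:.0%}, end 80%: odds enriched {fold:.0f}-fold")
```

Under this assumption, going from 10-20% to 80% corresponds to roughly a 16- to 36-fold relative outgrowth of insert-less clones.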

MATERIALS AND METHODS Construction of the pAd10sacBII Vector. Construction was initiated by cutting the parent vector pNS582tet14Ad10 (henceforth called pAd10) (2) at unique Sal I and BamHI restriction sites in the tetracycline gene. The 276-bp DNA fragment between these sites was removed and replaced with a series of oligonucleotides to generate the sequence shown in Fig. 1. Starting from its 5' end, this sequence contains a SnaBI site, a Sfi I site, a Sal I site, a Sp6 promoter, a BamHI site, a T7 promoter, a Not I site, a P1 C1 repressor binding site (4), and a near-consensus E. coli promoter. The E. coli promoter and C1 binding sites overlap. All of the restriction sites except Sfi I are unique to the vector, and Sfi I is unique to that portion of the vector that is recovered with the cloned insert in E. coli after phage P1 packaging (2). To insert the B. amyloliquefaciens sacB gene, we started with a 1.6-kb EcoRI fragment from plasmid pBE501 (5) that contains the entire

Abbreviations: kanR, kanamycin resistance; sucR, sucrose resistance; CIP, calf intestinal alkaline phosphatase; BAP, bacterial alkaline phosphatase; FIGE, field-inversion gel electrophoresis; YAC, yeast artificial chromosome.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


SnaBI, Sfi I, Sal I, Sp6 promoter (start)
5' TACGTAGGCCTAATTGGCCGTCGACATTTAGGTG
   ATGCATCCGGATTAACCGGCAGCTGTAAATCCAC

Sp6 promoter (end), BamHI, T7 promoter (start)
   ACACTATAGAAGGATCCTCTCCCTATAGTGAGTC
   TGTGATATCTTCCTAGGAGAGGGATATCACTCAG

T7 promoter (end), Not I, C1 binding site
   GTATTAGCGGCCGCAAATTTATTAGAGCAATATA
   CATAATCGCCGGCGTTTAAATAATCTCGTTATAT

E. coli promoter
   GTCCTACAATGTCAAGCTCGA 3'
   CAGGATGTTACAGTTCGAGCT

FIG. 1. Sequence of the promoter-multiple cloning site region upstream of the sacB gene in pAd10sacBII. This sequence was constructed by annealing two double-stranded oligonucleotides.
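As a consistency check on the Fig. 1 sequence, the short Python sketch below confirms that the two printed strands are complementary and locates the restriction sites named in the text on the top strand. The recognition sequences are the standard ones for these enzymes; the variable and function names are illustrative only.

```python
import re

# Top and bottom strands as printed in Fig. 1 (rows concatenated left to right).
TOP = ("TACGTAGGCCTAATTGGCCGTCGACATTTAGGTG"
       "ACACTATAGAAGGATCCTCTCCCTATAGTGAGTC"
       "GTATTAGCGGCCGCAAATTTATTAGAGCAATATA"
       "GTCCTACAATGTCAAGCTCGA")
BOTTOM = ("ATGCATCCGGATTAACCGGCAGCTGTAAATCCAC"
          "TGTGATATCTTCCTAGGAGAGGGATATCACTCAG"
          "CATAATCGCCGGCGTTTAAATAATCTCGTTATAT"
          "CAGGATGTTACAGTTCGAGCT")

# Recognition sequences written as regular expressions (N = any base).
SITES = {
    "SnaBI": "TACGTA",
    "SfiI": "GGCC[ACGT]{5}GGCC",
    "SalI": "GTCGAC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
}


def complement(seq):
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def find_sites(seq):
    """Map each enzyme to the 1-based start positions of its site in seq."""
    return {name: [m.start() + 1 for m in re.finditer(pattern, seq)]
            for name, pattern in SITES.items()}


if __name__ == "__main__":
    print("strands complementary:", complement(TOP) == BOTTOM)
    for enzyme, positions in find_sites(TOP).items():
        print(enzyme, positions)
```

Each site occurs once in this region, in the order given in the text: SnaBI, Sfi I, Sal I, BamHI, Not I.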

sacB structural gene and ribosome binding site but lacks a promoter element. We blunt ended this fragment by annealing it to an EcoRI adapter that also contains an internal Spe I restriction site and then inserted the entire construct into the SnaBI-cut pAd10 vector that had been modified as described above. This process destroys the vector SnaBI site. A construct was isolated in which the beginning of sacB was adjacent to the vector Sal I site (Fig. 2) and was designated pAd10sacBII.

Preparation of sacBII Vector DNA. The pAd10sacBII plasmid must be grown in a bacterial strain (NS3607) containing the P1 C1 repressor. The repressor binds to its operator site in the E. coli promoter that regulates sacB expression in pAd10sacBII and prevents the expression of that gene. This is necessary because synthesis of the sacB gene product, levansucrase, interferes with cell growth to some degree, even in the absence of sucrose. Consequently, amplification


FIG. 2. P1 positive selection cloning vector pAd10sacBII. The major portion of the vector is derived from the original pAd10 plasmid and has been described (1, 2). The sacB cassette was cloned between the unique BamHI and Sal I sites of the tetracycline gene (stippled bar) of pAd10. This cassette contains the promoter-multiple cloning site shown in Fig. 1 and the sacB gene. The location of the unique Sca I site is also shown.


of the plasmid in bacterial strains without the repressor tends to select for rearrangements that inactivate sacB and/or its E. coli promoter. NS3607 is a derivative of E. coli recA strain DH5αlacIq (6) that contains a resident λimm21 prophage (λimm21-P1:7Δ5b) that constitutively expresses the P1 C1 repressor (7). NS3607 also contains a λimmλLP1 prophage (obtained from A. Wright, Tufts University Medical School) that constitutively expresses the lacIq repressor. This repressor blocks replication of the P1 lytic replicon on the vector (1, 2). Plasmid DNA was prepared from strain NS3607 (pAd10sacBII) as described by Pierce and Sternberg (8).

Standard DNA Methods. Restriction enzymes and T4 DNA ligase were purchased from New England Biolabs. Calf intestinal alkaline phosphatase (CIP) was purchased from New England Nuclear. Bacterial alkaline phosphatase (BAP) was purchased from Bethesda Research Laboratories. The restriction enzymes and the DNA ligase were used as specified by the vendors. The phosphatases were used as described by Pierce and Sternberg (8). P1 plasmid DNA was isolated by the alkaline lysis method of Birnboim and Doly (9). For Not I digests, the plasmid DNA was treated first with proteinase K (100 μg/ml) (Boehringer Mannheim) and 0.1% SDS for 1 hr at 37°C, then extracted with phenol and chloroform, and finally dialyzed against TE buffer (10 mM Tris·HCl, pH 8.0/1 mM EDTA) for 1-2 hr (8). DNA fragments that are 104

Table 2.

Exp.  Insert DNA  Vector DNA                         T4 DNA ligase  kanR (3)  sucR, kanR (4)  Eff (4/3)
1     None
2     None
3     None        BamHI cut                          +              5000      75
4     None        Sca I cut (BAP), BamHI cut (CIP)   +              800       8               1.0
5     Fraction 1  Sca I cut (BAP), BamHI cut (CIP)   +              686       516             75
6     Fraction 2  Sca I cut (BAP), BamHI cut (CIP)   +              518       328             63
7     Fraction 3  Sca I cut (BAP), BamHI cut (CIP)   +              574       418             72

In experiments 1-3, we used 200 ng of vector DNA that was either uncut or cut with BamHI. The DNA in experiment 3 was then ligated with 400 units of T4 DNA ligase and all three of the DNAs were packaged in vitro. In experiments 4-7, 200 ng of vector DNA was first digested with Sca I and then treated with BAP. It was then extracted with phenol/chloroform and ethanol precipitated. After resuspension in TE buffer, the DNA was digested with BamHI and then treated with CIP. The DNA was again extracted with phenol/chloroform and ethanol precipitated. In experiment 4, this DNA was ligated and packaged as described above. In experiments 5-7, the digested vector DNA was mixed with 500 ng of human DNA that had been partially digested with Sau3AI and fractionated on a sucrose gradient before being ligated and packaged. Gradient fractions 1, 2, and 3 contain Sau3AI fragments that range in size from 80 to 120, 70 to 100, and 50 to 80 kb, respectively. The numbers of kanR or sucR, kanR colonies shown represent results obtained when 10% of the packaging reaction was used to infect bacterial strain NS3529. Details of the methods used to digest, ligate, and fractionate the DNAs are described by Pierce and Sternberg (8). Eff, efficiency (%).
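The efficiency column of Table 2 and the fold discrimination quoted in the text can be recomputed from the colony counts. The sketch below is illustrative only: the counts are those read from Table 2 for experiments 4-7, the names are hypothetical, and the fold value is taken here as the efficiency with insert divided by the efficiency without insert (experiment 4), which is an assumption about how the figure was derived.

```python
# (experiment, insert DNA, kanR colonies, sucR kanR colonies), as read from Table 2
ROWS = [
    (4, "None", 800, 8),
    (5, "Fraction 1", 686, 516),
    (6, "Fraction 2", 518, 328),
    (7, "Fraction 3", 574, 418),
]


def efficiency(kan_colonies, suc_kan_colonies):
    """Percentage of kanR colonies that are also sucrose resistant."""
    return 100.0 * suc_kan_colonies / kan_colonies


def fold_discrimination(eff_with_insert, eff_without_insert):
    """Ratio of efficiencies with and without added insert DNA."""
    return eff_with_insert / eff_without_insert


if __name__ == "__main__":
    background = efficiency(800, 8)  # experiment 4: no insert DNA added
    for exp, insert, kan, suc_kan in ROWS:
        eff = efficiency(kan, suc_kan)
        fold = fold_discrimination(eff, background)
        print(f"Exp. {exp} ({insert}): Eff = {eff:.1f}%, {fold:.0f}-fold over Exp. 4")
```

The computed efficiencies agree with the tabulated values to within rounding.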

concatemerized vector DNA (Table 2, line 3). Since P1 normally packages a headful (~110 kb) of DNA (2, 15, 16), substrates as small as the vector itself (~32 kb) are not expected to be recovered efficiently by in vitro packaging compared to a large concatemer. A possible explanation for these results is presented in the Discussion. For the actual cloning of genomic DNA, the vector is cleaved at both the Sca I and BamHI sites to generate two arms. The Sca I ends of the arms are treated extensively with BAP and the BamHI ends are treated mildly with CIP (8). This minimizes the ligation of spurious fragments to the Sca I ends and reduces the ligation of BamHI ends to each other, but still allows efficient ligation of the BamHI ends to Sau3AI partially digested genomic DNA fragments. Genomic DNA inserted between the two vector arms (2) creates the appropriate substrate for the subsequent in vitro packaging reaction. Experiments were performed with three different genome fragment preparations (fractions of sucrose gradients that were used to size select Sau3AI-digested human DNA). Addition of the insert DNA to the ligation reaction mixture increases the proportion of sucR, kanR colonies recovered after in vitro packaging ~70-fold (Table 2, lines 4-7).

Restriction Digestion Analysis of Plasmid DNA from pAd10sacBII Clones. Plasmid DNA was isolated from 64 colonies produced on the kanamycin plates containing sucrose and 12 colonies produced on the kanamycin plates without sucrose (Table 2, line 5). The DNAs were digested with Bgl II and Xho I and fractionated on 1% agarose gels. All 64 of the DNAs from colonies produced on plates with sucrose contained genomic DNA inserts. Only 50% of the DNAs in colonies produced on plates lacking sucrose contained cloned inserts (data not shown).

To determine the sizes of the cloned inserts, we isolated plasmid DNA from 26 sucR colonies from a cloning reaction similar to that in Table 2 (lines 4-7) but using fractionated Sau3AI-digested mouse genomic DNA. Those DNAs were digested with Not I and fractionated by FIGE. Fig. 3 illustrates results with 8 representative examples of the 26 clones. Lane 3 contains pAd10sacBII DNA, which is 32 kbp. Lanes 4 and 5 contain a plasmid with the 18-kb kanR domain of the pAd10sacBII vector (Fig. 2). Lanes 6-13 contain plasmids from sucR colonies with genomic inserts ranging in size from 30 to 85 kb (total plasmid DNA is insert plus 18 kb). The clones containing the smaller inserts (30-60 kb) can be eliminated by a more careful fractionation of digested DNA to eliminate fragments in the lower size range (2) and by taking greater care to completely digest and dephosphorylate the vector DNA. Failure to do the latter allows concatemers of vector DNA to form and permits the packaging and recovery of smaller inserts in phage containing a headful of DNA (2, 15).

The Use of T7 and Sp6 Promoters for Generating Probes from the Ends of the Cloned Inserts. One important use of genomic libraries is for chromosome mapping and involves the linkage of individual clones into contiguous segments of DNA that span a portion of the cloned genome, a process called chromosome walking. The pAd10sacBII vector facilitates this process by using T7 and Sp6 promoters (14) to generate RNA probes from the ends of the cloned DNA. End-specific probes can be used to determine whether both ends of a cloned DNA segment come from the same contiguous region of the genome, to identify the next clone along the contiguous sequence, and to facilitate restriction mapping of the insert.
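The size bookkeeping used in the insert-sizing experiment above (a ~110-kb headful, a 32-kb vector, and recovered plasmids equal to insert plus the 18-kb kanR domain) can be written as a small Python sketch. The constants come from the text; the function names are illustrative, and the sketch assumes that Not I linearizes the plasmid without cutting inside the insert.

```python
import math

HEADFUL_KB = 110     # approximate P1 headful, from the text
VECTOR_KB = 32       # size of pAd10sacBII, from the text
KAN_DOMAIN_KB = 18   # kanR domain recovered together with the insert, from the text


def insert_size_kb(plasmid_kb):
    """Insert size, assuming the Not I-linearized plasmid is insert plus kanR domain."""
    return plasmid_kb - KAN_DOMAIN_KB


def vector_copies_per_headful(unit_kb=VECTOR_KB, headful_kb=HEADFUL_KB):
    """Smallest number of tandem vector copies whose total length reaches one headful."""
    return math.ceil(headful_kb / unit_kb)


if __name__ == "__main__":
    for plasmid_kb in (48, 103):
        print(plasmid_kb, "kb plasmid ->", insert_size_kb(plasmid_kb), "kb insert")
    print("vector copies needed for a headful:", vector_copies_per_headful())
```

On these numbers, a concatemer of at least four vector copies would be needed to fill a normal P1 head in the absence of insert DNA.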
FIG. 3. Not I digestion of pAd10sacBII mouse clones. P1 clones were isolated by in vitro packaging of DNA from ligation reaction mixtures containing pAd10sacBII vector DNA and Sau3AI partially digested, sucrose-fractionated mouse DNA. These reactions were similar to those shown in Table 2 (lines 5-7). Plasmid DNA was isolated from several of the kanR, sucR clones, digested with Not I, and fractionated by FIGE (lanes 6-13). Lane 3, pAd10sacBII vector DNA digested with Not I. Lanes 4 and 5, Not I-digested plasmid DNA from two different kanR sucrose-sensitive colonies produced by packaging pAd10sacBII DNA and using those phage to infect Cre+ strain NS3529. Lanes 15-18, size markers. T7 DNA is 40 kb (lane 15), P1 mouse clone 20 is 70 kb (lane 16), P1 DNA is 110 kb (lane 17), and T4 DNA is 165 kb (lane 18).

To demonstrate the use of RNA probes in identifying end fragments, DNA from P1 clones containing either 70-kb (clone 1) or 50-kb (clone 14) mouse genomic inserts was digested with Taq I, a restriction enzyme that recognizes a 4-bp site, and was then incubated with either T7 or Sp6 polymerase (Fig. 4A). The resulting RNA probes were hybridized to EcoRI or Bgl II digests of clone 1 or clone 14 DNAs that had been fractionated in a 0.8% agarose gel (Fig. 4B) and transferred to nitrocellulose filters. The clone 1 T7 RNA probe hybridizes to a single EcoRI fragment from the clone 1 EcoRI digest (Fig. 4C). This is presumably the EcoRI fragment generated from the T7 end of the 70-kb insert in this clone. It also hybridizes to two Bgl II fragments from this DNA. The detection of two fragments indicates that there is a Bgl II restriction site between the T7 promoter and the closest Taq I site in the insert. As expected, the clone 1 probe does not hybridize to any of the fragments produced by either the EcoRI or Bgl II digest of the clone 14 DNA. Moreover, as it fails to hybridize to a Bgl II digest of either human or mouse DNA, it probably has no repetitive DNA sequences. Single copy genomic sequences in these DNAs could not be visualized in the short exposure times used in this experiment. The clone 14 RNA probe hybridizes extensively to Bgl II-digested mouse DNA, indicating that it contains repetitive DNA (Fig. 4D). Despite this, it could be used to detect unique end fragments from clone 14 DNA but not from clone 1 DNA. A longer exposure of this gel results in the appearance of other fragments in both the clone 14 and clone 1 lanes, presumably due to their hybridization with the repetitive sequences in the probe. End fragments could also be detected with RNA probes generated from the Sp6 promoter of clone 1 and clone 14 DNA (data not shown).
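The reasoning used to interpret the hybridization pattern (one EcoRI fragment but two Bgl II fragments implies a Bgl II site inside the region covered by the probe) can be illustrated with a small Python sketch. All coordinates below are hypothetical, chosen only to reproduce the pattern; the molecule is treated as a linear stretch starting at the T7 promoter, which is a simplification.

```python
def fragments(cut_sites, length):
    """Return (start, end) fragments of a linear molecule cut at the given positions."""
    bounds = [0] + sorted(cut_sites) + [length]
    return list(zip(bounds[:-1], bounds[1:]))


def hybridizing_fragments(cut_sites, length, probe):
    """Fragments that overlap the probe interval (start, end) by at least 1 bp."""
    probe_start, probe_end = probe
    return [(start, end) for start, end in fragments(cut_sites, length)
            if start < probe_end and end > probe_start]


if __name__ == "__main__":
    length = 10_000             # hypothetical stretch of clone DNA, in bp
    probe = (0, 600)            # T7 promoter at 0 to a hypothetical first Taq I site
    ecori_sites = [2500, 7000]  # hypothetical: no EcoRI site inside the probe
    bglii_sites = [300, 5000]   # hypothetical: one Bgl II site inside the probe

    print("EcoRI:", hybridizing_fragments(ecori_sites, length, probe))
    print("Bgl II:", hybridizing_fragments(bglii_sites, length, probe))
```

With a cut site inside the probe interval, the probe sequence is split between two adjacent fragments, so both are detected.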

DISCUSSION The essential feature of the P1 cloning vector pAd10sacBII is the sacB gene. It permits one to discriminate between clones containing inserts and those that do not by a factor of ~70 (Table 2). The effect of that discrimination on P1 cloning is twofold: it allows one to generate P1 libraries by pooling strategies and it makes practical the cloning of inserts under less than optimal ligation conditions.


FIG. 4. Using RNA probes to localize end fragments of cloned P1 inserts. (A) Diagram of a pAd10sacBII mouse clone. Various elements of the clone are designated in the figure. (B) Fractionation of Bgl II or EcoRI digests of DNA derived from two P1 mouse clones (clones 1 and 14) by electrophoresis in 0.8% agarose gels. Also shown are Bgl II digests of mouse and human DNA that were fractionated on the same gel. (C) Southern hybridization of a filter generated from the gel in B with an RNA probe prepared with T7 polymerase from Taq I-digested clone 1 DNA. The minor band in the lane containing the EcoRI digest of clone 1 DNA is presumably due to incomplete Taq I digestion of that DNA. (D) Same as C except the probe was prepared from Taq I-digested clone 14 DNA.

In the original cloning vector (2), there is a preferred amplification of clones lacking inserts. In pooled populations, this can create a large enough bias to interfere with the isolation and characterization of individual clones. We have recently constructed a mouse library in the pAd10sacBII vector that consists of 300 pools of 400 clones each (data not shown). Less than 1% of the clones lack an insert. We have had little trouble using PCR techniques to isolate clones with unique sequences from this library.

It is usually the practice to clone large inserts under conditions in which the ends of the vector DNA are treated with alkaline phosphatase and are in slight excess over insert ends, the conditions used in Table 2. These conditions permit efficient cloning of insert DNA and minimize the possibility of cloning two genomic fragments in the same vector molecule. Unfortunately, the ratio of vector to insert DNA is often very high when only small amounts of insert DNA are available. This is the case when sorted chromosomes or gel-purified fragments of chromosomal DNA are used. Under these conditions, most of the clones will have no inserts. Positive selection with the pAd10sacBII vector eliminates the majority of these clones and selects the rarer insert-containing clones.

When the pAd10sacBII vector is cleaved at the BamHI cloning site and then ligated in vitro, the ability of the vector to discriminate between clones with or without inserts is compromised, as evidenced by an increased recovery of sucR clones in the absence of added insert DNA (Tables 1 and 2). Analysis of these clones showed that the majority contained small deletions of vector DNA that included the BamHI site and the adjacent sacB sequences (data not shown). We assume that they are generated by aberrant digestion and/or ligation reactions. Elimination of these reactions should increase the discrimination level of the vector.

One of the surprising observations made in these studies is the high efficiency (5-10% compared to concatemerized vector DNA) with which the in vitro packaging system recovers either uncut vector DNA or singly cut, BamHI-digested vector DNA as kanR clones (Table 2, lines 1-3). Since the P1 packaging system should only be able to encapsidate a headful of DNA (~110 kb of DNA for normal P1 heads and 45 kb for small P1 heads) (2, 16), we expect neither of these DNAs to be an efficient substrate for the system (pAd10sacBII vector DNA is 32 kb). Typically, P1 packaging lysates contain 10-15% small heads (ref. 16; unpublished data). To further investigate the packaging of cut and uncut 32-kb vector DNA, we analyzed the product of those reactions by CsCl equilibrium density gradients. That analysis indicates that the resulting kanR-producing phage have the same density as P1 plaque-forming particles and P1 small-headed phage and, hence, a normal headful of DNA. Thus, the DNAs must be packaged from headful-sized concatemers. How these concatemers are generated is still unclear, but they are probably not due to ligation of DNA in one of the packaging reactions since no evidence of ligation is detectable in reconstruction experiments (data not shown).

Besides the positive selection system, another advantage of the pAd10sacBII vector over our previous P1 cloning vector is the incorporation of several features that facilitate the characterization of the insert DNA. First, the presence of unique and rare restriction sites (Not I, Sal I, Sfi I) flanking the BamHI cloning site permits one to easily size and isolate the insert DNA (Fig. 3).
Moreover, since Not I and Sfi I sites are not likely to be present in most of the cloned fragments, these sites can be end labeled after digestion and that DNA can be used to map the location of other restriction sites within the insert DNA by partial enzyme digestion. The presence of T7 and Sp6 promoters flanking the BamHI cloning site permits the production of RNA probes from both ends of the insert (Fig. 4). This should make it easier to evaluate the fidelity of the cloning process and should facilitate the use of the clones in chromosome walking strategies.

We would like to thank Dr. Vansantha Nagarajan for the pBE501 plasmid. This work was supported by National Institutes of Health Grant R01-HG00339-02.

1. Sternberg, N. (1990) Proc. Natl. Acad. Sci. USA 87, 103-107.
2. Sternberg, N., Ruether, J. & DeRiel, K. (1990) The New Biol. 2, 151-162.
3. Smoller, D., Petrov, D. & Hartl, D. (1991) Chromosoma 100, 487-494.
4. Eliason, J. L. & Sternberg, N. (1987) J. Mol. Biol. 198, 281-292.
5. Tang, L. B., Lenstra, R., Borchert, T. V. & Nagarajan, V. (1990) Gene 96, 89-92.
6. Kaiger, B. D. & Jessie, J. (1990) Focus 12, 28.
7. Sternberg, N., Sauer, B., Hoess, R. & Abremski, K. (1986) J. Mol. Biol. 187, 197-212.
8. Pierce, J. C. & Sternberg, N. (1991) Methods Enzymol., in press.
9. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
10. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY), 2nd Ed.
11. Sauer, B. & Henderson, N. (1988) Gene 70, 331-338.
12. Gay, P., Le Coq, D., Steinmetz, M., Ferrari, E. & Hoch, J. A. (1983) J. Bacteriol. 153, 1424-1431.
13. Gay, P., Le Coq, D., Steinmetz, M., Berkelman, T. & Kado, C. I. (1985) J. Bacteriol. 164, 918-921.
14. Melton, D. A., Krieg, P. A., Rebagliati, M. R., Maniatis, T., Zinn, K. & Green, M. R. (1984) Nucleic Acids Res. 12, 7035-7054.
15. Streisinger, G., Emrich, J. & Stahl, M. M. (1967) Proc. Natl. Acad. Sci. USA 57, 292-295.
16. Yarmolinsky, M. & Sternberg, N. (1988) Bacteriophage P1, ed. Calender, R. (Plenum, New York), pp. 291-438.
