Proc. Natl. Acad. Sci. USA Vol. 88, pp. 1943-1947, March 1991

Biochemistry

Construction of a uniform-abundance (normalized) cDNA library (reassociation/hybridization/hydroxyapatite)

SANKHAVARAM R. PATANJALI, SATISH PARIMOO, AND SHERMAN M. WEISSMAN Department of Human Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510

Contributed by Sherman M. Weissman, December 4, 1990

ABSTRACT We have used a kinetic approach to construct cDNA libraries containing approximately equal representations of all sequences in a preparation of poly(A)+ RNA. Randomly primed cDNA fragments of a selected size range were cloned in a λ phage vector. Inserts were amplified by the polymerase chain reaction (PCR), denatured, and self-annealed under optimized conditions. After extensive but incomplete reannealing, the single-stranded fraction was relatively depleted of more abundant species of cDNA. Libraries of these fragments are suitable for cDNA subtraction, screening, or selection by hybridization and make it possible to detect and analyze cDNA corresponding to species of mRNA present at a low level in a small fraction of the cells in a complex tissue.

The total number of genes in the human genome has been estimated at between 50,000 and 100,000 (1). In any one cell type, perhaps 10,000 genes are expressed, and these may be expressed at levels ranging from 200,000 copies to 1 copy or less per cell. With the advent of polymerase chain reaction (PCR) technology (2), it is possible to prepare cDNA libraries from single cells (3) and, in principle, to prepare a set of cDNA libraries that, in total, contain representatives of most or all genes. A complete cDNA library would be impractically large because of the variation in abundance of cell types, the need to obtain cDNAs from cells at several stages of development, and the different levels of expression of genes. Hence, it would be essentially impossible to subtract or select the rarest species. To overcome these difficulties, it would be desirable to be able to construct cDNA libraries containing equal amounts of cDNA from each gene expressed in a given cell, tissue, or organ (normalized cDNA libraries).

Two approaches toward obtaining normalized cDNA libraries have been proposed (4). One approach depends on hybridization selection with genomic DNA so that the relative abundance of cDNAs would be proportional to the abundance of genes complementary to that cDNA in genomic DNA. The other approach depends on the observation that if cDNA reannealing follows second-order kinetics, rarer species anneal less rapidly and the single-stranded fraction of cDNA becomes progressively more normalized during the course of the hybridization (5). For example, at the time that the rarest species was 50% annealed, the most abundant species in the single-stranded fraction would be no more than twice as abundant as the rarest species. We report here the application of the latter principle to the construction of normalized cDNA libraries. Although the actual kinetics of the reassociation reactions are considerably more complex (6, 7), the end effect is that libraries can be readily produced in which a minor mRNA from a small fraction of the cells of an organ is nearly as abundantly represented as is the most abundant RNA from the predominant cell population. Applications of this approach are discussed.
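
The factor-of-two bound quoted above follows directly from ideal second-order kinetics. The sketch below is a minimal illustration, assuming that every species reanneals independently with the same rate constant k, so that a species with initial concentration c0 has c0/(1 + k·c0·t) still single-stranded at time t. The function names, the rate constant, and the starting abundances are illustrative choices and are not taken from the experiments reported here.

```python
def single_stranded(c0, k, t):
    """Concentration still single-stranded at time t (ideal second-order kinetics)."""
    return c0 / (1.0 + k * c0 * t)


def ss_ratio_at_half_annealing(abundant_c0, rare_c0, k=1.0):
    """Abundant/rare ratio in the single-stranded fraction at the moment the
    rare species is 50% annealed, i.e. when k * rare_c0 * t == 1."""
    t_half = 1.0 / (k * rare_c0)
    abundant_ss = single_stranded(abundant_c0, k, t_half)
    rare_ss = single_stranded(rare_c0, k, t_half)
    return abundant_ss / rare_ss


if __name__ == "__main__":
    # Hypothetical starting abundances, expressed relative to the rarest species (= 1).
    for fold in (1, 10, 1_000, 200_000):
        ratio = ss_ratio_at_half_annealing(abundant_c0=float(fold), rare_c0=1.0)
        print(f"{fold:>7}-fold excess at start -> {ratio:.4f}-fold in the ssDNA fraction")
```

Under these assumptions the ratio equals 2n/(1 + n) for a species that starts n-fold more abundant than the rarest one, so it approaches but never reaches 2 however large n becomes.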

Abbreviations: sf, short fragment; ssDNA, single-stranded DNA; dsDNA, double-stranded DNA; Cot, initial concentration of DNA (mol of nucleotide per liter) × time (sec); STH, short-fragment thymus; NSTH, normalized thymus; TCR, T-cell receptor.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Preparation of Short-Fragment (sf) cDNA Libraries. Total RNA was isolated from human adult thymus (Natl. Disease Res. Inst., Philadelphia) by the guanidinium thiocyanate/phenol extraction method (8), followed by the removal of contaminating DNA by lithium chloride precipitation (9). Poly(A)+ RNA (5 µg), fractionated from total RNA by oligo(dT)-cellulose chromatography, was primed with a mixture of random hexanucleotides (Pharmacia) (300 µg/ml), and the first strand was synthesized using Moloney murine leukemia virus reverse transcriptase (Life Technologies). Second-strand cDNA synthesis was carried out by the method described by Gubler and Hoffman (10) except that ligase was omitted. The cDNA molecules [400-1600 base pairs (bp)] were selected after electrophoresis on low-melting-temperature agarose gel. EcoRI linkers (Promega) were ligated to the cDNA fraction, which was then cloned into calf intestinal alkaline phosphatase-treated λgt10 vector (Stratagene) according to the vendor's protocol.

Amplification of cDNA. Two sets of primers flanking the EcoRI site of the λgt10 vector were synthesized. Set 1 contains primers a (AGCAAGTTCAGCCTGGTTAAG) and b (CTTATGAGTATTTCTTCCAGGGTA). Primers a and b are situated on opposite sides of the cloning site, 8 and 10 bases away, respectively, from the EcoRI site of vector λgt10. Set 2 contains primers c (AGCCTGGTTAAGTCCAAGCTG) and d (CTTCCAGGGTAAAAAGCAAAAAG). Primers c and d are immediately adjacent to the EcoRI site and leave no vector sequence when the amplified cDNA is cut with EcoRI for further cloning purposes. All the amplification reactions were performed for 40 cycles in a Perkin-Elmer thermal cycler using a step-cycle program of 94°C for 1 min, 55°C for 1 min, and 73°C for 2 min. The PCR mixture consisted of 50 mM KCl, 10 mM Tris·HCl (pH 8.3 at 23°C), 1.5 mM MgCl2, all four deoxynucleotide triphosphates (each at 0.2 mM), 50 ng of each of the primers, 2 ng of phage DNA from the thymus library (obtained from 6 × 10^6 recombinant plaques), and 1 unit of AmpliTaq DNA polymerase (Cetus) in a total reaction volume of 50 µl (3). Primer set 1 was used to prepare the mixture of cDNA fragments. The unreacted primers were removed either by Quick Spin (Sepharose CL-6B, Linkers 6) columns (Boehringer Mannheim) or by fractionating the PCR products on agarose gels (11). The amplified DNA is referred to as sf-cDNA I.

Preparation of Tracer DNA. Radiolabeled sf-cDNA I was prepared by the addition of 100 µCi of [α-32P]dCTP (NEN; 1 Ci = 37 GBq) to the amplification reaction mixture in a separate tube. The amplification was performed as described above and the free label was removed by use of a Sepharose CL-6B spin column. Labeled DNA between 400 and 1600 bp long was prepared by agarose gel electrophoresis.

Reassociation Reactions. Reassociation reactions were performed routinely in a 50-µl reaction mixture containing 0.3 M sodium phosphate (pH 7.0), 0.4 mM EDTA, 0.04% SDS, and sf-cDNA I (20 µg/ml). The reaction mixture was overlaid with mineral oil in a 0.5-ml tube and heat-denatured in a boiling water bath for 5 min. The tube was immediately transferred to another water bath (Neslab, Endocal) maintained at 65°C, and DNA molecules were allowed to reassociate for 24, 48, 72, 96, 120, 216, or 288 hr. The reassociation was stopped by chilling the reaction tube on wet ice and diluting the mixture to 1.0 ml in 0.01 M sodium phosphate (pH 7.0) containing 0.1% SDS. This diluted sample was applied to a hydroxyapatite column. Elution of single-stranded and double-stranded DNA (ssDNA and dsDNA, respectively) was performed as described below.

Hydroxyapatite Chromatography. DNA-grade Bio-Gel HTP hydroxyapatite (Bio-Rad) was suspended at a 1:10 ratio in 0.01 M sodium phosphate (pH 7.0) containing 0.1% SDS. The slurry was packed to 1.0-ml bed volume in water-jacketed columns and run at flow rates of 8-10 ml/hr. The binding and elution properties were tested using an Hae III digest of M13mp18 ssDNA and a 1-kilobase-pair (kbp) dsDNA ladder (Life Technologies) as double-stranded markers. The columns were maintained at 60°C using a Neslab (Endocal) circulation water bath. The elution profiles of ss- and dsDNA were calibrated using a step gradient of 0.05 to 0.4 M sodium phosphate (pH 7.0) with 0.02 M increments in the concentration of the phosphate buffer. All buffers contained 0.1% SDS.

Desalting and Concentration of Eluates. Column eluates were concentrated to 100-200 µl from starting volumes of 10-12 ml by centrifugation through Centricon-30 filters and washed at least three times with sterile water or TE (10 mM Tris·Cl, pH 8.0/1 mM EDTA).

Construction of Normalized Library. The single-stranded fraction from 72-hr reassociated DNA [initial concentration of DNA × time (Cot) = 59.0 mol·sec/liter; ref. 6] was amplified using primer set 2. The 500- to 2000-bp fraction of the amplified DNA (sf-cDNA II) was eluted from low-melting-temperature agarose gel by phenol extraction followed by Centricon filtration and cloned into calf intestinal alkaline phosphatase-treated λgt10 vector. The construction of the normalized library is summarized in Fig. 1.

Short-fragment cDNA library in λgt10
  ↓ PCR (outer primers)
Amplified product (sf-cDNA I)
  ↓
Heat denature and allow to reassociate until time t (24, 48, 72, 96 hr, etc.)
  ↓
Apply to hydroxyapatite
  ↓
Elute single-stranded DNA
  ↓ PCR (inner primers)
Amplified product (sf-cDNA II)
  ↓
Check the extent of normalization by Southern blotting/probe hybridization
  ↓
Identify the best normalized condition
  ↓
Digest the normalized DNA with EcoRI
  ↓
Clone into λgt10
  ↓
NORMALIZED LIBRARY

FIG. 1. Normalization protocol.

Probes and Hybridization. Blur 8, a clone representing the 300-bp Alu repeats (12), and a 5.8-kbp rRNA gene clone (R-DNA) (13) were obtained from D. Ward (Yale University). cDNA clones of T-cell surface molecules CD4 (14) and CD8 (15), containing 1.8-kbp and 2.0-kbp inserts, respectively, in pBS vector, were a gift from P. Kavathas (Yale University). An 800-bp probe from the constant region of the T-cell receptor (TCR) γ-chain (16) was provided by S. R. Carding (Yale University). A 1.5-kbp probe was isolated from a BamHI digest of a γ-actin cDNA clone constructed in pBR322 (17). HLA-H, a pseudogene (6.9 kbp) containing a fragment from the human major histocompatibility complex (18) with extensive homology to class I-like sequences, was cloned from the JY cell line (lymphoblastoid) in our laboratory. A 300-bp fragment of the Oct-1 gene, a sequence-specific DNA binding protein (19), was PCR amplified by using internal primers spanning coding sequences. A 1.4-kbp Cla I-EcoRI fragment of the human c-myc gene (20) containing exon 3 and 3' untranslated sequence was a generous gift of A. Hayday (Yale University). A 200-bp fragment of the β-globin gene and a 250-bp fragment of the tumor necrosis factor α gene were amplified by PCR from the β-globin cDNA clone JW102 (21) and yeast artificial chromosome A73C3 (D. Chapman, Washington University School of Medicine, St. Louis), respectively. A 350-bp EcoRI-HindIII cDNA fragment of α-fodrin (22), a cytoskeletal protein, was a gift from B. Forget (Yale University).

Southern blot analysis of DNA was performed (23) using nylon membranes (Hybond-N, Amersham). The DNA was fixed to the filters by baking for 2 hr at 80°C. The prehybridization and hybridization were performed in 1% crystalline-grade bovine serum albumin/1 mM EDTA/0.5 M sodium phosphate, pH 7.2/7% (wt/vol) SDS (24) in a 65°C water bath. Filters were washed twice with 0.1× SSC (1× SSC = 0.15 M NaCl/0.015 M sodium citrate, pH 7.0) at 65°C for 30 min.

dsDNA Withdrawal Experiments. The sf-cDNA I was heat-denatured and reassociated at 65°C as described earlier. The reassociation reaction mixture (200 µl) contained 0.15 M sodium phosphate (pH 7.0), 0.4 mM EDTA, 0.04% SDS, and sf-cDNA I (20 µg/ml). Hydroxyapatite, equilibrated with 0.15 M sodium phosphate (pH 7.0), was packed to 100 µl in a graduated microcentrifuge tube. The reassociating reaction mixture was added to the packed and dry hydroxyapatite at the end of 6 and/or 12 hr of incubation. The contents of the tube were mixed thoroughly and allowed to stand for 1 min. dsDNA alone binds to hydroxyapatite at the 0.15 M sodium phosphate concentration employed. The sample was centrifuged for 15 sec at 10,000 × g and the supernate was incubated for 72 hr, and then ss- and dsDNA fractions were isolated and hybridized with R-DNA and HLA-H probes.

RESULTS

Hydroxyapatite Chromatography. Separation of ss- and dsDNA on hydroxyapatite is well documented (6). The ss- and dsDNA fractions were eluted at 0.1 M and 0.35 M sodium phosphate, respectively. However, fragments of dsDNA smaller than 200 bp as well as all sizes of ssDNA fractions of the Hae III digest of M13mp18 DNA were eluted at the same phosphate buffer concentration (0.1 M). To avoid contamination of the ssDNA eluates with very small fragments of dsDNA, all normalization experiments utilized the DNA fraction larger than 400 bp. The ssDNA eluted from the hydroxyapatite column is not, as such, optimal for PCR amplification because of (i) the presence of high concentrations of sodium phosphate and SDS, which inhibit AmpliTaq polymerase, and (ii) the low concentration of ssDNA in the eluate. Of the several methods tested, namely, Centricon-30 filtration, Glassmilk (BIO 101) purification (GeneClean protocol supplied by the manufacturer), 1-butanol and 2-butanol concentration (25), freeze drying followed by dialysis, etc., Centricon filtration was found to be the method of choice for recovery of small amounts of ssDNA because of the efficient recovery (>90%) and complete removal of salts.

Reassociation and Normalization Studies. The sf thymus library (STH), prepared by random oligonucleotide priming, contained 6 × 10^6 recombinants and 0.1% of nonrecombinant clones. Phage DNA was prepared from a plate lysate of the library and used for PCR amplification of inserts. Initial amplification of the cDNA inserts of the STH library was performed using primer set 1. The amplified DNA (sf-cDNA I) was extracted once with chloroform and passed over a Sepharose CL-6B spin column. The 400- to 1600-bp fraction was selected by electrophoresis through a low-melting-temperature agarose gel and further utilized in normalization experiments. The sf-cDNA I was reassociated for 24 to 288 hr (Cot = 19.7 to 236.0 mol·sec/liter), at the end of which the ss- and dsDNA fractions were amplified using primer set 2 (sf-cDNA II). The sf-cDNA II showed a doublet of intensities between 500 and 1500 bp. However, the upper-band intensity disappeared after EcoRI digestion prior to cloning into λgt10 vector.

Time Course of Reassociation and Kinetics. The sf-cDNA II, obtained by PCR amplification of the ss- and dsDNA fractions of sf-cDNA I, was examined for the extent of normalization by hybridizing the DNA blots with several radioactive probes. rRNA genes (R-DNA) were found to be the most abundant species in the original library among the various probes tested (Fig. 2, lane 15). A major histocompatibility complex class IA probe, HLA-H, represented a cDNA of medium abundance. The CD4 cDNA was less abundant than HLA-H, and the γ-chain TCR cDNA was even less abundant. Single-stranded R-DNA and HLA-H cDNAs decreased rapidly with increasing reassociation times. Very little R-DNA was visible in the ssDNA fraction (Fig. 2, lane 1) even after 24 hr of reassociation. In contrast, the dsDNA fraction (Fig. 2, lane 2) showed large amounts of R-DNA, even when examined after 24 hr. The single-stranded HLA cDNA decreased less dramatically and significant amounts remained even after 96 hr of reassociation (Fig. 2, lane 7). Detectable amounts of CD4 were retained in the ssDNA fraction, even after 120 hr of reassociation. Very scarce mRNAs, like c-myc and TCR, were retained (50% or more) in the ssDNA fraction after the longest reassociation times studied. The sizes of the ss- and dsDNA fractions of sf-cDNA II decreased progressively with the increase in reassociation time, the effect being most pronounced after 120 hr of reassociation (Fig. 2, lanes 11-14). The decrease in size, as well as intensity, may be due to the thermal degradation of the reassociating DNA.

[Fig. 2 image: Southern blot panels probed with R-DNA, HLA-H, CD4, c-myc, and TCR.]

FIG. 2. Southern blot analyses of the normalization process. The ss- and dsDNA fractions at the end of 24, 48, 72, 96, 120, 216, and 288 hr of reassociation of sf-cDNA I were amplified by primer set 2. The Cot (mol·sec/liter) values at the end of 24, 48, 72, 96, 120, 216, and 288 hr were 19.7, 39.3, 59.0, 78.7, 98.3, 177.0, and 236.0, respectively. The Cot values were corrected for the salt concentration employed in the reassociation mixture. PCR-amplified ssDNA (lanes 1, 3, 5, 7, 9, 11, and 13) and dsDNA (lanes 2, 4, 6, 8, 10, 12, and 14) fractions at the end of each reassociation time were hybridized to various probes, as indicated. Lane 15 represents the native (N) sf-cDNA I.

Normalized Thymus Libraries (NSTH). Two normalized libraries, NSTH I and NSTH II, were prepared from the sf-cDNA II amplified from the ssDNA fractions at Cot values of 41.7 and 59.0 mol·sec/liter, respectively. Approximately 1.0 × 10^6 of NSTH I and 1.7 × 10^7 of NSTH II recombinant clones were obtained per microgram of the cDNA insert, of which approximately 0.5% were identified as nonrecombinant clones. Approximately 100,000 clones of each library were probed with all the probes described in Fig. 2 as well as with additional probes. A dramatic reduction took place in the number of R-DNA clones (Table 1). The initial approximately 30% of R-DNA clones in STH decreased about 320-fold in the NSTH I library. However, the reduction in the number of Blur 8 clones was only about 2-fold. The large number of repetitive DNA clones in NSTH I may be due to the relatively short length of the Alu repeats and the considerable sequence divergence between copies of the repeat (26). Medium-abundance species such as γ-actin and HLA-H decreased to almost the same levels as less-abundant species like CD4 and CD8 in the NSTH I library. The various cDNA species were present in more nearly equal numbers in the NSTH II library than in NSTH I. The R-DNA clones decreased to 0.012%, representing a 2500-fold reduction as compared to the STH library. The repetitive clones showed a similar trend to that of NSTH I, decreasing to 0.36% from 0.8% in the STH library. The medium-abundance clones like HLA-H and the low-abundance clones such as α-fodrin differed by 33-fold in STH but were represented almost identically in the NSTH II library. With the exception of Blur 8, all the cDNA species tested were present in equal abundance (within a factor of 2) in the NSTH II library (Table 1), in contrast to a 10,000-fold variation in abundance in the STH library.
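
The Cot values quoted for the individual reassociation times scale linearly with time, as expected when the DNA concentration is held fixed. The sketch below recomputes the uncorrected Cot from the stated reaction conditions (20 µg/ml DNA) and derives the overall factor that the reported, salt-corrected values imply. The average nucleotide mass of 330 g/mol is an assumption introduced here, and the implied factor is only a back-calculation for illustration; it is not a correction factor stated in this paper.

```python
NUCLEOTIDE_G_PER_MOL = 330.0  # assumed average mass of one nucleotide residue (not from the paper)

# Reassociation time (hr, as listed in Materials and Methods) ->
# reported salt-corrected Cot (mol·sec/liter, as listed in the Fig. 2 legend).
REPORTED_COT = {24: 19.7, 48: 39.3, 72: 59.0, 96: 78.7, 120: 98.3, 216: 177.0, 288: 236.0}


def uncorrected_cot(dna_ug_per_ml, hours, nt_mass=NUCLEOTIDE_G_PER_MOL):
    """Cot = nucleotide concentration (mol/liter) x time (sec), without salt correction."""
    grams_per_liter = dna_ug_per_ml * 1e-3  # 1 ug/ml equals 1 mg/liter
    mol_per_liter = grams_per_liter / nt_mass
    return mol_per_liter * hours * 3600.0


def implied_factor(dna_ug_per_ml, hours, reported_cot):
    """Ratio of a reported Cot value to the uncorrected Cot for the same conditions."""
    return reported_cot / uncorrected_cot(dna_ug_per_ml, hours)


if __name__ == "__main__":
    for hours, reported in REPORTED_COT.items():
        raw = uncorrected_cot(20.0, hours)
        factor = implied_factor(20.0, hours, reported)
        print(f"{hours:>3} hr: uncorrected {raw:7.2f}, reported {reported:6.1f}, ratio {factor:.2f}")
```

With these assumptions the ratio is essentially the same at every time point, which is simply another way of saying that the reported values are proportional to reassociation time.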


Table 1. Effect of normalization on cDNA library

             Clones identified, no. per 100,000 plaques
Probe        STH            NSTH I         NSTH II
R-DNA        30,000 (30)    94 (0.094)     12 (0.012)
Blur-8       800 (0.8)      450 (0.45)     360 (0.36)
γ-Actin      110 (0.11)     37 (0.037)     NT
HLA-H        104 (0.104)    80 (0.08)      10 (0.01)
CD4          28 (0.028)     37 (0.037)     12 (0.012)
CD8          15 (0.015)     55 (0.055)     12 (0.012)
Oct-1        9 (0.009)      NT             8 (0.008)
β-Globin     7 (0.007)      NT             10 (0.01)
c-myc        5 (0.005)      NT             11 (0.011)
TCR          5 (0.005)      NT             8 (0.008)
TNF-α        5 (0.005)      NT             6 (0.006)
α-Fodrin     3 (0.003)      NT             9 (0.009)

cDNAs present at various levels of abundance in the STH library become almost identically abundant in the normalized (NSTH) libraries. Increased reassociation times, as indicated by the increased Cot value, render better normalized libraries. The numbers in parentheses indicate the percentage of total plaques corresponding to each species. NT, not tested. For NSTH I the Cot value was 41.7 mol·sec/liter and for NSTH II the Cot value was 59.0 mol·sec/liter.
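
The fold changes cited in the text can be recomputed from the plaque counts in Table 1. The sketch below is one way to do so; the dictionary layout and the function names are illustrative choices made here, and only the counts themselves come from the table.

```python
# Clones per 100,000 plaques, transcribed from Table 1 (None = not tested).
TABLE_1 = {
    "R-DNA":        {"STH": 30000, "NSTH I": 94,   "NSTH II": 12},
    "Blur-8":       {"STH": 800,   "NSTH I": 450,  "NSTH II": 360},
    "gamma-actin":  {"STH": 110,   "NSTH I": 37,   "NSTH II": None},
    "HLA-H":        {"STH": 104,   "NSTH I": 80,   "NSTH II": 10},
    "CD4":          {"STH": 28,    "NSTH I": 37,   "NSTH II": 12},
    "CD8":          {"STH": 15,    "NSTH I": 55,   "NSTH II": 12},
    "Oct-1":        {"STH": 9,     "NSTH I": None, "NSTH II": 8},
    "beta-globin":  {"STH": 7,     "NSTH I": None, "NSTH II": 10},
    "c-myc":        {"STH": 5,     "NSTH I": None, "NSTH II": 11},
    "TCR":          {"STH": 5,     "NSTH I": None, "NSTH II": 8},
    "TNF-alpha":    {"STH": 5,     "NSTH I": None, "NSTH II": 6},
    "alpha-fodrin": {"STH": 3,     "NSTH I": None, "NSTH II": 9},
}


def fold_reduction(probe, library):
    """STH count divided by the count in a normalized library (None if not tested)."""
    count = TABLE_1[probe][library]
    return None if count is None else TABLE_1[probe]["STH"] / count


def abundance_spread(library, exclude=()):
    """Ratio of the most to the least abundant tested probe within one library."""
    counts = [row[library] for probe, row in TABLE_1.items()
              if probe not in exclude and row[library] is not None]
    return max(counts) / min(counts)


if __name__ == "__main__":
    print("R-DNA reduction, NSTH I :", round(fold_reduction("R-DNA", "NSTH I")))
    print("R-DNA reduction, NSTH II:", round(fold_reduction("R-DNA", "NSTH II")))
    print("Spread in STH           :", abundance_spread("STH"))
    print("Spread in NSTH II (no Blur-8):", abundance_spread("NSTH II", exclude=("Blur-8",)))
```

Excluding Blur-8 from the NSTH II comparison mirrors the exception made in the text for the Alu repeat probe.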

Ten clones were chosen randomly for analysis. One clone had an insert of about 350 bp and the remainder had inserts of between 500 and 1200 bp. Each insert was separately hybridized to a blot of EcoRI-digested human genomic DNA from JY cells. Three inserts contained some repetitive sequences, and all but one of the inserts hybridized to human DNA.

Two discrepancies were noted with regard to predictions of annealing by second-order kinetics. The most abundant cDNAs were depleted more rapidly than expected and their absolute levels in the single-stranded fraction could be reduced to well below that of the less-abundant species. Also, even at the shortest annealing times, a significant fraction of the cDNA from scarce species was recovered in the double-stranded fraction. An explanation for the first anomaly has been discussed by others (7, 27). Since the DNA used for normalization (sf-cDNA I) was originally prepared by random oligonucleotide priming, fragments of various lengths were generated for every species of mRNA. During reassociation, the shorter and longer fragments of the same species, if paired homologously, generate single-stranded tails. Such structures can pair with the remaining ssDNA of the same species by a mechanism that does not follow second-order kinetics. Such kinetic behavior leads to the anomalous removal of the ssDNA into dsDNA fractions. Hence, even the very scarce mRNAs such as c-myc and TCR appear in both ss- and dsDNA fractions at all reassociation times.

To verify such anomalous reassociation kinetics, the dsDNA was removed from a reassociation reaction by addition of hydroxyapatite at 6 and 12 hr in one experiment and only at 12 hr in another (Fig. 3). The ssDNA fractions of both experiments showed a considerable amount of R-DNA and HLA-H at the end of 72 hr of reassociation. By comparison, R-DNA was less abundant in the ssDNA fraction when the dsDNA was removed only at 12 hr of reassociation. Removal of dsDNA at either 12 hr alone, or at 6 and 12 hr, depleted the dsDNA with single-stranded tails (Fig. 3). Even after removal of the double-stranded material at 6 and 12 hr of reassociation, somewhat more of the scarce cDNA species was found in the 72-hr double-stranded fraction than might have been predicted.

[Fig. 3 image: Southern blot panels probed with HLA-H and R-DNA, lanes a-d in each panel.]

FIG. 3. Effect of selective removal of dsDNA. The dsDNA fraction of reassociating sf-cDNA I was removed at 6 and 12 hr (lanes a and b) and only at 12 hr (lanes c and d) and further incubated up to 72 hr at 65°C. The ssDNA (lanes a and c) and dsDNA (lanes b and d) were isolated at the end of incubation.

DISCUSSION

The goal of the present approach is to generate a cloned set of cDNA fragments containing an approximately equal representation of all mRNA sequences from any starting material, regardless of the relative abundance of the mRNA in the original material. We began by preparing cDNA using a random mixture of hexanucleotide primers to increase the chance of having representative cDNA from all parts of mRNA, including the 5' portions of the longer RNA molecules. These fragments were initially size-selected to provide DNA short enough to be efficiently recovered by PCR, but long enough to favor efficient reannealing, allow stringent washing conditions during hybridization without the melting of perfect duplexes, and expedite assembly of extensive cDNA sequences from the final library.

The cDNA fragments were recovered at each stage of the subsequent work by PCR, using primers complementary to the λ phage vector sequences on either side of the cloning site. This provided an extremely sensitive way to recover small amounts of cDNA from the single-stranded fraction after extensive hybridization. The sequences on either side of the cloning site are known and we designed several nested sets of primer pairs. By using successive sets of primers, it was possible to obtain preparative amplification of the cDNA inserts without excessive accumulation of primer-derived PCR artifacts. An additional protection against cloning of artifacts was provided by cutting the amplified material with EcoRI and cloning at a site with overhanging EcoRI ends. Because the primers did not contain or overlap an EcoRI site, the only source for clonable ends on the fragments would have been internal EcoRI sites either present within the cDNA sequence or derived from the linker attached to the cDNA fragment at the time of the original cloning. Fractionation of the PCR products by gel electrophoresis prior to cloning further increased the specificity of the inserts in the final library.

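
The nesting of the two primer sets can be checked directly from the sequences given in Materials and Methods: each inner primer (set 2) begins with the bases that end the corresponding outer primer (set 1) and then continues toward the cloning site. The sketch below measures that shared stretch. The helper name is an illustrative choice, and base 13 of primer b is taken as T here, which is an assumed reading where the printed sequence is unclear; the overlap result does not depend on that base.

```python
PRIMERS = {
    "a": "AGCAAGTTCAGCCTGGTTAAG",
    "b": "CTTATGAGTATTTCTTCCAGGGTA",  # base 13 read as T (assumed reading)
    "c": "AGCCTGGTTAAGTCCAAGCTG",
    "d": "CTTCCAGGGTAAAAAGCAAAAAG",
}


def suffix_prefix_overlap(outer, inner):
    """Length of the longest suffix of `outer` that is also a prefix of `inner`."""
    for k in range(min(len(outer), len(inner)), 0, -1):
        if outer[-k:] == inner[:k]:
            return k
    return 0


if __name__ == "__main__":
    for outer, inner in (("a", "c"), ("b", "d")):
        shared = suffix_prefix_overlap(PRIMERS[outer], PRIMERS[inner])
        print(f"primer {inner} starts with the last {shared} bases of primer {outer}")
```

A nonzero overlap that is shorter than the inner primer indicates that the inner primer is displaced in the 3' direction relative to the outer one, which is the arrangement expected for a nested pair.
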
Simple second-order kinetics of hybridization predicted that at the time when the scarcest species in a mixture was 50% annealed, the representation of the most abundant species in the single-stranded fraction would be no more than twice that of the scarcest species. The actual hybridization behavior proved to be considerably more complex. The most abundant species hybridized more rapidly than predicted and, in some of the initial experiments, was actually found to be present in the single-stranded fraction at a level considerably less than that of the scarcer species. A major cause of this effect was probably continued hybridization between double-stranded fragments and the remaining single-stranded cDNA. In practice, the effect could be reduced to a tolerable level by appropriate choice of the time and concentration of annealing conditions and further reduced by using hydroxyapatite to remove double-stranded products at various stages during the reaction. The effect could also be reduced, in principle, by carrying out one cycle of limited annealing, separation of ssDNA, amplification, denaturation, and reannealing, but this last process would involve more steps of PCR amplification and risk a higher level of error in the sequences present in the final library. Perhaps the simplest approach to overcome the effect is to prepare two library fractions after annealing for different times and to recombine the fractions in an appropriate ratio. For example, adding 1 part of a library reassociated to a Cot value of 15.0 mol·sec/liter to 10 parts of a library prepared after a 10-fold increase in Cot value would ensure that the most abundant initial cDNAs were still represented at a level of 1/10,000 in the final normalized library.

The approaches described above proved to be very effective in reducing the abundance of the major cDNA species to that of the very scarce species. The relative abundance of rRNA gene sequences, the most abundant species in the initial (STH) library, was markedly reduced and the abundance of the rarest species was slightly increased. The Cot values and hence the enrichment for the rarest species could readily be increased by increasing the concentration of DNA (data not shown). One might expect that as the hybridization was pushed toward completion, one would begin to see more cDNA derived from incompletely spliced mRNA or nuclear transcripts that do not represent cytoplasmic mRNA. In part, though, these would be removed if they were linked to more abundant sequences such as repetitive DNA or true coding regions.

Removal of the more abundant species from a library prepared from a single cell type would only enrich the rare species by 3-fold or less because an estimated one-third of the mRNA in a single cell type is made up of species present at only 1-10 copies per cell (5). A single cell type may contain representatives of 10,000 or more mRNAs. If each mRNA contributed two or three separate 500- to 2000-bp cDNA fragments, after complete normalization, any single cDNA fragment would be present at about 1 copy per 30,000. This representation would go down further when the source of cDNA is a complex tissue, as in the present experiments, and if a library were derived from all mRNAs expressed at any time, each fragment would be present at an abundance of less than 1 part per 100,000. By preparation and initial normalization of libraries from each of a reasonable number of organs or tissues, separate partial normalization of each library, pooling of libraries, and renormalization, one might approach such a fully representative library.

While the present manuscript was in preparation, a report appeared of an alternative approach using some of the above principles to derive partially normalized cDNA libraries (28), in this case derived only from the predominantly untranslated regions of the 3' ends of mRNA. Neither approach removes repetitive sequences effectively, but the use of randomly primed fragments representing the complete sequence of cDNAs enables us to quench the more abundant repeats during hybridization, selection, etc., and still retain specific unique sequence fragments representing essentially all of the cDNA species.

The complete normalized cDNA libraries have several applications.
They simplify screening of cDNA libraries for rare mRNA, particularly if the library is prepared from a complex source containing multiple cell types, and are more convenient for detection of rare mRNA species that distinguish different cell types, either by subtraction or differential hybridization. They are also advantageous for use with a modified contingent replication assay (29) for cloning cell-type-specific peptides that bind to specific DNA sequences or activate transcription from ineffective promoters. Finally, and perhaps most importantly, they expedite identification of mRNAs encoded by very large genomic fragments such as those obtained by affinity capture (30) or yeast artificial chromosome cloning (31). In particular, they are useful in conjunction with cDNA hybridization selection procedures (S.P. and S.M.W., unpublished results) to rapidly identify and sequence cDNAs encoded by specific genomic DNA regions.

We thank Mr. Leif Madsen and Ms. Barbara Gramenos for the excellent technical preparation of this manuscript. This work was supported by grants from the Huntington's Disease Foundation and the Department of Energy.

1. Bishop, J. O., Morton, J. G., Rosbash, M. & Richardson, M. (1974) Nature (London) 250, 199-204.
2. Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B. & Erlich, H. A. (1988) Science 239, 487-491.
3. Li, H., Gyllensten, U. B., Cui, X., Saiki, R. K., Erlich, H. A. & Arnheim, N. (1988) Nature (London) 335, 414-417.
4. Weissman, S. M. (1987) Mol. Biol. Med. 4, 133-143.
5. Galau, G. A., Klein, W. H., Britten, R. J. & Davidson, E. H. (1977) Arch. Biochem. Biophys. 179, 584-599.
6. Britten, R. J., Graham, D. E. & Neufeld, B. R. (1974) Methods Enzymol. 29, 363-418.
7. Smith, M. J., Britten, R. J. & Davidson, E. H. (1975) Proc. Natl. Acad. Sci. USA 72, 4805-4809.
8. Chomczynski, P. & Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
9. Cathala, G., Savouret, J.-F., Mendez, B., West, B. L., Karin, M., Martial, J. A. & Baxter, J. D. (1983) DNA 2, 329-339.
10. Gubler, U. & Hoffman, B. J. (1983) Gene 25, 263-269.
11. Ogden, R. C. & Adams, D. A. (1987) Methods Enzymol. 152, 86-87.
12. Houck, C. M., Rinehart, F. P. & Schmid, C. W. (1979) J. Mol. Biol. 132, 289-306.
13. Wilson, G. N., Hollar, B. A., Waterson, J. R. & Schmickel, R. (1978) Proc. Natl. Acad. Sci. USA 75, 5367-5371.
14. Norment, A. M. & Littman, D. R. (1988) EMBO J. 7, 3433-3439.
15. Maddon, P. J., Littman, D. R., Godfrey, M., Maddon, D. E., Chess, L. & Axel, R. (1985) Cell 42, 93-104.
16. Lefranc, M.-P. & Rabbitts, T. H. (1985) Nature (London) 316, 464-466.
17. Gunning, P., Ponte, P., Okayama, H., Engel, J., Blau, H. & Kedes, L. (1983) Mol. Cell. Biol. 3, 787-795.
18. Chorney, M., Sawada, I., Gillespie, G. A., Srivastava, R., Pan, J. & Weissman, S. M. (1990) Mol. Cell. Biol. 10, 243-253.
19. He, X., Treacy, M. N., Simmons, D. M., Ingraham, H. A., Swanson, L. W. & Rosenfeld, M. G. (1989) Nature (London) 340, 35-42.
20. Richman, A. & Hayday, A. (1989) Mol. Cell. Biol. 9, 4962-4969.
21. Wilson, J. T., Wilson, L. B., deRiel, J. K., Villa-Komaroff, L., Efstratiadis, A., Forget, B. G. & Weissman, S. M. (1978) Nucleic Acids Res. 5, 563-581.
22. Moon, R. T. & McMahon, A. P. (1990) J. Biol. Chem. 265, 4427-4433.
23. Southern, E. (1975) J. Mol. Biol. 98, 503-515.
24. Church, G. M. & Gilbert, W. (1984) Proc. Natl. Acad. Sci. USA 81, 1991-1995.
25. Stafford, D. W. & Bieber, D. (1975) Biochim. Biophys. Acta 378, 18-21.
26. Singer, M. F. (1982) Int. Rev. Cytol. 76, 67-112.
27. Morrow, J. F. (1974) Doctoral thesis (Stanford Univ., Stanford, CA), pp. 101-190.
28. Ko, M. S. H. (1990) Nucleic Acids Res. 18, 5705-5711.
29. Vasavada, H., Ganguly, S., Settleman, S., DiMaio, D. & Weissman, S. M. (1988) Ind. J. Biochem. Biophys. 25, 488-494.
30. Kandpal, R. P., Ward, D. C. & Weissman, S. M. (1990) Nucleic Acids Res. 18, 1789-1795.
31. Burke, D. T., Carle, G. F. & Olson, M. V. (1987) Science 235, 806-812.
