Somatic Cell and Molecular Genetics, Vol. 18, No. 1, 1992, pp. 7-18

Efficiency and Limitations of the hn-cDNA Library Approach for the Isolation of Human Transcribed Genes from Hybrid Cells

Pu Liu,1 M. Benjamin Perryman,2 Warren Liao,3 and Michael J. Siciliano

Departments of Molecular Genetics and Biochemistry and Molecular Biology, The University of Texas M.D. Anderson Cancer Center, and Department of Medicine, Baylor College of Medicine, Houston, Texas 77030

Received 15 August 1991--Final 18 October 1991

Abstract--The use of splice donor site consensus sequences as primers in cDNA synthesis (to make a cDNA library from heterogeneous RNA or unprocessed transcript--an hn-cDNA library) and the screening of such an hn-cDNA library with human repeat DNA probe in order to isolate human genes from somatic cell hybrids have been demonstrated. Here, we optimize and evaluate the efficiency and limitations of the approach. Computer analysis of genomic sequences of 22 randomly selected human genes indicated that hexamers CTTACC, CTCACC, and CCTACC were most efficient at beginning first-strand cDNA synthesis at donor splice sites of hnRNA and suggested that the procedure is efficient for priming cDNA synthesis of at least one exon from almost every gene. Primer extension experiments established conditions in which the primers would initiate synthesis of cDNA starting from a perfectly matched position on the RNA template at more than 60-fold higher yield than any other product. By isolation of a clone containing exon III of the human DNA repair gene ERCC1, we indicate that the approach is capable of cloning exons from weakly expressed genes. Sequencing of clones revealed a structure of hn-cDNA clones consistent with the expectations of the cloning strategy and indicated the potential of the clones in detecting polymorphisms. Finally, we demonstrate that the expression of these hn-cDNA sequences in cells can be detected efficiently at the hnRNA level by reverse transcriptase-polymerase chain reaction (RT/PCR).

INTRODUCTION

Isolated transcribed sequences from specific regions of the human genome provide valuable materials for human genetic studies. The biological importance of transcribed genes from a region or a chromosome suggests their use as anchor points for mapping and start sites for sequencing in the genome project. With transcribed segments from specific genomic regions in hand, identifying candidate genes for human inherited traits mapped to those regions becomes practical. Finally, region-specific transcribed segments should become essential reagents for studies on the organization and evolution of chromosomes and gene families as well as the regulation of gene expression.

We have used a human x rodent somatic cell hybrid containing limited human chromosomal material to make a cDNA library containing human transcripts from defined regions of the human genome (1). In order to distinguish human transcripts from the vast excess of rodent gene products that would be present in such a library, hnRNA, rather than cytoplasmic mRNA, was used as template to make the cDNA library. This was accomplished by using hexamers, reverse-complementary to splice donor sites, as primers for first-strand cDNA synthesis to select for hnRNA as template in the reaction. Since hnRNA still retains introns containing species-specific repetitive sequences, the human clones present at low frequency in an hn-cDNA library constructed in this manner were readily identified by screening with total human DNA. The splice donor site sequences were used as primers not only because they should preferentially prime from hnRNA, but also because the cDNA synthesis would proceed first through the adjacent exon, ensuring the presence of coding sequences in the hn-cDNA molecules generated. This enrichment for exons is seen as important for the use of such clones in identifying expressed genes from the region of interest.

In this report we investigate several critical issues with respect to the effectiveness of the approach in accomplishing the tasks described above. By computer analysis we determined the frequency of occurrence of the hexamer sequences at 5' splice sites compared to their occurrence at other positions of the primary transcripts in order to evaluate their efficiency of priming at exons of human genes. A series of primer extension experiments was conducted to determine conditions that maximize the fidelity of hexamer priming. Conditions for screening hn-cDNA libraries for human transcripts were optimized, the effectiveness of the approach in isolating exons of weakly expressed genes was determined, and representative clones were sequenced to (1) evaluate whether the structure of the cloned transcripts meets the theoretical expectations of the cloning strategy and (2) construct PCR primers to analyze more effectively the expression characteristics of the isolated clones.

0740-7750/92/0100-0007506.50/0 © 1992 Plenum Publishing Corporation

MATERIALS AND METHODS

Computer Study. DNA sequence analysis was performed with the GCG software package of the University of Wisconsin on the VAX system of the University of Texas at Houston. Table 1 lists the 22 human genes used in this study. They were selected from GenBank because both intron and exon sequences are available. The program "FIND" was used to look for the target sites of each of the four hexamer primers, CTTACC, CTCACC, CCTACC, and CCCACC (i.e., the sequences GGTAAG, GGTGAG, GGTAGG, and GGTGGG), in these 22 gene sequences. Only the sense strand was considered, since the other strand is irrelevant for this study. Matching the locations of these patterns with the positions of the splice donor sites and the calculation of percentages were performed manually.

Table 1. Human Genes Used in Analysis of Primer Specificity

GenBank code   Gene name                             Gene size (bp)   Splice sites (N)
humfbrg        fibrinogen gamma                      10,564            9
humhbb         beta globin locus                     10,350           12
humhlasba      HLA-SB(DP) α                           8,629            4
humhpars2      haptoglobin-related gene              10,599            3
humhp2fs       haptoglobin Hp2 gene                   4,867            5
humifng        interferon gamma                       5,961            3
humil2a        interleukin-2                          6,684            3
humkerep       50-kDa type 1 epidermal keratin        5,339            7
hummha2        MHC class 1-A2                         4,000            7
hummhcp42      21-hydroxylase b                       3,580            9
hummhdrs2      HLA-DR α-chain exons 2-5               3,656            3
hummycc        c-myc                                  5,198            2
humops         opsin                                  6,953            4
humpomc        proopiomelanocortin                    7,664            2
humprca        protein C                             11,725            7
humrash        c-Ha-ras                               6,453            3
humtbb5        β-tubulin                              8,874            3
humtcbjc       T-cell receptor germline β-chain       4,913            6
humtcrac       T-cell receptor α chain C region       5,089            3
humthbnb       prothrombin                            4,957            5
humtpa         tissue plasminogen activator          36,594           13
humupax        urokinase-plasminogen activator        7,258           10

Primer Extension. Rabbit globin RNA from BRL was used as a template for primer extension. Rabbit globin RNA (200 ng) and 60 ng primer CTCACC were denatured in 8 µl 0.2 M NaCl solution at 95°C for 1 min followed by annealing at different temperatures for 10 min. Then the reaction solution was adjusted on ice to 100 mM NaCl, 50 mM Tris pH 8, 1-6 mM MgCl2, 50 µg/ml actinomycin D, 0.5 mM dNTP, 1.5 µCi/µl [32P]dCTP, 10 mM DTT, 0.25 units/µl of human placental RNase inhibitor (BRL), and 0.375 units/µl AMV reverse transcriptase (BMB) in a final volume of 20 µl. The cDNA synthesis was allowed to proceed at different temperatures for 1 h. An aliquot of each reaction was then separated from free nucleotides by Sephadex G-50 spin column (Worthington), dried in a speed vacuum concentrator, and resuspended in 1x loading buffer. The samples were separated on 6% urea polyacrylamide gel, which was then dried and exposed to film at -70°C overnight. Quantitation of primer extension products was conducted with a β-scanner (Betagen).

Library Screening with Cot10 Human DNA. The library colonies were grown on nylon filters at 10,000 colonies/135-cm filter. Replica filters were hybridized with 32P-labeled Cot10 human DNA overnight and exposed to X-ray films as described (1). After matching with bacterial plates to identify positive colonies, the probe was removed from the filters, which were then rehybridized with a human ERCC1 cDNA probe pE12-12 (2). For human Cot10 DNA preparation, human placental DNA (Sigma) was dissolved in TE at 5 mg/ml and sonicated to 500 bp. Five milligrams of the DNA was denatured by boiling and allowed to hybridize in 2 ml 5x SSC at 65°C for 4 min to reach a Cot value of 10 [Cot value = DNA concentration (mg/ml) x time (minute)] (3). Single-stranded DNA was digested with 800 units of nuclease S1 at 37°C for 2 h. Excess enzyme was removed by proteinase K digestion, and finally the DNA was phenol-chloroform extracted and ethanol precipitated.

Southern and Northern Blot Hybridization. Both nucleic acid hybridizations were performed as previously described (1).

DNA Sequencing. Human hn-cDNA inserts cloned in pBR322 were sequenced by the dideoxy method as described (4) using pBR322 PstI primers (BRL). Sequence analyses were performed on the VAX system of the University of Texas at Houston with the GCG software package of the University of Wisconsin.

PCR Reaction for Expression Detection. Total cellular RNA was isolated from cultured cells using RNAzol (Cinna/Biotecx) and digested with RNase-free DNase (Pharmacia) to remove residual DNA before use. Amplification of RNA transcripts via complementary DNA was carried out essentially as described (5). Briefly, first-strand cDNA was synthesized from 2 µg total cellular RNA in 20 µl 1x PCR buffer with random hexamer primer and M-MLV reverse transcriptase (BRL) at 42°C for 1 h. After 15 min of incubation of the 20 µl reverse transcription mixture at 95°C, 80 µl of 1x PCR buffer containing 50 pmol of each upstream and downstream primer, 1 unit of Perfect Match polymerase enhancer (Stratagene), and 2.5 units of Taq polymerase (Cetus) was added. Forty cycles of 94°C for 1 min, 60°C for 1 min, and 72°C for 2 min were performed, and an aliquot of each reaction product was checked by agarose gel electrophoresis. Primer sequences for human ERCC1 are: hERCC-1: CTCCAGATGGACCCTGGGAA; hERCC-2: ACAGGGCCTCAGATGTCCT; and hERCC-3: CAGAGGCTGTGAGATGGCA.
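The hexamer search described under Computer Study can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GCG "FIND" program: the toy gene, its exon and intron boundaries, and the function names `find_all` and `tally_hexamers` are all made up for the example. A hit is counted as a donor-site hit when its first base is the last base of an exon, following the convention that the first base of each hexamer lies on the exon side of the junction.

```python
# Sense-strand target sites of the four primers CTTACC, CTCACC, CCTACC, CCCACC.
HEXAMERS = ["GGTAAG", "GGTGAG", "GGTAGG", "GGTGGG"]


def find_all(seq, pattern):
    """Return 0-based start positions of every occurrence, overlapping ones included."""
    positions = []
    i = seq.find(pattern)
    while i != -1:
        positions.append(i)
        i = seq.find(pattern, i + 1)
    return positions


def tally_hexamers(seq, donor_starts):
    """Count (total hits, hits at donor sites) for each hexamer.

    donor_starts holds the 0-based index of the last exon base of each
    annotated 5' splice site (a hypothetical annotation for this example).
    """
    counts = {}
    for hexamer in HEXAMERS:
        hits = find_all(seq, hexamer)
        at_donor = [p for p in hits if p in donor_starts]
        counts[hexamer] = (len(hits), len(at_donor))
    return counts


# Toy gene (invented): exons and introns concatenated, sense strand only.
parts = [
    ("exon", "ATGGCCAAG"), ("intron", "GTAAGTCTGGTGGGTTTTCCAG"),
    ("exon", "GACCTG"), ("intron", "GTGAGCCTTTCTAG"),
    ("exon", "CATCAG"), ("intron", "GTATGTTTTAG"),
    ("exon", "TAA"),
]
seq = ""
donor_starts = set()
for index, (kind, fragment) in enumerate(parts):
    seq += fragment
    if kind == "exon" and index < len(parts) - 1:
        donor_starts.add(len(seq) - 1)  # last base of an exon that is followed by an intron

counts = tally_hexamers(seq, donor_starts)
donor_hits = sum(at_donor for _, at_donor in counts.values())
total_hits = sum(total for total, _ in counts.values())
pct_donors_identified = 100.0 * donor_hits / len(donor_starts)  # analogous to Table 2, column B
pct_hits_at_donor = 100.0 * donor_hits / total_hits              # analogous to Table 2, column D
print(counts)
print(pct_donors_identified, pct_hits_at_donor)
```

In the toy gene, one donor site carries none of the four hexamers and one GGTGGG lies inside an intron, so both percentages fall below 100, which is the situation the two percentage columns of Table 2 are meant to quantify.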

RESULTS

Specificity of Primers. The process of nuclear pre-mRNA splicing in higher eukaryotes requires that the sequences at the 5' and 3' junction sites be correctly cleaved and the two ends accurately joined. This is specified largely by evolutionarily conserved sequences at the two splice sites. These conserved sequences have been demonstrated in matrices displaying the frequency of each base at each position surrounding the exon-intron junctions. Based on such a matrix (6), four hexamer sequences were selected to represent 5' splice sites. They are GGTAAG, GGTGAG, GGTAGG, and GGTGGG. The first base is located on the exon side and the next five bases on the intron side. Their reverse complementary sequences CTTACC, CTCACC, CCTACC, and CCCACC have been used to construct an hn-cDNA library. It is important to know how representative these four hexamers (as a sequence rather than individual bases) are for the 5' splice sites.

We analyzed the issue of primer specificity by evaluating 22 human genes in GenBank for which complete genomic sequence information was available. These 22 genes are composed of 178,263 bp DNA with 123 5' splice sites (Table 1). As shown in Table 2, columns A and B, there are 22 GGTAAG (17.9%), 31 GGTGAG (25.2%), 8 GGTAGG (6.5%), and 10 GGTGGG (8.1%) among the 123 5' splice sites. As a group, these four hexamers represent 57.7% of the 123 splice donor sites in these 22 genes. Since a random hexamer will be present by chance at the 123 splice donor sites only 0.03 times (123/4^6), use of the consensus donor site hexamers represents a several hundred-fold enrichment over random hexamers for splice donor sequence recognition. Moreover, considering that mammalian genes usually contain multiple exons, the four hexamers will be present at least once at donor sites in most or all genes. In fact, 21 of the 22 genes studied have at least one of the four hexamers at their splice donor sites.

Table 2. Distribution of Four Hexamer Sequences in 22 Genes (Table 1)

                 A                  B                  C             D
Hexamers         Donor sites        Donor sites        Total sites   Sites identified
                 identified (N)     identified (%)     (N)           at donor (%)
1. GGTAAG        22                 17.9                39           56.4
2. GGTGAG        31                 25.2                74           41.9
3. GGTAGG         8                  6.5                34           23.5
4. GGTGGG        10                  8.1               124            8.1
1 + 2 + 3 + 4    71                 57.7               271           26.2
1 + 2 + 3        61                 49.6               147           41.5

However, not all consensus hexamers were at donor sites. As seen in Table 2, columns C and D, only 56.4% (22/39) of total GGTAAG sequences in the 22 genes, 42% (31/74) of GGTGAG, 23.5% (8/34) of GGTAGG, and 8% (10/124) of GGTGGG are present at donor sites. Therefore, only 26% (71/271) of hn-cDNAs cloned with these primers would actually be expected to be the result of priming from splice donor sites and therefore would be assured to contain exons. Close inspection reveals that hexamer GGTGGG is the main reason for the poor percentage, since it recognizes only 10 splice donor sites but 114 nondonor sites. To increase both the specificity and the representation of oligomers at donor sites, different compositions of the oligomers (derived by adding or deleting bases from both the exon end and the intron end of the existing hexamers) were tested. None improved the efficiency of priming at the splice donor sites in the database simulation. These results indicate that the best combination of primers for hn-cDNA library construction is CTTACC, CTCACC, and CCTACC (reverse complementary sequences of GGTAAG, GGTGAG, and GGTAGG). They should be able to identify 50% of the splice donor sites in a random segment of genome, and, of the total number of sites they identify, 42% should be splice donor sites. Therefore, at least 42% of the cDNAs cloned following priming with this hexamer set would be expected to contain exons--at least an order of magnitude enrichment for the cloning of exons over what can be accomplished by random priming.

Primer Fidelity. The above numbers for the efficiency of priming from the splice donor sites are meaningful only if these hexamers will prime cDNA synthesis from perfectly matched sequences. There has been no experimental study on the fidelity of hexamers in priming cDNA synthesis. Primer extension experiments were conducted to investigate this issue. Rabbit globin RNA (BRL) was used as a template for primer extension. In the two (α and β) globin RNA sequences (total of 1142 bases, excluding poly(A) tail) (7, 8), there is a cryptic splice junction site--a GGUGAG sequence (complementary to the CTCACC primer) located at position 129-134 (counting from the 5' terminus of the RNA, same below) of the β globin RNA. While it is the only such sequence in those molecules, 12 other sequences differing by only a single base are also present (e.g., at position 103-108 of α globin RNA there is GGCGAG; see Table 3 for the complete list of such sites).

Table 3. Position of Hexamer GGTGAG and Distribution of Its Related Sequences(a) in Rabbit Globin mRNA Sequences

Position   Sequence
69         AGTGAG
103        GGCGAG
107        GGTGAA
126        GGAGAG
129        GGTGAG
233        GGTGAA
270        AGTGAG
317        GCTGAG
324        GGTGAA
357        GGTGAC
432        CGTGAG
462        GGTGTG
542        AGTGAG

(a) Mismatched base indicated in bold in the original. Positions 103 and 129 are in the α and β globin RNAs, respectively (see text).

cDNA synthesis was performed using the rabbit globin mRNA as template and CTCACC as primer in the presence of 32P-labeled dCTP. The cDNA product was run on 6% urea polyacrylamide gel. Assuming all cDNA products would be full length (a reasonable assumption since the whole molecule is relatively short), the sizes of the products, as determined from gel electrophoresis, are indicative of the priming position. For example, when the primer binds to its perfectly complementary sequence (at position 129-134 of β globin RNA), a 134-nt cDNA product should be generated. Different conditions for cDNA synthesis were tested to determine the extent of, and optimum conditions for, fidelity of hexamer priming. The following factors were found to be important:

1. A high-temperature incubation (45-70°C) of RNA and primer before addition of the enzyme to allow annealing is necessary to have specific binding. As seen in Fig. 1A, without an annealing period, there is very little enrichment for the expected band at nt position 134 (lane 1). When annealing between RNA and primer was performed at 55°C for 10 min before adding other components and the enzyme, a prominent band was produced at position 134 (lane 2). Lane background was greatly reduced and faint discrete bands were visible at positions corresponding to some of the 12 sequences differing by one nucleotide from GGUGAG. The intensities of bands were detected and calculated with a β-scanner and it was determined that the intensity of the 134-nt product was at least 10-fold that of the brightest band at any other position.

2. The specificity was higher when higher temperature was used for annealing; however, the general yield decreased when higher temperature was used (Fig. 1B). The scanning results indicated that the intensity of the 134-nt product relative to any other band was at least twofold higher at an annealing temperature of 65°C than at 55°C.

3. The specificity of annealing was again higher at lower concentrations of Mg2+ (Fig. 1C). Intensity comparisons (calculated as above) indicated over a threefold increase in specificity at 1 mM Mg2+ compared with 3 mM.

4. Primer amounts ranging from 30 to 120 pmol for the 200-ng RNA template did not affect the yield and specificity significantly, and higher temperature of cDNA synthesis decreased the yield of the 134-nt product (data not shown).

Fig. 1. Primer extension with rabbit globin RNA and primer CTCACC under different conditions. (A) Preannealing vs. no preannealing. In lane 1, no preannealing of the primer and RNA was done; in lane 2, preannealing at 55°C for 10 min was performed. The primer extension products were separated by polyacrylamide gel electrophoresis. (B) Different annealing temperatures. (C) Different Mg2+ concentrations.

Therefore, the optimal conditions for specific hexamer priming were determined as: 200 ng of rabbit globin RNA and 30 pmol of the primer heated at 95°C for 1 min followed by annealing in 200 mM NaCl at 65°C for 10 min, then adjusting the reaction solution on ice to 100 mM NaCl, 50 mM Tris pH 8, 1 mM MgCl2, 0.5 mM dNTP, 1.5 µCi/µl [32P]dCTP, 50 µg/ml actinomycin D, 10 mM DTT, 0.25 units/µl human placental RNase inhibitor (BRL), and 0.375 units/µl AMV reverse transcriptase (BMB) in a final volume of 20 µl and incubated at 42°C for 1 h. As indicated, these conditions result in at least a 60-fold increase of specific priming at perfectly matched sequences as opposed to priming at sequences mismatched at one of the six nucleotides of the hexamer primer. Therefore, the conditions recommended for producing cDNA for making an hn-cDNA library are these plus the inclusion of 2 µg poly(A)+ RNA from the somatic cell hybrid cell line of interest and 100 pM each of the three hexamers described above in a 50-µl reaction.

Efficient Library Screening. The frequency of positive colonies identified by human Cot10 DNA was at least five times higher than that obtained using total human DNA (data not shown).
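For reference, the Cot value of the screening probe follows directly from the definition given in Materials and Methods (DNA concentration in mg/ml multiplied by reannealing time in minutes). The short sketch below only restates that arithmetic; the function names are illustrative and the units are those used in this paper rather than any other Cot convention.

```python
def cot_value(dna_mg, volume_ml, minutes):
    """Cot as defined in Materials and Methods: concentration (mg/ml) x time (min)."""
    return (dna_mg / volume_ml) * minutes


def minutes_for_cot(target_cot, dna_mg, volume_ml):
    """Reannealing time (min) needed to reach target_cot at the given concentration."""
    return target_cot / (dna_mg / volume_ml)


# Values from Materials and Methods: 5 mg of DNA in 2 ml, reannealed for 4 min.
probe_cot = cot_value(5.0, 2.0, 4.0)            # 2.5 mg/ml x 4 min
time_needed = minutes_for_cot(10.0, 5.0, 2.0)   # time to reach Cot 10 at 2.5 mg/ml
print(probe_cot, time_needed)
```

With 5 mg of DNA in 2 ml (2.5 mg/ml), 4 min of reannealing gives the stated Cot value of 10.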

Sequence Anatomy of Human hn-cDNA Clones. Three representative hn-cDNA clones (2B3, 4-3, and 4-6) were sequenced (Fig. 2). Clone 2B3, we now know, contains an exon of the DNA repair gene ERCC1, and its isolation is described below. Clones 4-3 and 4-6 were described originally in Liu et al. (1) as identifying bands on Northern blots and were therefore suspected of containing exons of expressed genes. The sequences of all three clones have well-matched splice acceptor sites (underlined in figure) as judged by the computer program "fitconsensus" (all have scores > 40 compared with the 3' splice site consensus sequence). The region 5' upstream from the acceptor site should be the intron, and partial Alu sequences were identified in this region (in italics) in all three sequences. The Alu sequences all contain poly(A) tails identified as sites of putative high-frequency variable-length polymorphisms (9). Sequences downstream from the putative acceptor sites should be derived from exons--supported by the presence of an open reading frame in at least one phase in each. Computer analysis did not detect any significant homology between the putative exon portions of 4-3 and 4-6 and the primate gene sequences in GenBank. The splice donor sites are missing due to the mung bean nuclease treatment that was necessary to remove hairpin loops after first-strand cDNA synthesis (1). Therefore, the sequences fit the structure expected of hn-cDNA clones produced by priming from splice donor sites and selected by low-Cot DNA screening.
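The open-reading-frame criterion mentioned above can be illustrated with a simple stop-codon scan. The sketch below is an assumption-laden example rather than the analysis actually performed: it takes the exon portion of clone 2B3 to be positions 181 to 322 of Fig. 2a (the boundary given in the text for this clone), uses the sequence as transcribed from the printed figure, and treats a frame as open when none of its complete codons is TAA, TAG, or TGA. The function name `open_frames` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(seq):
    """Return the frame offsets (0, 1, 2) whose complete codons contain no stop codon."""
    seq = seq.upper()
    result = []
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        if not any(codon in STOP_CODONS for codon in codons):
            result.append(offset)
    return result


# Positions 181-322 of clone 2B3, transcribed from Fig. 2a as printed (assumed exact).
exon_2b3 = (
    "GCCAAGCCCT" "TATTCCGATC"
    "TACACAGAGC" "CTTCCCACTG" "TGACACCTCG" "CCCAGGCGGC" "CCCTCAGACC"
    "TACGCCGAAT" "ATGCCATCTC" "ACAGCCTCTG" "GAAGGGGCTA" "GGCCACGTGC"
    "CCCACAGGGT" "CAGAGCCCCT" "GG"
)

frames = open_frames(exon_2b3)
print(len(exon_2b3), frames)
```

A scan of this kind only shows that at least one phase is free of stop codons over the cloned segment; it does not by itself identify the phase used in the native transcript, and any transcription or sequencing error in the input shifts the result.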

Isolation of Weakly Expressed Gene Sequence. To determine whether or not exons of weakly expressed genes could be isolated through this approach, the library was screened for the presence of the human DNA excision repair gene ERCC1. It has been shown that this chromosome 19 gene was present and expressed in the hybrid cell used to make the library (10). Van Duin et al. (11) indicated that this gene is constitutively expressed at a low level. They also determined that the gene consists of 10 exons spreading over approximately 15 kb. Six of the nine splice donor sites match with one of the four hexamers used as primer for cDNA synthesis. Sequential screening of the library with human Cot10 DNA and a human ERCC1 cDNA probe pE12-12 identified one double positive clone out of 150,000 colonies. Sequencing of this clone (named 2B3) revealed that it contains 142 bp of exon III and 180 bp of intron II from ERCC1 (Fig. 2a). The sequence from 181 to 322 is the 5' portion of exon III. The donor splice site in the published sequence (11) of this exon matches with one of the primers (GGTGAG). When compared with the published sequence of this exon (2, 11), it can be seen that the donor splice site sequence and 71 bp from the 3' end of the exon are missing from the cloned insert. The missing segment is an indication of the extent of sequence loss attributable to the mung bean nuclease treatment of the double-stranded hn-cDNA needed to remove hairpin loops at the 3' end of the first strands (1).


(a)
  1  TGCTTGAACC CGGGAGGTGG AGGTTGCAAT GAGCCAAGAT AGTGCCACTA
 51  CACTCCAGTC TGGGTGACAG AGCAAGACTG TCTCAAAAAA AGAAAAAGAA
101  AGGGGAGAGG AACTCACAGG GCCTCAGATG TCCTCTGCTC ACCCCAACCA
151  GTCCCCTGCA AACTCCCTTT CTCCCCACAG GCCAAGCCCT TATTCCGATC
201  TACACAGAGC CTTCCCACTG TGACACCTCG CCCAGGCGGC CCCTCAGACC
251  TACGCCGAAT ATGCCATCTC ACAGCCTCTG GAAGGGGCTA GGCCACGTGC
301  CCCACAGGGT CAGAGCCCCT GG

(b)
  1  GGAAGATCCT GTATGACCAG GCGGTGGCTC ACGCTGTAAT CCCAGCACTT
 51  TGAGAGGCCG AGGTGGGCAG ATCACCTGAG GTTGGGAGGA GAATTGCTTG
101  AACCCAGTAG GCAGAGGTTG CGGGGAGCCA AGATCTCGCC ATGTGCACTC
151  CATCCTGGGG AACAAGAGCG AAACTCCGTC TCAAAAAAAA AAAAAAAGAT
201  CCCTGTGGCT CTGTGCGTGG AGTAGAGTGG GAAGACGAGA GGGATGACAA
251  ATGTCCAGGC AAGGGAGGGT GGGGGCTTGG GAAAAGGTGG AGGTAGCCCA
301  CTACCCTCCC CGACTCCTGC TCCTCACCTG TTTTTTGGCC CTGCTCAGCT
351  CCAGGTCAAG TGCTTGGGCC AGCTCC

(c)
  1  GGTGAGACCA GCCTGGCCAA CATGATGAAA CCCCATCTCT ACTAAAAATA
 51  CAAAAAATTA GCTGGGCATG GTGGCAGGCA CCTGTAATCC CAGTTACTTG
101  GGAGGCTGAG GCAGGAGAAT TGCTTGAACC CAGGAGGTGG AGGTTGCAGT
151  GAGCTGAGAT TGTGCCACTG CACTTCAGCC TGGGGCACAG AGTGACTCTG
201  TCAAAAAGGA AAGGCAAAAA CCGGACCCTG GAGCTCTGTG GGCCCTGATT
251  GCAGGCCCCT CCCTGTGTGA TGAGGGATGA GTCAGCTTGG GGAGCCACTT
301  GGCCAGCCTC GATGCCTGGT CATCAGGAGG TTGAGAAAAG CCTTGAAAGT
351  TGTGAAGCAC TGAGCAGATA CAAGTGTTTG TTAATTTTGT AGAAATTAAC
401  TTTCCCATCT GGCCTTTATC TCCAAAGTTG GTATTGTTGC TCCACGCAGA
451  CTGCAGTCCG GTTTCTTATT GTGACTTTGA CAATGCAAAG TGCAAAATAG
501  GAAGAGGCCT CCCCACCCCA CTATCCCACC CTGGTGACCA CCCCC

Fig. 2. Nucleotide sequences of three clones isolated from the hn-cDNA library: (a) 2B3, (b) 4-6, and (c) 4-3. The Alu sequences are in italics; the putative 3' splice sites are underlined; and the putative exon sequences are in boldface.
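Because the poly(A) tracts at the ends of the Alu segments are the proposed sites of length polymorphism, it can be useful to locate them in a clone sequence. The following is a minimal sketch that reports the longest uninterrupted run of A in a stretch of sequence. The two input rows are transcribed from Fig. 2 as printed, on the assumption that the second sequence in the figure is clone 4-6; the function name `longest_a_run` is illustrative.

```python
import re


def longest_a_run(seq):
    """Return (1-based start, length) of the longest uninterrupted run of A, or (0, 0)."""
    best = (0, 0)
    for match in re.finditer(r"A+", seq.upper()):
        length = match.end() - match.start()
        if length > best[1]:
            best = (match.start() + 1, length)
    return best


# Row 151-200 of clone 4-6 and row 51-100 of clone 2B3 (assumed transcribed exactly).
row_4_6 = "CATCCTGGGGAACAAGAGCGAAACTCCGTCTCAAAAAAAAAAAAAAAGAT"
row_2b3 = "CACTCCAGTCTGGGTGACAGAGCAAGACTGTCTCAAAAAAAGAAAAAGAA"

for name, row_start, row in (("4-6", 151, row_4_6), ("2B3", 51, row_2b3)):
    start, length = longest_a_run(row)
    # Convert the position within the row to a position within the clone.
    print(name, row_start + start - 1, length)
```

Primers flanking such a tract, designed from the unique sequence on either side, would be the natural starting point for testing whether the tract length varies between individuals.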


RT/PCR Approach to Study Expression of hn-cDNA Clones. From the sequence anatomy of the inserts in the hn-cDNA clones, it is possible to develop a more sensitive way than Northern blots of identifying whether a sequence is truly transcribed. For this, reverse transcription from total cellular RNA of the hybrid from which the library was derived (20XP3542-1-4), coupled with polymerase chain reaction (RT/PCR), was attempted (12, 13). As a model to test the system, primers representing clone 2B3 (containing an insert from the weakly expressed ERCC1 gene), for which both intron and exon regions have been sequenced, were used for the PCR. Three oligo sequences for PCR capable of amplifying from cDNA generated from immature as well as processed message were identified. As shown in Fig. 3A, complementary primers hERCC-1 and hERCC-2 identify sites in exon II and intron II, respectively, while the reverse complementary primer, hERCC-3, identifies a site in exon III (see Materials and Methods for the actual sequences). Since exons II and III are separated by approximately 2.3 kb of intron II in genomic DNA or in cDNA generated from hnRNA (hn-cDNA), whereas the distance between the two primers in cDNA from processed RNA is only 210 bp, most, if not all, PCR products generated under the PCR conditions employed by primer pair hERCC-1 and hERCC-3 would be amplified from cDNA templates made from mature, spliced mRNA. However, primer pair hERCC-2 and hERCC-3 could amplify from genomic DNA and/or hn-cDNA but not from cDNA made of mature mRNA, since the hERCC-2 sequence is removed from those molecules by splicing.

As shown in Fig. 3B, lanes 3 and 4, fragments of expected sizes (210 bp from the hERCC-1/hERCC-3 primers representing the product from cDNA of mature mRNA, and 158 bp from the hERCC-2/hERCC-3 primers representing products from hnRNA and/or contaminating hybrid cell DNA) were generated. To rule out the possibility that the hERCC-2/hERCC-3 PCR product in lane 4 was from genomic DNA contamination, an identical experiment was performed without addition of reverse transcriptase. Primer pair hERCC-2 and hERCC-3 did not generate any visible product (lane 5), indicating that the amplified fragment above was truly from hn-cDNA, not from contaminating genomic DNA. The absence of any PCR products from lane 2 (with CHO cDNA) indicates that these are the products of the human RNAs produced in the hybrid cell. These results indicate that the transcriptional status of inserts in hn-cDNA clones may be readily identified by this assay in any cell line or tissue.

Fig. 3. RT/PCR with ERCC1 primers. (A) Relative position and direction of priming of PCR primers hERCC-1 (1), hERCC-2 (2), and hERCC-3 (3) in ERCC1 hnRNA and mRNA molecules. (B) Ethidium bromide-stained products of PCR reactions following gel separation. Two micrograms of total cellular RNA from 20XP3542-1-4 (lanes 3, 4, and 5) and CHO (lane 2) was used. Lane 1 contains molecular size markers. Primers hERCC-1/hERCC-3 were used in lane 3 and hERCC-2/hERCC-3 primers were used in lanes 2, 4, and 5. The reaction in lane 5 was the same as in lane 4 minus the reverse transcriptase.

DISCUSSION

The results of the experiments reported here modify the conditions for, and determine the limits of, the hn-cDNA procedure. The computer analysis indicated that primers CTTACC, CTCACC, and CCTACC would identify 50% of the donor sites in a set of 22 randomly selected genes. Since there are usually multiple introns per gene, and since many of these are expected to contain human-specific repetitive sequences, we can consider the procedure efficient for priming cDNA synthesis from at least one exon of many genes. Furthermore, the finding that 42% of the total sequences identified by that primer set were at the splice donor sites of the genes in the data base, coupled with the observation that the average intron length is at least eight times that of exons (14), suggests that the procedure offers an approximately fourfold enrichment for the isolation of expressed sequences over the use of random priming or other approaches de-

Liu et al.

signed to pull out human transcribed sequences from hybrid ceil by using human specificAlu primers for cDNA synthesis (15). At the same time, the fact that > 50% of the sites identified by the primer set were not at splice donor sites is now recognized as the probable basis for the fact that only four of the nine initial hn-cDNA library clones hybridized to RNA on Northern blots isolated from the hybrid cell line from which the library was constructed (1). Previous studies have shown that reverse transcriptase could initiate cDNA synthesis when only very limited D N A - R N A annealing has taken place and some mismatch could be tolerated by the enzyme (16). Therefore, the fidelity of hexamers for priming cDNA synthesis was an issue, Primer extension experiments identified conditions at which the primers would initiate cDNA synthesis at perfectly complementary sequences at least 60 times more frequently than at any other positions even with mismatch by only one nucleotide. Human Cot 10 DNA was found to be most efficient in screening the hn-cDNA library for human inserts since it contains only highly repetitive sequences, reducing the possibility of nonspecific hybridization or other contributions to background. Since housekeeping genes comprise functions that are needed in all cell types and are expressed at a level of only a small number of copies per cell--so-called scarce mRNA class (17)--the ability to isolate an hn-cDNA clone containing a partial ERCC1 gene after screening only 150,000 total (human and CHO containing) library colonies indicated that the approach is sensitive enough to isolate genes expressed at very low levels as long as the genes contain introns and repetitive sequences. The sequences of several library clones that gave positive Northern blot hybridization results contained segments as predicted: a presumed intron at one end with partial Alu sequences, and a possible exon at the

hn-cDNA Library Efficiency

other separated from the first end by a fragment matching splice acceptor consensus sequences. Although not sequenced, some of those clones giving negative RNA hybridization results may come from intron regions alone (as indicated above) or contain an exon too small for efficient hybridization to weakly expressed housekeeping genes (18, 19). Sensitivity of message detection need not remain an issue, since PCR analysis has been shown to be more sensitive for detecting gene expression than Northern blot hybridization (20, 21). The study in this report confirmed that PCR is especially useful for detection at the hnRNA level. The presence of hnRNA copies of human ERCC1 in the total cellular RNA of 20XP3542-1-4 was unambiguously detected by PCR (Fig. 3B). Since all hn-cDNA clones are derived from parts of primary transcripts, albeit some are just introns, the PCR technique is capable of confirming their hnRNA origin by detecting their presence in total cellular RNA. Future studies carrying out the procedure on cytoplasmic as well as whole cellular RNA should determine whether the sequence of any such isolated clone, around which PCR primers are designed, yields signal from mature or unprocessed RNA. In summary, this study showed that the approach is a simple, efficient, and effective method to directly isolate chromosome region-specific human transcribed genes. The isolated hn-cDNA clones can be used to: (1) identify corresponding full-length cDNA clones by screening a regular cDNA library; (2) isolate corresponding genomic sequences cloned in cosmids or YACs (yeast artificial chromosomes); (3) study the expression regulation of the represented genes; (4) study the inheritance of the loci in the human population since the poly(A) tract at the end of the Alu sequences (present in most isolated human hn-cDNA clones) has been shown to be frequently polymorphic (9); and (5) function as anchor points for the genome
mapping effort since the hn-cDNA inserts are the right size for quick sequencing to identify STSs (sequence tagged sites), which have been proposed as a common language for physical mapping of the human genome (22). The limitations of the method are: (1) only genes with introns can be isolated; (2) the introns have to contain repeat sequences and, at the same time, a donor splice site sequence complementary to one of the primers; and (3) only those genes expressed in hybrids can be isolated. The first limitation should not be a serious problem since it has been shown that most structural genes contain introns (14, 23). As for the second point, it has been shown that the gene-rich regions of human chromosomes (R bands) are also enriched for Alu sequences (24), which are most likely located in introns of the genes. Indeed, it has been shown that almost every hnRNA species contains at least one repeat sequence
(25). With respect to the human gene expression issue, all, or at least most, housekeeping genes present on the retained human genomic elements are expressed in hybrids (26). Some tissue-specific genes may also be expressed, depending on the parental cells involved in the fusion. The expectation is that the gene expression pattern of the host cell (the rodent cell fusion partner maintaining its full chromosomal complement) will cause the expression of human genes consistent with the host cell pattern of gene expression (27-30). Recent studies indicated that a variety of tissue-specific genes are expressed at very low levels in other tissues (31, 32). Therefore, many human tissue-specific genes could be expressed in hybrids as well. Moreover, earlier studies have shown that hnRNA complexity is much higher than that of mRNA (33, 34). Not all of the difference can be accounted for by introns, implying that some hnRNA species are degraded entirely in the nuclei. Some RNA species may function and be metabolized entirely inside the nuclei. Therefore, the hnRNA of a particular cell type may contain transcripts of genes not present in the mRNA of the same cell type. The above-mentioned phenomena point to the possibility that this approach could provide not only an effective way of isolating genes enriched for specific regions of the genome, but also a new approach to studying the regulation of gene expression, i.e., the posttranscriptional regulation reflected by the relative abundance of an RNA sequence between nucleus and cytoplasm.

ACKNOWLEDGMENTS

This work was supported in part by grants GM34936 from the NIH and DEFG05-91ER61239 from DOE, and a gift from Mr. Kenneth D. Muller.

LITERATURE CITED

1. Liu, P., Legerski, R.J., and Siciliano, M.J. (1989). Science 246:813-815.
2. van Duin, M., de Wit, J., Odijk, H., Westerveld, A., Yasui, A., Koken, M.H.M., Hoeijmakers, J.H.J., and Bootsma, D. (1986). Cell 44:913-923.
3. Sealey, P.G., Whittaker, P.A., and Southern, E. (1985). Nucleic Acid Res. 13:1905-1922.
4. Villarreal-Levy, G., Ma, T.S., Kerner, S.A., Roberts, R., and Perryman, M.B. (1987). Biochem. Biophys. Res. Commun. 144:1116-1127.
5. Kawasaki, E.S., Clark, S.S., Coyne, M.Y., Smith, S.D., Champlin, R., Witte, O.N., and McCormick, F.P. (1988). Proc. Natl. Acad. Sci. U.S.A. 85:5698-5702.
6. Padgett, R.A., Grabowski, P.J., Konarska, M.M., Seiler, S., and Sharp, P.A. (1986). Annu. Rev. Biochem. 55:1119-1150.
7. Efstratiadis, A., Kafatos, F.C., and Maniatis, T. (1977). Cell 10:571-585.
8. Heindell, H.C., Liu, A., Paddock, G.V., Studnicka, G.M., and Salser, W.A. (1978). Cell 15:43-54.
9. Economou, E.P., Bergen, A.W., Warren, A.C., and Antonarakis, S.E. (1990). Proc. Natl. Acad. Sci. U.S.A. 87:2951-2954.
10. Stallings, R.L., Olson, E., Strauss, A.W., Thompson, L.H., Bachinski, and Siciliano, M.J. (1988). Am. J. Hum. Genet. 43:144-151.
11. van Duin, M., Koken, M.H.M., van den Tol, J., ten Dijke, P., Odijk, H., Westerveld, A., Bootsma, D., and Hoeijmakers, J.H.J. (1987). Nucleic Acid Res. 15:9195-9213.
12. Kawasaki, E.S., and Wang, A.M. (1989). In PCR Technology, Principles and Applications for DNA Amplification, (ed.) Erlich, H.A. (Stockton Press, New York), pp. 89-97.
13. Singer-Sam, J., Robinson, M.O., Bellve, A.R., Simon, M.I., and Riggs, A.D. (1990). Nucleic Acid Res. 18:1255-1259.
14. Hawkins, J.D. (1988). Nucleic Acid Res. 16:9893-9908.
15. Corbo, L., Maley, J.A., Nelson, D.L., and Caskey, C.T. (1990). Science 249:652-655.
16. Mendelman, L.V., Petruska, J., and Goodman, M.F. (1990). J. Biol. Chem. 265:2338-2346.
17. Lewin, B. (1990). In Genes IV, (Oxford University Press and Cell Press), pp. 478-480.
18. Riordan, J.R., Rommens, J.M., Kerem, B., Alon, N., Rozmahel, R., Grzelczak, Z., Zielenski, J., Lok, S., Plavsic, N., Chou, J., Drumm, M.L., Iannuzzi, M.C., Collins, F.S., and Tsui, L. (1989). Science 245:1066-1073.
19. Rommens, J.M., Iannuzzi, M.C., Kerem, B., Drumm, M.L., Melmer, G., Dean, M., Rozmahel, R., Cole, J.L., Kennedy, D., Hidaka, N., Zsiga, M., Buchwald, M., Riordan, J.R., Tsui, L., and Collins, F.S. (1989). Science 245:1059-1065.
20. Koos, R.D., and Olson, C.E. (1989). Mol. Endocrinol. 3:2041-2048.
21. Rappolee, D.A., Wang, A., Mark, D., and Werb, Z. (1989). J. Cell. Biochem. 39:1-11.
22. Olson, M., Hood, L., Cantor, C., and Botstein, D. (1989). Science 245:1434-1435.
23. Naora, H., and Deacon, N.J. (1982). Proc. Natl. Acad. Sci. U.S.A. 79:6196-6200.
24. Korenberg, J.R., and Rykowski, M.C. (1988). Cell 53:391-400.
25. Robertson, H.D., and Dickson, E. (1984). Mol. Cell. Biol. 4:310-316.
26. Davidson, R.L. (1974). Annu. Rev. Genet. 8:195-218.
27. Deisseroth, A., Velez, R., and Nienhuis, A.W. (1976). Science 191:1292-1293.
28. Siciliano, M.J., Bordelon, M.R., and Kohler, P.O. (1978). Proc. Natl. Acad. Sci. U.S.A. 75:936-940.
29. Papayannopoulou, T., Lindsley, D., Kurachi, S., Lewison, K., Hemenway, T., Melis, M., Anagnou, N.P., and Najfeld, V. (1985). Proc. Natl. Acad. Sci. U.S.A. 82:780-784.
30. Blau, H.M., Pavlath, G.K., Hardeman, E.C., Chiu, C.-P., Silberstein, L., Webster, S.G., Miller, S.C., and Webster, C. (1985). Science 230:758-766.
31. Chelly, J., Concordet, J., Kaplan, J., and Kahn, A. (1989). Proc. Natl. Acad. Sci. U.S.A. 86:2617-2621.
32. Sarkar, G., and Sommer, S.S. (1989). Science 244:331-334.
33. Lewin, B. (1975). Cell 4:77-93.
34. Lewin, B. (1980). In Gene Expression Vol. 2, Eucaryotic Chromosomes (John Wiley & Sons, New York), pp. 728-760.
