GENOMICS 10, 390-399 (1991)
Serial Alu Sequence Transposition Interrupting a Human B Creatine Kinase Pseudogene
TONY S. MA,*,1 JONAH IFEGWU,* LAURA WATTS,* MICHAEL J. SICILIANO,† ROBERT ROBERTS,* AND M. BENJAMIN PERRYMAN*
*Department of Medicine, Molecular Cardiology Unit, Baylor College of Medicine, One Baylor Plaza, Room 506C, Houston, Texas 77030; and †Department of Molecular Genetics, M.D. Anderson Cancer Center, Houston, Texas 77030
Received October 2, 1990; revised December 5, 1990
We have isolated, sequenced, and characterized a single-copy B creatine kinase pseudogene. The chromosomal assignment of this gene is 16p13, and a unique sequence probe from this locus detects EcoRI restriction fragment length polymorphisms of 7.8 and 5.4 kb. In 26 unrelated individuals, the frequencies for the 7.8- and 5.4-kb B creatine kinase pseudogene alleles were calculated to be 17.3 and 82.7%, respectively. The B creatine kinase pseudogene is interrupted by a 904-bp DNA insertion composed of three Alu repeat sequences in tandem flanked by an 18-bp direct repeat derived from the pseudogene sequence. Nucleotide sequence analysis of the Alu elements suggests that the Alu sequences were incorporated into this locus in three separate integration events. Several complex clustered Alu repeat sequences without defined integration borders have been previously identified at different genomic loci. This is the first evidence that complex tandem Alu elements can integrate in an apparently serial manner in the human genome and supports the contention that Alu repeats integrate nonrandomly into the human genome. © 1991 Academic Press, Inc.
INTRODUCTION
Creatine kinase (CK; EC 2.7.3.2) catalyzes the reversible storage of ATP as creatine phosphate and its regeneration to maintain a high ATP/ADP ratio. In mammalian cells the cytoplasmic isoenzymes are dimers of M and B subunits, and all three isoforms, MM, MB, and BB creatine kinase, can be detected in a tissue-specific distribution (Watts, 1973). In addition to the cytoplasmic isoenzymes, at least two mitochondrial isoforms are present in mammalian tissues, the ubiquitous and striated muscle isoforms (Grace et al., 1983; Haas et al., 1989; Haas and Strauss, 1990; Hossle et al., 1988). We have isolated and sequenced M (CKM) and B (CKB) creatine kinase cDNAs from human tissue (Perryman et al., 1986; Villarreal-Levy et al., 1987), and two other laboratories have also published cDNA sequences for the human B creatine kinase isoform (Kaye et al., 1987; Mariman et al., 1987). Using isoform-specific probes, we have demonstrated that both the CKM and CKB genes exist in single copy in the human genome (Villarreal-Levy et al., 1987). DNA probes derived from these cDNA clones, however, detect an additional creatine kinase gene in the human genome. Similar observations have been reported by Kaye et al. (1987). We have cloned and characterized this gene and determined it to be a CKB pseudogene. A unique feature of the organization of the pseudogene is an insertion of three tandem Alu repetitive elements into the coding region. Alu repetitive elements have been demonstrated with high-resolution in situ hybridization techniques to be nonrandomly distributed in the human genome (Korenberg and Rykowski, 1988). Nucleotide sequence analysis of the Alu repetitive elements in the CKB pseudogene suggests one mode of integration whereby Alu repetitive elements are nonrandomly placed into the human genome.
1 To whom correspondence and reprint requests should be addressed.
0888-7543/91 $3.00 Copyright © 1991 by Academic Press, Inc. All rights of reproduction in any form reserved.
METHODS
Creatine Kinase-Specific Probes
A 182-bp HaeIII-HaeIII restriction fragment of the human CKB (position 1177-1358) and a 135-bp HaeIII-HaeIII restriction fragment of the human CKM 3'-untranslated region (position 1280-1414) were subcloned into pUC9 and designated pHCKB3UT and pHCKM3UT, respectively. Probes made from these subcloned fragments were highly specific and hybridized only with the appropriate CKM or CKB cDNA. A 134-bp HaeIII-HaeIII restriction fragment of the human CKB cDNA (position 657-790) was similarly subcloned and used as a coding region nonspecific probe (designated pHCKB-134) that cross-hybridized with CKM and CKB sequences under nonstringent conditions. A 619-bp SmaI-PstI restriction fragment of the 5.4-kb B creatine kinase pseudogene (position 804-1422 of the sequence as reported below) was used as a specific probe for this gene and designated pHCK4-618.
Genomic DNA Purification and Southern Filter Hybridization Analysis
High-molecular-weight genomic DNA was isolated by the Proteinase K-Sarkosyl method (Maniatis et al., 1982). Complete restriction endonuclease digestion was achieved using 5 units of enzyme/µg DNA and overnight incubation. The digests were electrophoresed on 0.8% agarose gels and transferred to nylon membranes (Zetaprobe, Bio-Rad) using the alkaline transfer method of Reed and Mann (1985). Hybridization conditions were 1.5× SSPE, 1% SDS, 0.5% milk powder, 500 µg/ml salmon sperm DNA, 10% (w/v) dextran sulfate, at 68°C (or 55°C for cross-species hybridization), overnight with 6-8 × 10^6 cpm/ml of 32P-labeled probe. Stringent washing conditions were as follows: 2× SSC, 1% SDS for 15 min twice at room temperature, followed by 0.1× SSC, 0.1% SDS for 60-90 min at 65°C. Reduced stringency wash conditions included a final wash at 1× SSC, 0.1% SDS for 60 min at 55°C, or as specified under Results. The nylon was exposed to X-ray film with two intensifying screens (DuPont) at -70°C.
Probe Labeling
DNA fragments were labeled by the random-primer labeling technique (Feinberg and Vogelstein, 1984) to a specific activity of 10' cpm/µg. The probe was purified by NenSorb (DuPont) column chromatography before use in hybridization experiments. Oligonucleotides were end-labeled using T4 polynucleotide kinase (Maniatis et al., 1982) to a specific activity of 10' cpm/µg. The probe was purified by NenSorb chromatography as above before hybridization.
Enriched Genomic Libraries of Individual Creatine Kinase Genes
Four hundred micrograms of human placenta DNA was digested to completion with EcoRI restriction endonuclease. The digests were electrophoresed on 0.8% agarose gels at 2 V/cm for 36 h. Regions of the gel corresponding to the size of DNA fragments hybridizing to creatine kinase-specific probes were recovered from the gel by electroelution and separately inserted into appropriate cloning vectors [EMBL4 or Lambda-Zap (Stratagene)] to make a sequence-specific enriched genomic library using the Gigapack Gold (Stratagene) in vitro packaging system.
Screening, Subcloning, and Sequencing of the B-like Creatine Kinase Gene
The creatine kinase-enriched genomic libraries were screened with a CKB cDNA probe containing 538 bp of the coding region and all 204 bp of the 3'-untranslated region. Lambda clones were amplified and harvested using Lambda Sorb (Promega). The insert was released by EcoRI digestion and subcloned into plasmid vectors using standard protocols (Maniatis et al., 1982). Plasmid DNA sequencing was performed by the double-strand dideoxy chain-termination method using the T7 Sequenase Kit (Pharmacia).
Human Chromosomal Localization
Chromosomal localization was accomplished essentially as described by Stallings et al. (1988). The somatic cell hybrid clone panels consisted of 17 clones and a HeLa cell control. For chromosomal localization, the blot was hybridized under stringent conditions with a 618-bp SmaI-PstI fragment of the 5.4-kb B-like gene clone.
Polymerase Chain Reaction (PCR) Amplification of the Tandem Alu Repetitive Element
Two 30-mer oligonucleotide primers with 5' added restriction sites were constructed from unique sequences from the CKB pseudogene flanking the Alu insertional element: 5'-ccgaattcatcaggctgccccacctgggca-3' (sense); 5'-ccaagcttatctgcaccagcaccacctctg-3' (antisense). The PCR mixture in a total volume of 20 µl contained 0.2 µg human genomic DNA; 1 µM sense and antisense primers; 50 mM Tris, pH 8.8; 10 mM (NH4)2SO4; 5 mM MgCl2; 14 mM mercaptoethanol; 5 µM EDTA; 100 mg/ml BSA; 1 unit Taq DNA polymerase; 1 mM dNTP. The PCR protocol was as follows: 96°C, 1 min; 68°C, 7 min. The amplification was performed for 30 cycles. The PCR products were separated on a 1% agarose gel and transferred to a Zetaprobe membrane. The blot was hybridized with 2 × 10^6 cpm/ml 32P-end-labeled Alu-specific DNA probe (Nelson et al., 1989), in 5× SSC, 5% SDS, 5× Denhardt's solution with 100 µg/ml of sheared denatured salmon sperm DNA at 42°C. The membranes were washed at room temperature in 1× SSC for 1 h, followed by an additional wash at 42°C for 30 min.
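The primer design described above can be checked with a short script. The sketch below confirms the stated 30-nt length, locates the 5' added restriction sites, and recovers the genomic strand targeted by the antisense primer. The enzyme names are an assumption: the text does not name the added sites, and the script simply matches the standard EcoRI (GAATTC) and HindIII (AAGCTT) recognition sequences. The 8-nt tail length (two extra bases plus a 6-bp site) is likewise inferred from the primer sequences, not stated in the text.

```python
# Primer sequences exactly as printed in the Methods text.
SENSE = "ccgaattcatcaggctgccccacctgggca"
ANTISENSE = "ccaagcttatctgcaccagcaccacctctg"

# Assumption: standard recognition sequences; the paper does not name the enzymes.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to its 0-based offset."""
    upper = primer.upper()
    return {name: upper.find(site) for name, site in SITES.items() if site in upper}


def genomic_part(primer: str, tail_length: int = 8) -> str:
    """Strip the assumed 8-nt 5' tail and return the pseudogene-derived part."""
    return primer[tail_length:]


if __name__ == "__main__":
    for name, primer in (("sense", SENSE), ("antisense", ANTISENSE)):
        print(name, len(primer), find_sites(primer), genomic_part(primer))
    # Top-strand sequence that the antisense primer anneals opposite to.
    print(reverse_complement(genomic_part(ANTISENSE)))
```

The reverse complement of the antisense primer's genomic part gives the top-strand sequence to look for downstream of the Alu insertion when locating the amplicon in the pseudogene sequence.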
MA
ET
AL.
RESULTS
Human Creatine Kinase Gene Family
Using isoform-specific probes, derived from unique 3'-untranslated sequences of the corresponding cDNAs, we have demonstrated that both CKM and CKB are single-copy genes in the human genome (Villarreal-Levy et al., 1987). When a nonspecific creatine kinase coding region probe was used to probe Southern blots of human genomic DNA, however, three groups of DNA sequences were identified in the human genome (Fig. 1A). EcoRI restriction endonuclease digests human genomic DNA to produce distinctive DNA fragments representing each member of the cytosolic creatine kinase gene family; the CKM gene resides on a 23-kb fragment, the CKB gene is associated with an EcoRI restriction polymorphism of 16.5 and 12 kb, and a third distinct sequence, again showing EcoRI restriction polymorphism, is identified at fragment sizes of 7.8 and 5.4 kb (Fig. 1). By Southern blot analysis these B-like genomic sequences share considerable sequence similarity with the CKB cDNA and hybridize with the CKB-specific probe under nonstringent conditions. The HindIII, XbaI, and BglII restriction digests show only a single band for CKM and for CKB, as well as for the B-like creatine kinase gene. These results indicate that in addition to the single-copy CKM and CKB genes, there exists another single-copy B-like creatine kinase DNA sequence in the human genome.
FIG. 1. Human creatine kinase gene family. (A) Southern analysis of human genomic DNA digested with EcoRI, HindIII, XbaI, and BglII using a 134-bp B creatine kinase coding region probe shows cross-hybridization with M, B, and B-like creatine kinase sequences in the human genome. For the EcoRI digest (first lane), the CKM gene resides on the 23-kb fragment, and the CKB gene resides on fragments with an EcoRI restriction fragment length polymorphism (RFLP) of 16.5 and 12 kb. The boxed signals represent the B-like creatine kinase gene with RFLP of 7.8 and 5.4 kb. (B) Same blot hybridized with 3'UT CKB-specific probe under reduced stringency shows only hybridization signals from the B and the B-like locus. Panel labels: (A) Probe: coding CKB, stringency: low; (B) Probe: CKB 3'UT, stringency: low.
EcoRI Restriction Fragment Length Polymorphisms with B and B-like Creatine Kinase Gene
The CKB gene has an EcoRI restriction fragment length polymorphism (RFLP) of 16.5 and 12 kb and the B-like creatine kinase gene, similarly, has an EcoRI RFLP of 7.8 and 5.4 kb. With DNA isolated from 26 unrelated individuals, the frequencies of the 16.5- and 12-kb B creatine kinase alleles were calculated to be 38.5 and 61.5%, respectively. The frequencies for the 7.8- and 5.4-kb B-like creatine kinase alleles were calculated to be 17.3 and 82.7%, respectively.
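The allele frequencies above follow from simple allele counting over 2 × 26 = 52 chromosomes. The sketch below illustrates the arithmetic. The allele counts are hypothetical: they are back-calculated from the reported percentages under the assumption that every individual contributed two scored alleles, and they are not reported in the text.

```python
def allele_frequencies(counts: dict) -> dict:
    """Convert allele counts to percentage frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    return {allele: 100.0 * n / total for allele, n in counts.items()}


N_INDIVIDUALS = 26
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # diploid: two alleles per individual

# Hypothetical counts inferred from the reported percentages (assumption).
ckb_counts = {"16.5 kb": 20, "12 kb": 32}
b_like_counts = {"7.8 kb": 9, "5.4 kb": 43}

if __name__ == "__main__":
    for label, counts in (("CKB", ckb_counts), ("B-like", b_like_counts)):
        assert sum(counts.values()) == N_CHROMOSOMES
        freqs = allele_frequencies(counts)
        print(label, {allele: round(f, 1) for allele, f in freqs.items()})
```

With these inferred counts the rounded frequencies reproduce the values given in the text, which suggests that all 52 chromosomes were scored at both loci.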
B-like Creatine Kinase Gene Isolation and Characterization
B and B-like creatine kinase gene sequences could be separated by agarose gel electrophoresis, allowing the construction of a specific sequence-enriched genomic library. We screened 600,000 recombinant clones constructed in the EMBL4 phage vector containing inserts of 12 to 18 kb and obtained 4 positive clones, representing the human CKB genomic sequences. Studies of these clones are not reported in the present communication. Similarly, there were 3 positive clones from 600,000 recombinant clones constructed in the Lambda-Zap vector with inserts in the range 4-6 kb and 33 positive clones from 600,000 recombinant clones in the range 6-9 kb. These represent the alleles of the B-like creatine kinase genomic sequences. There were also 12 weakly hybridizing clones from the fragments in the range 6-9 kb, which were not further characterized. One of the B-like creatine kinase clones, containing a 5.4-kb insert and representing the smaller allele of the B-like gene, was subcloned into pGEM3Z (Promega) for detailed analysis and designated pHCK4. A 618-bp SmaI-PstI restriction fragment of the B-like gene was used to verify the isolation of the correct genomic fragment and as a B-like creatine kinase gene-specific probe for Northern blot analysis. This fragment hybridizes to the same B-like creatine kinase gene sequences identified in the original EcoRI restriction digests of human DNA, and under stringent conditions does not cross-hybridize with the CKB gene sequences (results not shown).
FIG. 2. B creatine kinase pseudogene structure. A 5.4-kb DNA fragment corresponding to the 5.4-kb signal on the EcoRI restriction digest was cloned into Lambda-Zap vector. Selected restriction sites on the insert are shown. The structure of the B-like creatine kinase gene was inferred by comparison to the CKB cDNA sequence. The corresponding sequences are represented by open rectangles. Filled arrows represent Alu repetitive elements. The locations of ATG, AATAAA, and poly(A) sites are designated, as are the flanking direct repeats of the inserted Alu complex.
Chromosomal Localization of the B-like Creatine Kinase Gene
Using a human-hamster hybrid cell line chromosomal panel, we determined that the B-like creatine kinase gene is localized to chromosome 16p13. Among the members of the human-hamster hybrid clone panel, the lowest level of discordancy of the human band that hybridized with the probe to any of the isozyme or molecular chromosomal markers studied was 6%. The marker with that low level of discordance was PGP, which has been mapped to the short arm of human chromosome 16, at 16p13.3 (Hyland et al., 1989). The level of discordance with the q-arm marker used in the study, DIA4 at 16q12-q22 (Lavinha et al., 19&Q), was equal to the next lowest level of discordancy of any of the other markers, which ranged from 24 to 59%. These data confirm the provisional assignment of the gene to chromosome 16 and suggest its p-arm regional localization.
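Discordancy in a somatic cell hybrid panel is the fraction of clones in which the probe signal and a chromosomal marker disagree (one present, the other absent). The sketch below shows the calculation on a hypothetical 17-clone panel. The presence/absence vectors are invented for illustration; the text reports only the percentages. A reported value of 6% would be consistent with one discordant clone out of 17, although the clone-level data are not given.

```python
def discordancy(signal, marker):
    """Percent of hybrid clones in which probe signal and marker disagree."""
    if len(signal) != len(marker):
        raise ValueError("signal and marker must cover the same clones")
    if not signal:
        raise ValueError("empty panel")
    mismatches = sum(1 for s, m in zip(signal, marker) if s != m)
    return 100.0 * mismatches / len(signal)


# Hypothetical panel of 17 hybrid clones: True = human band/marker present.
probe_signal = [True] * 5 + [False] * 12

# Hypothetical marker close to the probe locus: disagrees in one clone.
linked_marker = list(probe_signal)
linked_marker[0] = False

# Hypothetical marker elsewhere in the genome: disagrees in four clones.
unlinked_marker = list(probe_signal)
for i in (0, 1, 5, 6):
    unlinked_marker[i] = not unlinked_marker[i]

if __name__ == "__main__":
    print(round(discordancy(probe_signal, linked_marker), 1))
    print(round(discordancy(probe_signal, unlinked_marker), 1))
```

The marker with the lowest discordancy is taken as the best candidate for the chromosomal region carrying the probe sequence.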
B-like Creatine Kinase Gene Structure
Figure 2 shows the locations of selected restriction enzyme sites in the B-like creatine kinase gene clone and the location of Alu repeat sequences. Of the 5400 bp total in the insert, the DNA sequence between positions 2420 and 5400 was completely determined (Fig. 3). This region of the clone contains the entire pseudogene sequence. For clarity in this communication, the DNA sequence at position 2421 of the 5400-bp insert, corresponding to the SmaI site, is designated position 1. The organization of the gene is determined by comparison to the B creatine kinase cDNA sequence (Villarreal-Levy et al., 1987).
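The renumbering described above is a fixed offset between insert coordinates and the coordinates used for the reported sequence. A minimal sketch follows, assuming the insert is exactly 5400 bp long (the text gives it as approximate) and that insert position 2421 corresponds to position 1; the function names are illustrative.

```python
INSERT_LENGTH = 5400    # assumed exact length of the cloned EcoRI insert
SEQUENCED_START = 2421  # insert coordinate designated position 1


def insert_to_local(pos: int) -> int:
    """Convert an insert coordinate to the numbering used for the sequence."""
    if not SEQUENCED_START <= pos <= INSERT_LENGTH:
        raise ValueError("position lies outside the sequenced region")
    return pos - SEQUENCED_START + 1


def local_to_insert(pos: int) -> int:
    """Convert a reported sequence coordinate back to the insert coordinate."""
    if not 1 <= pos <= INSERT_LENGTH - SEQUENCED_START + 1:
        raise ValueError("position lies outside the sequenced region")
    return pos + SEQUENCED_START - 1


if __name__ == "__main__":
    print(insert_to_local(5400))  # length of the sequenced region
    print(local_to_insert(1))     # insert coordinate of position 1
```

Under these assumptions the sequenced region spans 2980 positions, which agrees with the 2980 bp reported for the sequenced portion of the 5.4-kb fragment.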
The B-like creatine kinase gene has an ATG initiation codon at a position equivalent to that of the B creatine kinase cDNA. It is characterized by what initially seemed to be a two-exon structure, in addition to multiple small deletions and insertions. The overall length of this gene product, if functional and counted from the ATG codon, is estimated to be 1296 bp, which is 78 bp shorter than the B creatine kinase transcript. The termination codon location of this gene, when matched to that of the CKB cDNA, corresponds to the position of a small deletion. The polyadenylation signal and the poly(A) tail are present at identical locations. In the 2980 bp sequenced from the 5.4-kb genomic fragment, there are four Alu repetitive elements, one of which is located 5' to the pseudogene. The B-like creatine kinase gene itself is interrupted by 904 bp of DNA representing three Alu repetitive elements in tandem in the same orientation. The insertion of the 904-bp Alu repetitive elements is bordered by an 18-bp direct repeat. The overall sequence homology between this gene and the CKB cDNA, starting at the initiation codon ATG and ending before the poly(A) tail, excluding one of the duplicated 18-bp direct repeats and the Alu insertion complex, is 77.6%. The homology is adversely affected by the presence of multiple small deletions and less so by insertions of 1-3 bases.
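A percent homology figure of this kind depends on how gapped positions are scored, which the text does not specify. The sketch below is one possible convention, offered as an assumption: identity is computed over all aligned columns, and any column containing a gap in one sequence counts as a mismatch. The two aligned sequences are short hypothetical examples, not the pseudogene or cDNA sequences.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical columns in a pairwise alignment.

    Columns gapped in one sequence count as mismatches; columns gapped
    in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(a.upper(), b.upper()) if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no scorable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: one 1-base deletion and one substitution.
cdna_like = "ATGCCCTTCTCCAAC"
pseudogene_like = "ATGCC-TTCTCGAAC"

if __name__ == "__main__":
    print(round(percent_identity(cdna_like, pseudogene_like), 1))
```

Because each deleted base removes a matching column under this convention, a sequence with many small deletions scores lower than one with the same number of substitutions spread over a full-length alignment, in line with the observation that deletions depress the homology.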
Creatine Kinase Gene Families in Other Species
To determine whether the CKB gene has undergone a similar duplication event in other species, we compared the restriction patterns of genomic DNA prepared from several different species, including human, dog, mouse, and rat. Figure 4 shows that B-like creatine kinase genes are present in all species examined. Dog DNA shows unexpected complexities, suggesting multiple duplication events, the number of which cannot be determined by Southern blot analysis.
Alu Tandem Repeats in B Creatine Kinase Pseudogene Are Not Polymorphic and Are Homozygous in the Human Population
To verify that the Alu tandem repeats in the CKB pseudogene are not cloning artifacts and to examine possible polymorphisms of this structure in the population, the Alu tandem repeats were amplified by PCR using unique sequence primers flanking the insertion. Figure 5 shows that the tandem repeats are present in five unrelated individuals; they were also present in an additional six unrelated individuals that were tested (results not shown). No polymorphism was apparent by gel electrophoresis analysis. This indicates that the Alu repetitive elements contained in the B creatine kinase pseudogene are homozygous in the population.
DISCUSSION
We have identified a single-copy B creatine kinase pseudogene in the human genome. Its pseudogene status is demonstrated by the presence of multiple small deletions, insertions, and point mutations in its nucleotide sequence. The initiation codon ATG is present but the location of the corresponding termination codon is displaced by a short deletion. The canonical polyadenylation signal AATAAA is represented and followed at the appropriate distance by the poly(A) tail indicative of a processed pseudogene. A 904-bp insertion present in the B-like creatine kinase pseudogene sequence has been determined to be three tandem Alu elements aligned in the same orientation, flanked by an 18-bp direct repeat corresponding to the bordering CKB pseudogene sequence. The Alu elements abut the direct repeat sequences at each end, suggesting that the direct repeats are target-generated and are created by the transposition of the complex Alu element. Alu repetitive elements, which belong to the SINEs family (short mobile elements; Singer, 1982), are thought to be retropseudogenes (Weiner et al., 1986) derived from RNA polymerase III transcripts of 7SL RNA or its descendants (Ullu and Tschudi, 1984). Previously, Alu sequences have been found to be inserted into other identifiable DNA sequences in primate genomes, such as the α-satellite repetitive sequences in African Green Monkey (Grimaldi and Singer, 1982), the 3'-untranslated region of a functional low-density lipoprotein receptor gene (Yamamoto et al., 1984), as well as pseudogenes (Liu and Chan, 1990; Zaborsvsky et al., 1984). In rodents, Kominami et al. (1983) showed that a mouse type 2 Alu sequence is present only in some mouse strains and not in others, suggesting the mobile nature of this element.
The fact that an Alu is present at the β-globin gene locus of gorilla DNA but is absent from the corresponding homologous position of the human and other primate DNA (Trabuchet et al., 1987) and the demonstration of a polymorphic Alu sequence at the C1 gene (Stoppa-Lyonnet et al., 1990) suggest that Alu repetitive elements may have remained mobile throughout primate evolution. Recently, direct demonstration of the mobility of an Alu element in a tissue culture system was reported (Lin et al., 1988) but later found to be due to contamination by a plasmid clone, Blur-8 (Lin et al., 1989). The present results, with no ambiguity of the target site and border, support the interpretation of the occurrence of a transposition event in the distant past. The presence of the complex Alu element suggests additional ways in which complexity of the genome is created. It appears that the B-like creatine kinase gene was the product of a reverse transcription of a processed CKB mRNA that was integrated into a chromosome (chromosome 16) different from that of the authentic CKB gene, which is located on chromosome 14. Subsequent to this event, an Alu transposition took place and interrupted the pseudogene sequences. The mechanism for Alu transposition is believed to be similar to that of the integration of a pseudogene, that is, the incorporation of a cDNA product of a transcribed gene sequence, in this case the 7SL RNA or its descendants. The mode of integration appears to be distinct from that of the well-characterized transposons and the retroviruses (Shih et al., 1988), and AT-rich sites are proposed to be selected preferentially for integration (Daniels and Deininger, 1985). There is a suggestion that secondary structures of DNA, such as those associated with DNase I hypersensitivity sites, may invite the integration of retroposons (Vijaya et al., 1986). From the same line of reasoning it has been suggested that homopurine-homopyrimidine sequences and triple helix structures may generate a preferential integration site (Liu and Chan, 1990). Recently, using high-resolution in situ DNA hybridization techniques, it has been demonstrated that Alu elements in the human genome are not randomly distributed, but are associated with the reverse bands, or R bands, of the metaphase chromosome (Korenberg and Rykowski, 1988). At the DNA sequence level, although clustering of the Alu repetitive elements has been noted at some genomic locations, tandem repeats have been rare. In this context, it is important to differentiate tandem Alu elements from clustered Alu elements. The latter are frequently found in intragenic regions but often lack the flanking direct repeats indicative of transposition versus integration via a possible recombinational event.
In reviewing the literature, we have found six reports of tandem Alu repeats: thymidine kinase gene Alu G and H and Alu K and L (Slagel et al., 1987), tubulin gene Alu E and F (Slagel et al., 1987), α-globin tandem Alu 3A and 3B (Hess et al., 1983), nucleoplasmin pseudogene Alu 3 and 4 (Liu and Chan, 1990), and prothrombin Alu 1 and 2 (Degen et al., 1983). The mechanism for the generation of tandem repeats has been addressed only to the extent that Alu repeats may have preferentially integrated into the poly(A) tail of another Alu sequence (Rogers, 1985). Recently, Alu repeats have been demonstrated to be divided into at least three subsets, including sets with a "conserved consensus" and a "divergent consensus" (Britten et al., 1988; Jurka and Smith, 1988; Willard et al., 1987). Alu repeats may be classified into subsets, which in turn may reflect evolutionary relationships between them.
[Fig. 3 sequence listing, positions 1-2980. Annotated repeats as labeled in the figure:]
Pseudogene outside direct repeats: AAAAAGT (582-588); AAAAAGT (2800-2806)
Alu insertion flanking repeats: AAGAGGCTGTGGCTTCAG (1482-1499); AAGAGTCTGTGGCT (1798-1811); AAGAGACTGTGACTTCAG (2403-2420)
FIG. 3. DNA sequences of the B creatine kinase pseudogene. Of the 5400-bp insert, the DNA sequence between positions 2420 and 5400 was completely determined and contains the entire pseudogene sequence. For clarity in this communication, the DNA sequence at position 2421 of the 5400-bp insert, corresponding to the SmaI site, is designated position 1 (Fig. 2). The locations of the pseudogene outside direct repeats, the Alu insertion flanking repeats, the initiation codon ATG, and the polyadenylation signal AATAAA are underlined. Also shown is the corresponding B creatine kinase cDNA sequence for the 18-bp direct repeat.
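The flanking repeats annotated in Fig. 3 can be compared position by position. The sketch below counts mismatches between the two 18-bp copies bordering the Alu complex and between the shorter internal repeat and the matching stretch of the 5' copy. The sequences are transcribed from the figure annotation (positions 1482-1499, 1798-1811, and 2403-2420) and should be treated as illustrative, since the transcription may contain errors; the variable names are not from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


# Transcribed from the Fig. 3 annotation (assumption: transcription is exact).
FIVE_PRIME_REPEAT = "AAGAGGCTGTGGCTTCAG"   # positions 1482-1499
INTERNAL_REPEAT = "AAGAGTCTGTGGCT"         # positions 1798-1811, incomplete copy
THREE_PRIME_REPEAT = "AAGAGACTGTGACTTCAG"  # positions 2403-2420

if __name__ == "__main__":
    n = len(INTERNAL_REPEAT)
    print("5' vs 3' copy:", hamming(FIVE_PRIME_REPEAT, THREE_PRIME_REPEAT))
    print("internal vs 5' copy:", hamming(INTERNAL_REPEAT, FIVE_PRIME_REPEAT[:n]))
    print("internal vs 3' copy:", hamming(INTERNAL_REPEAT, THREE_PRIME_REPEAT[:n]))
```

A small number of mismatches between the outer copies is what would be expected for a target-site duplication that has accumulated point mutations since the insertion event.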
The data in this study of the B creatine kinase pseudogene locus show that the repetitive elements represent three Ah repetitive sequences in tandem, in
the same orientation. This could be a result of (1) cointegration of multiple Ah sequence, (2) independent insertion of three Ah repeats in the same region,
396
MA Human
12345678 ---------mm-
rat
mouse
dog 9
10
I1
12
13 14 -m--
15
16
-
23 Lb
-
9.4
-
4.3
-
2.3 2.0
6.5
FIG. 4. B-like creatine kinase DNA sequences in different species. Genomic restriction digests of EcoRI, HindIII, X&I, and BglII from human placenta DNA, dog, mouse, and rat liver DNA were separated on an 0.8% agarose gel and transferred to a nylon membrane. The blot was probed with a 742-bp CKB cDNA probe, which contains a portion of the coding region and the entire 3 ‘UT region (see Methods). The signals represent B and B-like creatine kinase sequences. The 23-kb M gene in the human EcoRI digests hybridizes faintly under such conditions. B-like creatine kinase sequences produce weaker signals than the major CKB gene hybridization signal in each restriction digest. Separate hybridization experiments with a CKM probe demonstrated that these signals, with the possible exception of the rat sequences, do not represent cross-hybridization with the CKM sequences.
ET
AL.
poly(A) tail of Alu-1, and its 3’-end is composed of 4 bp of the beginning of Ah-2. This suggests that a short stretch of sequence homology may have influenced the integration site of Ah-2. The same argument can be applied to Alu-1. The possibility that Alu-1 and Ah-2 were created by an unequal crossover at meiosis between the alleles of an original complex consisting of a single Ah element needs to be considered, particularly in view of the presence of the incomplete 13-bp direct repeat 5’ to Ah-2. This model, however, would demand a close identity between Alu-1 and Ah-2 sequences, since they would have been generated by a duplication event mediated through misalignment of the Alu-1 alleles. We therefore performed pairwise sequence comparisons for the three individual Ah elements. Alu-1 and Ah-2 differ from each other by 20%, Ah-2 and Ah-3 differ from each other by 22%, whereas Ah-3 and Alu-1 differ from each other by 20%. These differences are comparable to that between any two random Ah sequence in the human genome (Kariya et al., 1987) and argue against a recombination event contributing to the generation of this complex. We analyzed the six other reported Ah tandem in-
123456
bp or (3) duplication or unequal crossover of an inserted Ah repeat. The Ah insertions were analyzed with respect to conserved consensus sequences. Alu-1 and Ah-2, based on these analyses, belong to class II Ah repeats (Britten et al., 1988; Jurka and Smith, 1988; Willard et al., 1987), whereas Ah-3 represents a class III Ah, or a more recently evolved class (Table 1). Ah-4, which is 5’ to the pseudogene, also belongs to Ah class II. It has, however, a small 5’ and a 3’ truncation. The possibility exists that this Ah is generated through a recombinational event rather than transposition. Ah-3, being of the later evolved class III Ah, indicates that this Ah repetitive element was integrated later and argues against a cointegration event as a mechanism of the generation of the complex. The retention of the full-size Alu element in each of the three individual Alus within the complex also argues against unequal crossover as the mode of generation of the complex. The preservation of the IS-bp direct repeat at the border of the Ah complex, and the presence of another incomplete 13-bp repeat preceding the Ah-2 element, suggests that the sequences of integrations are serial; Alu-1 preceded Ah-2, and Ah-2 preceded Ah-3. The incomplete 13-bp repeat of the second Ah repeat is particularly interesting, as its 5’-end is composed of two As from the preceding
FIG. 5. Detection of tandem Alu repetitive elements in human B creatine kinase pseudogene using PCR. Alu tandem repeats were amplified by PCR using unique sequence primers flanking the insertion. The tandem repeats were identified using an Alu-specific probe (see Methods) and are present in all individuals tested (total equals 11). Lanes 1 through 5: unrelated individuals; Lane 6: clone pCK4. Size markers (bp): 9416, 6557, 4361, 2322, 2027, 1353, 1078, 872, 603. No polymorphism was apparent by gel electrophoresis analysis. This indicates that the Alu repetitive elements contained in the B creatine kinase pseudogene are homozygous in the population.
TABLE 1
Classification of Alu Repetitive Elements at the B Creatine Kinase Pseudogene Locus

Diagnostic positions in Alu repeat: 93, 98, 218, 196, 199, 132, 152, 64 + 1, 64 + 2, 76, 86, 162
Alu subfamilies compared: Class 1, Class 2, Class 3, Class 4

Pseudogene Alu repeat    % divergence    Classified
Alu-1                    4.3             Class 2
Alu-2                    8.6             Class 2
Alu-3                    5.3             Class 3

Note. (-), deletion occurred compared to other classes. [Bases at the individual diagnostic positions are not legible in this copy.]

TABLE 2
Alu Repetitive Element Subfamilies in Tandem Alu Repeats

Columns: Locus; Class IV; Class III; Class II
Loci: Thymidine kinase Alu G and H; Thymidine kinase Alu K and L; Tubulin Alu E and F; α-Globin Alu 3A and 3B; Nucleoplasmin pseudogene Alu 3 and 4; Prothrombin Alu 1 and 2; B creatine kinase pseudogene Alu-1, -2, -3
[Class assignments of the individual elements are not legible in this copy.]
sertions in this fashion. The results are summarized in Table 2. Note that only in the case of the α-globin cluster Alu 3A and 3B and the thymidine kinase locus Alu K and L are the tandem Alus made from two different classes. In the present finding, the difference in the class of the Alu repetitive elements inserted into the B creatine kinase pseudogene and the presence of direct repeats as described suggest three separate serial integration events. This supports the theory that the poly(A) end of the Alu element is involved in the preferential integration of subsequent Alu elements (Rogers, 1985), although the rarity of finding Alu repeats in tandem
TABLE 1 (continued). Alu-4 (7.9% divergence): Class 2. Note. % divergence is compared to the Alu consensus, not counting CpG and diagnostic positions.
indicates that there is very low specificity of this factor as the basis of preferential integration. In addition, the sequence of the direct repeat in this pseudogene locus suggests that localized sequence homology to the beginning of the Alu element may have contributed to the determination of the Alu integration site. Note that in all cases of Alu elements in tandem (i.e., with outside flanking repeat), the Alu elements are all in the same orientation.
The age of the CKB pseudogene can be calculated from its nucleotide divergence from CKB cDNA sequences. The non-CpG nucleotide sequence drift rate of single-copy DNA has been estimated to be about 0.15% per million years (Britten et al., 1988). The nucleotide divergence of the pseudogene, compared to the B creatine kinase cDNA, and of the Alu insertional elements, compared to the consensus Alu sequence at the non-CpG and nondiagnostic positions (Britten et al., 1988), is as follows: 5’ to the Alu insertion, beginning at the initiation codon ATG, total 872 nucleotides, 13.2%; Alu-1, 4.3%; Alu-2, 8.6%; Alu-3, 5.3%; 3’ to the insertion, ending at the poly(A) tail, total 390 nucleotides, 14.3%. This suggests that the CKB pseudogene evolved perhaps 90 million years ago, well before the primate radiation. In our calculation of the nucleotide divergence of the CKB pseudogene from the CKB gene sequence, we have counted nucleotide insertions or deletions as individual mutational events. Thus, the estimate of 90 million years is perhaps an overestimate. The neutral drift rate of 0.15% per million years in higher primates is based on primary sequence comparisons for silent substitutions in coding sequences and on interspecies hybridization data (Britten, 1986), and there is evidence that Alu sequence drift occurs at a similar rate (Britten et al., 1988). The nucleotide sequence drift rate of the pseudogene, after insertion, likely behaves similarly to that of an integrated Alu element.
After finding a CKB pseudogene in the human genome, we proceeded to examine whether CKB gene duplication is present in other species, and DNA from three species was examined. Using probes derived from the coding regions of human CKM and CKB cDNA, we were able to establish hybridization and washing conditions under which cross-hybridization between species was maintained, but cross-hybridization within the species and between the CKM and CKB sequences was minimized. The results support the view that a B creatine kinase-like gene occurs in the mouse and possibly also in the rat. In the dog, there are multiple B creatine kinase-like genes in the genome. The nature and significance of the B gene duplication in these species are not known. Certainly, there has been no evidence of multiple forms of CKB in these species. Since the CKB gene is active in embryonic cells, there is a propensity, as suggested for some other pseudogenes, for a reverse-transcription product of a CKB message to be incorporated into the genome of a pluripotent cell that predates the development of germ lines, leading to the maintenance of this transposed gene product in the species. Whether any of these sequences in these species represent the same pseudogene cannot be answered. Certainly, the rodents and the dog have different Alu-like repetitive elements, and the first Alu sequence insertional event would be expected to have occurred in the primate lineage.
In summary, the present findings support the contention that Alu repeats integrate nonrandomly into the human genome.
Since it is known that the higher primates have very different copy numbers of Alu family repeats, as determined by the titration method (Hwu et al., 1986), and since deletion events at homologous locations have yet to be observed, it is conceivable that this locus in the human genome, and other such structures that can be recovered and analyzed in which multiple independent transposition events occurred, may function as useful molecular clocks through which the evolution of the primate radiation and of Alu repetitive sequences can be studied.
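The age estimate discussed above is simple arithmetic: a percent divergence divided by the neutral drift rate. The following is a minimal sketch in Python under stated assumptions, not the procedure used in this study. The function names `percent_divergence` and `age_myr` are hypothetical. The input is assumed to be a pair of already aligned sequences with `-` marking gaps, and every gap column is counted as one difference, which is a simplification of the indel counting described in the text.

```python
"""Sketch: percent divergence of two pre-aligned sequences and a
molecular-clock age estimate. All names here are illustrative assumptions."""

# Non-CpG neutral drift rate quoted in the text (Britten et al., 1988),
# in percent divergence per million years.
NEUTRAL_RATE_PCT_PER_MYR = 0.15


def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of aligned columns at which the sequences differ.

    Assumption: a gap ('-') opposite a base counts as one difference per column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    differences = sum(1 for x, y in pairs if x != y)
    return 100.0 * differences / len(aligned_a)


def age_myr(divergence_pct: float,
            rate_pct_per_myr: float = NEUTRAL_RATE_PCT_PER_MYR) -> float:
    """Convert a percent divergence into an age in millions of years."""
    if rate_pct_per_myr <= 0:
        raise ValueError("drift rate must be positive")
    return divergence_pct / rate_pct_per_myr


# Divergences of the pseudogene flanks from the CKB cDNA, as reported in the text.
reported = {
    "5' to the Alu insertion (872 nt)": 13.2,
    "3' to the Alu insertion (390 nt)": 14.3,
}
for region, pct in reported.items():
    print(f"{region}: {pct}% -> about {age_myr(pct):.0f} million years")
```

With the reported flanking divergences of 13.2 and 14.3%, the sketch gives roughly 88 and 95 million years, in line with the estimate of about 90 million years.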
ACKNOWLEDGMENTS
This work was supported by a grant from the American Heart Association-Texas Affiliate, Inc., the American Heart Association Bugher Foundation Center for Molecular Biology in the Cardiovascular System, and the Muscular Dystrophy Association. We also thank S. L. Terry and S. A. Montemayor for manuscript preparation, and P. A. Brink for providing RFLP data and analysis.
REFERENCES
1. BRITTEN, R. J. (1986). Rates of DNA sequence evolution differ between taxonomic groups. Science 231: 1393-1398.
2. BRITTEN, R. J., BARON, W. F., STOUT, D. B., AND DAVIDSON, E. H. (1988). Sources and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4770-4774.
3. DANIELS, G. R., AND DEININGER, P. L. (1985). Integration site preferences of the Alu family and similar repetitive DNA sequences. Nucleic Acids Res. 13: 8939-8954.
4. DEGEN, S. J. F., MACGILLIVRAY, R. T. A., AND DAVIE, E. W. (1983). Characterization of the complementary deoxyribonucleic acid and gene coding for human prothrombin. Biochemistry 22: 2087-2097.
5. FEINBERG, A. P., AND VOGELSTEIN, B. (1984). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity: Addendum. Anal. Biochem. 137: 266-267.
6. GRACE, A. M., PERRYMAN, M. B., AND ROBERTS, R. (1983). Purification and characterization of human mitochondrial creatine kinase. J. Biol. Chem. 258: 15346-15354.
7. GRIMALDI, G., AND SINGER, M. F. (1982). A monkey Alu sequence is flanked by 13-base-pair direct repeats of an interrupted alpha-satellite DNA sequence. Proc. Natl. Acad. Sci. USA 79: 1497-1502.
8. HAAS, R. C., KORENFELD, C., ZHANG, Z., PERRYMAN, M. B., ROMAN, D., AND STRAUSS, A. W. (1989). Isolation and characterization of the gene and cDNA encoding human mitochondrial creatine kinase. J. Biol. Chem. 264: 2890-2897.
9. HAAS, R. C., AND STRAUSS, A. W. (1990). Separate nuclear genes encode sarcomere-specific and ubiquitous human mitochondrial creatine kinase isozymes. J. Biol. Chem. 265: 6921-6927.
10. HESS, J. F., FOX, M., SCHMID, C., AND SHEN, C. K. J. (1983). Molecular evolution of the human adult α-globin-like gene region: Insertion and deletion of Alu family repeats and non-Alu DNA sequences. Proc. Natl. Acad. Sci. USA 80: 5970-5974.
11. HOSSLE, J., SCHLEGEL, J., WEGMANN, G., et al. (1988). Distinct tissue specific mitochondrial creatine kinases from chicken brain and striated muscle with a conserved CK framework. Biochem. Biophys. Res. Commun. 151: 408-416.
12. HWU, H. R., ROBERTS, J. W., DAVIDSON, E. H., AND BRITTEN, R. J. (1986). Insertions and/or deletions of many repeated DNA sequences in human and higher ape evolution. Proc. Natl. Acad. Sci. USA 83: 3875-3879.
13. HYLAND, V. J., SUTHERS, G. K., FRIEND, K. L., et al. (1989). Probe, VK5B, is located in the same intervals as the autosomal dominant adult polycystic kidney disease locus, PKD1. Hum. Genet. 84: 286-288.
14. JURKA, J., AND SMITH, T. (1988). A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4775-4778.
15. KARIYA, Y., KATO, K., HAYASHIZAKI, Y., HIMENO, S., TARUI, S., AND MATSUBARA, K. (1987). Revision of consensus sequence of human Alu repeats - a review. Gene 53: 1-10.
16. KAYE, F. J., MCBRIDE, O. W., BATTEY, J. F., GAZDAR, A. F., AND SAUSVILLE, E. A. (1987). Human creatine kinase-B complementary DNA - Nucleotide sequence, gene expression in lung cancer, and chromosomal assignment to two distinct loci. J. Clin. Invest. 79: 1412-1420.
17. KOMINAMI, R., MURAMATSU, M., AND MORIWAKI, K. (1983). A mouse type 2 Alu sequence (M2) is mobile in the genome. Nature 301: 87-89.
18. KORENBERG, J. R., AND RYKOWSKI, M. C. (1988). Human genome organization: Alu, Lines, and the molecular structure of metaphase chromosome bands. Cell 53: 391-400.
19. LAVINHA, J., MORRISON, N., GLASGOW, L., AND FERGUSON-SMITH, M. A. (1984). Further evidence for regional localization of human APRT and DIA4 on chromosome 16. Cytogenet. Cell Genet. 37: 517.
20. LIN, C. S., GOLDTHWAITE, D. A., AND SAMOLS, D. (1988). Identification of Alu transposition in human lung carcinoma cells. Cell 54: 153-159.
21. LIN, C. S., GOLDTHWAITE, D. A., AND SAMOLS, D. (1989). Identification of Alu transposition in human lung carcinoma cells: A correction. Cell 59: 153-159.
22. LIU, Q-R., AND CHAN, P. K. (1990). Identification of a long stretch of homopurine-homopyrimidine sequence in a cluster of retroposons in the human genome. J. Mol. Biol. 212: 453-459.
23. MANIATIS, T., FRITSCH, E. F., AND SAMBROOK, J. (1982). “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
24. MARIMAN, E. C. M., BROERS, C. A. M., CLAESEN, C. A. A., TESSER, G. I., AND WIERINGA, B. (1987). Structure and expression of the human creatine kinase B gene. Genomics 1: 126-137.
25. NELSON, D. L., LEDBETTER, S. A., CORBO, L., et al. (1989). Alu polymerase chain reaction: A method for rapid isolation of human-specific sequences from complex DNA sources. Proc. Natl. Acad. Sci. USA 86: 6686-6690.
26. PERRYMAN, M. B., KERNER, S. A., BOHLMEYER, T. J., AND ROBERTS, R. (1986). Isolation and sequence analysis of a full-length cDNA for human M creatine kinase. Biochem. Biophys. Res. Commun. 140: 981-989.
27. REED, K. C., AND MANN, D. A. (1985). Rapid transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Res. 13: 7207-7221.
28. ROGERS, J. H. (1985). The origin and evolution of retroposons. Int. Rev. Cytol. 93: 187-279.
29. SHIH, C-C., STOYE, J. P., AND COFFIN, J. M. (1988). Highly preferred targets for retrovirus integration. Cell 53: 531-537.
30. SINGER, M. F. (1982). SINES and LINES: Highly repeated short and long interspersed sequences in mammalian genomes. Cell 28: 433-434.
31. SLAGEL, V., FLEMINGTON, E., TRAINA-DROGE, V., BRADSHAW, H., AND DEININGER, P. (1987). Clustering and subfamily relationships of the Alu family in the human genome. Mol. Biol. Evol. 4: 19-29.
32. STALLINGS, R. L., OLSON, E., STRAUSS, A. W., THOMPSON, L. H., BACHINSKI, L., AND SICILIANO, M. J. (1988). Human creatine kinase genes on chromosomes 15 and 19 and proximity of the gene for the muscle form to the genes for apolipoprotein C2 and excision repair. Am. J. Hum. Genet. 43: 144-151.
33. STOPPA-LYONNET, D., CARTER, P. E., MEO, T., AND TOSI, M. (1990). Clusters of intragenic Alu repeats predispose the human C1 inhibitor locus to deleterious rearrangements. Proc. Natl. Acad. Sci. USA 87: 1551-1555.
34. TRABUCHET, G., CHEBLOUNE, Y., SAVATIER, P., et al. (1987). Recent insertion of an Alu sequence in the beta-globin gene cluster of the Gorilla. J. Mol. Evol. 25: 288-291.
35. ULLU, E., AND TSCHUDI, C. (1984). Alu sequences are processed 7SL RNA genes. Nature 312: 171-172.
36. VIJAYA, S., STEFFEN, D. L., AND ROBINSON, H. L. (1986). Acceptor sites for retroviral integrations map near DNase I-hypersensitive sites in chromatin. J. Virol. 60: 683-692.
37. VILLARREAL-LEVY, G., MA, T. S., KERNER, S. A., ROBERTS, R., AND PERRYMAN, M. B. (1987). Human creatine kinase: Isolation and sequence analysis of cDNA clones for the B subunit, development of subunit specific probes and determination of gene copy number. Biochem. Biophys. Res. Commun. 144: 1116-1127.
38. WATTS, D. C. (1973). In “Creatine Kinase (Adenosine 5-Triphosphate-creatine Phosphotransferase)” (P. D. Boyer, Ed.), pp. 383-455, Academic Press, New York.
39. WEINER, A. M., DEININGER, P. L., AND EFSTRATIADIS, A. (1986). Nonviral retroposons: Genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55: 631-661.
40. WILLARD, C., NGUYEN, H. T., AND SCHMID, C. W. (1987). Existence of at least three distinct Alu subfamilies. J. Mol. Evol. 26: 180-186.
41. YAMAMOTO, T., DAVIS, C. G., BROWN, M. S., et al. (1984). The human LDL receptor: A cysteine-rich protein with multiple Alu sequences in its mRNA. Cell 39: 27-38.
42. ZABAROVSKY, E. R., CHUMAKOV, I. M., PRASSOLOV, V. S., AND KISSELEV, L. L. (1984). The coding region of the human c-mos pseudogene contains Alu repeat insertions. Gene 30: 107-111.