GENOMICS 8, 271-278 (1990)

DNA Sequence Polymorphisms in Alu Repeats

MASATO ORITA, TAKAO SEKIYA, AND KENSHI HAYASHI¹

Oncogene Division, National Cancer Center Research Institute, 1-1, Tsukiji 5-chome, Chuo-ku, Tokyo, Japan

Received February 5, 1990; revised May 14, 1990

We have developed an efficient method for detection of sequence differences in genomic DNA based on a new principle (M. Orita et al., 1989, Genomics 5: 874-879). Using this method, we show here that approximately half the Alu repeats interspersed in the human genome are significantly polymorphic. Analysis of Alu repeat polymorphism should be useful in construction of a high-resolution map and also in identifying genotypes of individuals for clinical and other purposes because the repeats are ubiquitous and the technique for their detection is simple. © 1990 Academic Press, Inc.

¹ To whom correspondence should be addressed.

0888-7543/90 $3.00. Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Analysis of restriction fragment length polymorphism (RFLP) (Botstein et al., 1980) has been shown to be useful in construction of the linkage map of the human genome (Donis-Keller et al., 1987), in prenatal diagnoses of various hereditary diseases (McKusick, 1988), and in isolating the genes responsible for such diseases (Friend et al., 1986; Rommens et al., 1989). Construction of a high-resolution linkage map has been proposed to provide landmarks for cloning and sequencing of the entire human genome and to facilitate finding polymorphic DNA markers tightly linked to various genes of interest (Alberts, 1988). Such studies require many new polymorphic DNA markers and, preferably, the development of a technique simpler than those currently available for their detection.

The human genome contains nearly a million copies of a family of short (approximately 300 bp long) repetitive sequences, Alu repeats (Jelinek and Schmid, 1982). These Alu repeats are distributed essentially at random throughout the genome (Moyzis et al., 1989), although some bias in their distribution has been reported (Korenberg and Rykowski, 1988). Because they are so numerous, almost every region of the human genome more than a few kilobase pairs (kb) long contains at least one repeat. Alu repeats are retrotransposons, and as a result of accumulation of mutations during spreading or after insertion, their sequences have become divergent (Jelinek and Schmid, 1982; Britten et al., 1988; Jurka and Smith, 1988).

We thought that Alu repeats should be a rich source of polymorphism for the following reasons: (1) They spread throughout the genome without beneficial effects for the organism that bears them (Weiner et al., 1986); thus, there is apparently no selective pressure for strict conservation of their sequence. (2) The consensus sequence of Alu repeats is rich in CG dinucleotides, which are known to be hot spots of mutation (Barker et al., 1984). (3) The repeats carry A-clusters at the middle and at the 3'-flank, and such simple sequences are known to be sources of length polymorphisms (Weber and May, 1989). (4) Because of their extreme repetitiveness, the chance of changing sequence by a mechanism similar to gene conversion should be high. Such phenomena are frequently observed among members of gene families or genes and their pseudogenes (Efstratiadis et al., 1980). However, Alu repeat polymorphism has not yet been fully exploited, because no simple technique has been available for its detection (Hobbs et al., 1985).

We have developed a simple, sensitive method for detection of sequence changes in a given genomic sequence (Orita et al., 1989a). In this technique, the sequence to be examined is amplified and labeled by the polymerase chain reaction (PCR) (Saiki et al., 1988) using labeled primers (Hayashi et al., 1989) or a labeled nucleotide, and then denatured and separated by nondenaturing polyacrylamide gel electrophoresis. Under nondenaturing conditions, single-stranded DNA has a folded structure, which is sequence specific because it is maintained by sequence-dependent local intramolecular interactions. On autoradiography of the gel, sequence changes can be detected as mobility shifts as a result of changes in the folded conformations of the separated strands (single-strand conformation polymorphism (SSCP); Orita et al., 1989b). By this method it was possible to detect all the mutations in DNA fragments of up to 200 bp that were
amplified from 18 different mutant cells (12 mammalian and 6 bacterial) known to have different base substitutions or small insertions/deletions at various genes (Orita et al., 1989a; Suzuki et al., 1990; and our unpublished data). More recently, we used this technique to detect sequence changes in a mutated gene for the DNA repair enzyme of Bacillus subtilis by dividing the gene sequence into 400-bp blocks. On PCR-SSCP analysis we could detect 10 of 12 mutated sequences (83%) by their significantly altered mobility (Morohoshi et al., manuscript in preparation). Therefore, this method is sensitive and should be capable of detecting most sequence differences in fragments of this size range.

We have shown that polymorphisms in repetitive sequences such as Alu repeats could also be examined with this technique by choosing primer sequences in single-copy regions that bracket the repeats. Here we report an extensive survey of polymorphisms in the Alu repeats showing that the repeats are an abundant source of polymorphisms.

MATERIALS AND METHODS
PCR-SSCP

Oligonucleotides were 5'-end-labeled to a high specific activity and used as primers for the PCR as described previously (Hayashi et al., 1989; Orita et al., 1989a). The amplified products (approximately 400 bp) were diluted with formamide solution containing marker dyes, heated, and separated on 5% polyacrylamide gel (acrylamide/N,N'-methylene bisacrylamide, 49/1; 20 × 40 × 0.03 cm; 0.5 cm per lane) in 0.09 M Tris-borate and 2 mM EDTA with or without 10% glycerol as indicated in Table 2. Electrophoresis was carried out at 30 W for 3 to 6 h until xylene cyanol reached 5 to 10 cm from the bottom, with vigorous cooling by fans placed on both sides of the gel plate. An aluminum plate was attached to one side to ensure even temperature distribution during electrophoresis. The gel was dried and exposed to X-ray film for 3 to 6 h with an intensifying screen.

Sequencing

The target sequences were asymmetrically amplified using the same primers as those for PCR-SSCP analysis. The products served as templates for dideoxy-termination reactions (Sequenase kit, USB, Cleveland, OH), starting from labeled primers (Gyllensten and Erlich, 1988).

RESULTS

We analyzed 43 Alu repeats on various chromosomes using PCR-SSCP (Table 1). Most were chosen from collections of the repeats in two previous reports
(Moyzis et al., 1989; Britten et al., 1988). Some of the reported repeats were not suitable for PCR because they were adjacent to another repetitive sequence, such as a further Alu repeat or an L1 repeat.

We found that half of the Alu repeats have two or more allelic forms (Table 1). The conformation of single-stranded DNA can be altered depending on the temperature, ionic strength, or presence of a denaturant (Orita et al., 1989a). Therefore, we also tried different conditions of electrophoresis (see note to Table 2 for conditions), but found no additional polymorphisms in this way. Some of the Alu repeats (indicated by asterisks in Table 1) gave considerably broad bands, which clearly indicated polymorphism, but did not allow precise assignment of alleles. Close examination of the sequences in the GenBank database revealed that all these Alu repeats had poly(A) stretches of 18-mer or longer at their 3'-termini. Inaccurate replication of homopolymer stretches during the PCR, as reported by Economou et al. (1990), may explain this broadening of the SSCP bands.

Sixteen different non-Alu sequences of similar lengths were also examined for comparison. These sequences were all located near the Alu repeats examined and outside exons, except for two sequences (58 and 59, see Table 1 for the sequences) that were within a noncoding exon of the LDL receptor gene (Yamamoto et al., 1984). A quarter of the non-Alu sequences turned out to be polymorphic. This lower frequency (a quarter, compared with half of the Alu repeats) is consistent with our expectation that Alu repeats are more likely to be polymorphic. Further details of the sequence diversity are described later.

We determined the frequencies of each allele of 18 Alu repeats that showed clear polymorphisms in DNAs of 27 to 35 unrelated individuals. Figure 1 shows parts of the autoradiograms obtained in the analyses for all but two sequences (6 and 17), which had only one infrequent minor allele. An example of the interpretation of results is shown in Fig. 2.
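The step from scored genotypes to the per-locus statistics reported in Table 2 (number of alleles, frequency of heterozygotes, PIC) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and the five genotypes are hypothetical, and PIC is computed with the standard definition of Botstein et al. (1980), which is assumed here to be the one used for Table 2.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of (allele, allele) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def heterozygote_fraction(genotypes):
    """Observed fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2 (Botstein et al., 1980)."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (x * x) * (y * y) for x, y in combinations(p, 2))
    return 1.0 - homozygosity - correction


# Hypothetical genotypes for five individuals (NOT the lanes of Fig. 2).
genotypes = [("A", "A"), ("A", "C"), ("B", "C"), ("A", "A"), ("A", "D")]
freqs = allele_frequencies(genotypes)
print("alleles:", len(freqs))
print("H:", heterozygote_fraction(genotypes))
print("PIC:", round(pic(freqs), 3))
```

Each individual contributes two alleles to the frequency count, so a homozygote counts twice for its allele; H is the observed heterozygote fraction rather than the value expected from the allele frequencies.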
The absence of bands in one-third of the lanes in Fig. 1m suggests lack of the sequence (HLA-DR-β-ψ1) in these individuals. The lack of this pseudogene in certain haplotypes of the DR locus is well documented (Larhammar et al., 1985; Andersson et al., 1987). This result suggests that haplotypes of the locus can be determined by PCR-SSCP analysis of Alu repeats.

As summarized in Table 2, among the 18 Alu repeats analyzed, 12 had polymorphism information content (PIC; Botstein et al., 1980) values above 0.3 and 5 had values below 0.3. One had an uncertain value (sequence 31, see legend to Fig. 1). An Alu repeat in the LDL receptor gene had the highest PIC value, 0.73.

Using DNA from families of two and three generations, we confirmed that these polymorphisms segregated according to Mendel's law. Figure 3 shows that alleles of Alu repeats in the β-globin and angiogenin genes are inherited as expected. The inheritance of all other
TABLE 1
Sequences Examined by the PCR-SSCP Method

Columns: No.; Locus (chromosome); Gene; Amplified sequence; Polymorphism

Alu repeats (Nos. 1-43)

Locus (chromosome)    Gene
HUMA1ATP (14)         α1-Antitrypsin
HUMACHRA7 (NA)        Acetylcholine receptor α-subunit
HUMADAG (20)          Adenosine deaminase
HUMAFP (4)            α-Fetoprotein
HUMAGG (14)           Angiogenin
HUMALBGC (4)          Serum albumin
HUMALUAGP (NA)        α1-Acid glycoprotein
HUMAPOAZI (1)         Apolipoprotein AII
HUMAPOAlI (11)        Apolipoprotein AI, CIII
HUMAPOE4 (19)         Apolipoprotein E
HUMClAINl (17)        Pro-α-1 type-1 collagen
HUMERPA (7)           Erythropoietin
HUMFIXG (X)           Factor IX
HUMFOL5 (5)           Dihydrofolate reductase
HUMGAST3 (17)         Gastrin
HUMHBA4 (16)          α-Globin
HUMHBB (11)           β-Globin
HUMHPARSZ (16)        Haptoglobin-related
HUMIFNBB (2)          Interferon β-3
HUMIL1AG (2)          Interleukin 1-α
HUMINS (11)           Insulin
HUMMHDRB3 (6)         HLA-DR-β-ψ1
HUMMYCC (8)           c-myc oncogene
HUMNGFB (1)           β-Nerve growth factor
HUMPALD (18)          Serum prealbumin
HUMPOMC (2)           Proopiomelanocortin
HUMTHBNB (11)         Prothrombin
HUMTKRA (17)          Thymidine kinase
HUMTPA (8)            Tissue plasminogen activator
Ref. 17 (19)          LDL receptor

Amplified sequence (Nos. 1-43, in order): 4899 ← 5251; 1210 → 1625; 4861 → 5264; 5561 → 5955; 8145 → 8525; 24441 → 24819; 5891 → 6270; 1061 → 1421; 2503 → 2887; 9858 → 10238; 11884 → 12239; 653 ← 1028; 2531 → 2921; 3256 → 3631; 2071 → 2465; 955 → 1353; 1782 → 2214; 7264 → 7669; 24126 → 24514; 1511 → 1883; 131 → 540; 8491 → 8899; 78 → 478; 1926 ← 2300; 32380 → 32769; 44768 → 45184; 8951 → 9336; 13166 → 13553; 4837 → 5223; 31 → 429; 2793 → 3153; 6169 → 6559; 5182 → 5592; 5709 → 6108; 5301 → 5701; 3065 → 3458; 3557 → 3936; 1222 → 1590; 4350 → 4723; 7397 → 7771; 5638 → 6028; 25585 → 25948; 3635 → 4046

Polymorphism: + +* + + +* + + +* + + + +* + + + +* - + +

Non-Alu sequences (Nos. 44-59)

Locus (chromosome)    Gene
HUMADAG (20)          Adenosine deaminase
HUMAGG (14)           Angiogenin
HUMAPOAQI (1)         Apolipoprotein AII
HUMFIXG (X)           Factor IX
HUMHBB (11)           β-Globin
HUMINS01 (11)         Insulin
HUMMYCC (8)           c-myc oncogene
Ref. 17 (19)          LDL receptor

Amplified sequence (Nos. 44-59, in order): 5968-6378; 9386-9790; 635-1017; 1401-1806; 2133-2542; 6107-6506; 461-887; 88-1323; 1308-1700; 2218-2728; 44348-44772; 2884-3274; 5353-5749; 5755-6130; 3234-3654; 4497-4945

Polymorphism: + + +
Note. Names of loci and positions of amplified sequences (including primers) are according to GenBank Database Release 57. Chromosomal locations are according to Human Gene Mapping 9.5 (1988). NA, not assigned. Arrowheads in the amplified sequences indicate orientations of Alu repeats. Pairs of PCR primer sequences (20-mers) having balanced base compositions and spanning 350 to 406 bp that include A-clusters at the 3'-ends of the repeats were chosen. DNAs were prepared from leukocytes of eight unrelated individuals. Sequences were regarded as polymorphic when two or more alleles were observed in these samples.
* These repeats definitely showed more than two alleles, but bands were broad under all electrophoretic conditions tested, and not all individual alleles were clearly identified. Polymorphisms of sequences 5, 8, and 26 were reported in our previous paper (26).
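The primer criteria stated in the note (20-mers with balanced base composition, bracketing a region of a few hundred bp) lend themselves to a mechanical check. The sketch below is illustrative only: the helper names are hypothetical, the 0.4 to 0.6 GC window is an assumed reading of "balanced" that the paper does not quantify, and the amplicon length assumes that the listed GenBank positions are inclusive. The two primers are taken from the Primer column of Table 2.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_balanced_20mer(seq, low=0.4, high=0.6):
    """True for a 20-base primer whose GC fraction lies in [low, high].

    The 0.4-0.6 window is an illustrative assumption, not a value from the paper.
    """
    return len(seq) == 20 and low <= gc_fraction(seq) <= high


def amplicon_length(start, end):
    """Length in bp of an amplified region, assuming inclusive positions."""
    return abs(end - start) + 1


# Two primers listed in Table 2, and the first amplified region in Table 1.
for primer in ("GCTGCAGTCAGCCTCAACTT", "GCACTTTTGTGATAGCTGTC"):
    print(primer, len(primer), round(gc_fraction(primer), 2), is_balanced_20mer(primer))
print(amplicon_length(4899, 5251))
```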
TABLE 2
Polymorphisms in Alu Repeats Detected by the PCR-SSCP Method

Sequence no.  Condition  A  H     PIC   Primers
3             R          2  0.23  0.20  GCTGCAGTCAGCCTCAACTT, GCACTTTTGTGATAGCTGTC
5             R+G        3  0.26  0.24  CTGGCAAGTGAACAGGTACA, CTCTGCATCAGAGAGGGACA
6             R          3  0.07  0.06  ACTATTTGGTCAAACTTCTG, GGTCCAAGCATCCAATTATC
8             R+G        7  0.54  0.48  CCTCATTTCCTATTAGGGAG, CCTGAGGCACATTAAGACAT
10            R+G        2  0.55  0.38  GGCAGAACATCTTGCAAGGT, TAGTGACCTTCTCCAAGTCC
11            R          2  0.55  0.35  GTTCTCATAATCCACCTTCC, CTCCTTCTCGGTAGAATGGA
12            R+G        3  0.32  0.50  CTGACACTTATCATCAGAGG, AACCTACTTAGTCATGTCCT
13            R          4  0.63  0.57  ATCCCTCGATTTCTGGAGGT, TAACAGGAAATTATGAGGGG
15            R+G        3  0.26  0.33  GTTTCCCCCATCCTTGAGAT, CTATCTGCCTGCAATGCATT
17            R+G        2  0.06  0.06  GGTAAAATGGAGCAGCAGAG, CAGAAGGTATGCAATAAGAC
21            R          3  0.45  0.49  CTGGAGAACTGATCTGAAGA, CAGGGGACTGAACCATGTGG
23            C          3  0.47  0.41  TCAAGTAGTGTCAGGAATTA, ACCCAGACCTATGTGGTAGA
26            R+G        5  0.48  0.51  AAGTTGATGCTGGATAGAGG, ATTCTCTTGAGACTACATTG
29            R          2  0.26  0.31  CTGGTACAAAGTAAGCACCC, GCTGGTAGTATTCATATAGG
32            R          2  0.31  0.17  GTCTTAGGTAAGAATTGGCA, CGTTAGAAAGGTCTCTGGAC
38            R          3  0.45  0.36  GGGCTAGTGAGTTTCTTGAG, GGGTTTGAGCTGAAGTGATC
43            R          7  0.60  0.73  TGCCTCCAAGCCATTCACTT, AGAGGCCATGGGCTGCTGAT

Note. Sequence numbers correspond to those in Table 1. Conditions for electrophoresis, without glycerol at room temperature (R) or at 4°C (C), are indicated because the times required for electrophoreses under these conditions were about half those under the standard conditions (at room temperature with 10% glycerol, R+G). A, number of alleles; H, frequency of heterozygotes.

Alu repeat polymorphisms listed in Table 2 was also compatible with this law in all members of both families (data not shown), except in one case in which polymorphism within sequences of PCR primers was suspected. In one family, the mother and father carried different types of apparently homozygous Alu repeats in the α1-acid glycoprotein gene (data not shown). All the children were expected to be heterozygous, but one was apparently homozygous for the maternal allele. This finding can be explained by supposing that the sequence within the primer(s) was polymorphic and that the father had an allele that could not be amplified by PCR. If this was so, the father and the child who was apparently homozygous were in fact both heterozygous.

Sequence changes of Alu repeat polymorphisms were characterized by direct sequencing after asymmetrical amplification of several loci from individuals
who carry various alleles. Seven loci gave readable sequence ladders and we could identify sequence changes in all 17 alleles of these loci (Fig. 4). Except in two cases, one of the alleles of every sequenced locus had exactly the same sequence found in the GenBank database. In Fig. 4, we include the two GenBank sequences (alleles g in No. 26 and No. 29) as different alleles. The absence of these two alleles in our sample may indicate different distributions of alleles in different populations. Pairwise comparison of alleles 2 or 3 in Fig. 4 with the major sequences (alleles 1) revealed that most mobility shifts in the PCR-SSCP analyses were explained by sequence changes at single sites. This fact indicates that most polymorphisms in the Alu repeats are the results of single hits. The sequence changes among alleles included 10 substitutions and four insertions/deletions. Three
substitutions involved CG dinucleotides and one insertion/deletion was at the A-cluster in the 3'-flanking regions. Relatively infrequent polymorphism at the A-cluster is unexpected because Economou et al. (1990) reported frequent length polymorphisms of the A-cluster in the 3'-flank of an Alu repeat near the β-globin gene. Perhaps length polymorphisms of A-clusters are not general among members of the Alu repeats, or, in PCR-SSCP, Alu repeats with long A-clusters are not faithfully amplified, as mentioned before, and their polymorphisms are not detectable. It should be stressed, however, that at least in one case, a difference was detected by this technique and that no other locus we have sequenced showed this length polymorphism.

FIG. 1. PCR-SSCP analysis of Alu repeats. Photographs correspond to the following sequence numbers in Table 1: a, 3; b, 5; c, 8; d, 10; e, 11; f, 12; g, 13; h, 15; i, 21; j, 23; k, 26; l, 29; m, 31; n, 32; o, 38; and p, 43. Polymorphisms of sequences 5, 8, and 26 shown with different DNA samples were reported previously.

FIG. 2. Interpretation of Alu repeat polymorphisms in the apolipoprotein AII locus. Part of Fig. 1g (a) and the assignment of alleles (b) are indicated. Four alleles (A to D) were identified. Two individuals (lanes 1 and 4) were homozygous and three (lanes 2, 3, and 5) were heterozygous. Some lanes in Fig. 1 show three bands as the result of comigration of two of the four bands of heterozygotes of these loci. Comparison of the bands of homozygous individuals and also of intensities of the bands supports this interpretation.

DISCUSSION
We demonstrated that Alu repeats are a rich source of polymorphic DNA markers and that the polymorphism can be detected rapidly by simple procedures. Because of their ubiquitous distribution, several polymorphic Alu repeats are likely to be found in the vicinity of any gene of interest and may be used as polymorphic DNA markers tightly linked to that gene. Here we showed Alu repeat polymorphisms in the adenosine deaminase gene (Nos. 3, 5, and 6), the β-globin gene (Nos. 23 and 26), and the LDL receptor gene (No. 43). These polymorphisms can serve as DNA markers in prenatal detection of affected alleles in cases of hereditary adenosine deaminase deficiency, β-thalassemia, and familial hypercholesterolemia, respectively (McKusick, 1988).

FIG. 3. Mendelian inheritance of Alu repeat polymorphisms. Alu repeats in the β-globin gene (No. 26 in Table 1) and angiogenin gene (No. 8 in Table 1) were analyzed using DNA of a three-generation family. Panels: β-globin locus; angiogenin locus.

It is interesting to assess the difference between nucleotide diversities in regions of Alu repeats and other ordinary single-copy sequences. We estimated heterozygosities at the nucleotide level (nucleotide diversities) as defined by Nei (1975) and Cooper et al. (1985). In the following calculation we assumed that each polymorphic sequence has only one nucleotide change from the major sequence and that all polymorphic changes are at different sites. These assumptions are largely valid, as evidenced by our sequencing data (Fig. 4). Suppose a region of length $l$ is examined by PCR-SSCP, and $N$ sites are found to be polymorphic. Heterozygosity at the $i$th polymorphic nucleotide, $h_i$, is given by the equation

$$h_i = 1 - \{f_i^2 + (1 - f_i)^2\} = 2 f_i (1 - f_i),$$

where $f_i$ is the fraction of the variant nucleotide at this site. Heterozygosity at the nucleotide level of a region ($H_n$) is defined as the arithmetic mean of the heterozygosity of each nucleotide in this region. Since heterozygosity at nucleotides other than the polymorphic nucleotides is zero,

$$H_n = \sum_{i=1}^{N} h_i / l = 2 \sum_{i=1}^{N} \{f_i (1 - f_i)\} / l. \qquad [1]$$

In PCR-SSCP analysis, not all polymorphisms are detected. It is reasonable to assume that the $f_i$ of undetected polymorphisms has the same distribution as the $f_i$ of detected polymorphisms. Therefore, Eq. [1] can be modified as
[Figure 4: alignment of allele sequences against the consensus Alu repeat sequence, with asterisks marking variable positions. Loci and alleles shown: No. 3, alleles 1 (= g) and 2; No. 5, alleles 1 (= g) and 2; No. 10, alleles 1 (= g) and 2; No. 21, alleles 1 (= g), 2, and 3; No. 26, alleles g, 1, 2, and 3; No. 29, alleles g, 1, and 2; No. 38, alleles 1 (= g), 2, and 3.]

FIG. 4. Nucleotide changes in the Alu repeats that showed polymorphisms in the PCR-SSCP analyses. Only the sequences around the polymorphic nucleotides are shown. Asterisks below the consensus Alu repeat sequence (18) indicate positions of nucleotide variations. Sequence numbers are the same as those in Table 1. The most abundant allele was numbered 1, while other numbers are arbitrary. A g denotes a sequence found in the GenBank database; X in No. 26 indicates a one-base deletion. The position of the G-cluster in No. 38 is not accurate because the sequence diverged considerably from the consensus sequence in this region.
$$H_n = 2 \sum_{i=1}^{N} \{f_i (1 - f_i)\} / (lF), \qquad 0 < F < 1,$$

where $F$ is the efficiency of detection. In our experience, the present method detects approximately 80% of sequence changes within 400-bp fragments ($F = 0.8$, $l = 400$) as described elsewhere in this paper. We determined the $f_i$'s of 39 polymorphisms in 16 Alu repeats and of 6 polymorphisms in 4 non-Alu sequences and then calculated $H_n$ for each locus. $H_{n,\mathrm{av}}$ was obtained as an average of $H_n$ for all examined loci, assuming that the Alu repeats that showed only one allele in the first screening (8 individuals) are not polymorphic. This assumption may not be true (examining more individuals may reveal other alleles), and thus $H_{n,\mathrm{av}}$ for Alu repeats may be an underestimate. Excluding loci that gave ambiguous bands (asterisks in Table 1) from the calculation, we obtained a value of $H_{n,\mathrm{av}}$ = 0.06% for Alu repeat regions (38 loci, 15-kb regions of 8 to 30 individuals, or a total of 525 kb examined), and $H_{n,\mathrm{av}}$ = 0.03% for the adjacent non-Alu regions (16 loci, 6.4-kb regions of 8 individuals, or a total of 102 kb examined). Therefore, it seems that Alu repeats are twice as likely to be polymorphic as other regions.

A possible source of error in our calculation is uncertainty in the estimation of $F$. It is unlikely, however, that $F$ is in error by a factor of 2. Also, because $F$ scales both estimates equally, a change in $F$ does not affect the ratio of $H_{n,\mathrm{av}}$ for Alu repeats to that for non-Alu sequences.

Previous estimates of sequence diversity of the human genome range widely. Values of about 0.3% were obtained on the basis of the frequency of polymorphisms detected as RFLPs (Cooper and Schmidtke, 1984; Cooper et al., 1985). These values may be overestimates because recognition sequences of restriction enzymes often include a CG dinucleotide, a sequence known to mutate frequently (Barker et al., 1984; Cooper et al., 1985). Also, in these studies randomly selected clones were used as probes, and therefore the area examined was not confined to the region near the coding sequence (as in the present study), which is believed to be relatively well conserved. Recently, Yandell and Dryja (1989) screened polymorphic sites in several regions of the retinoblastoma susceptibility locus by direct sequencing of regions amplified by the PCR. They found heterozygosity at the nucleotide level of approximately 0.07% for the intron sequences. The difference between their estimate and ours (0.03%) for non-Alu sequences may be explained by the fact that one of the sequences they examined contained a highly variable sequence, VNTR. The difference may also be attributable to regional fluctuations. It should be stressed that the number of nucleotides screened here by PCR-SSCP is much larger than that screened by direct sequencing (Yandell and Dryja, 1989) or by RFLP analysis (Cooper et al., 1985) for the estimation.
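The nucleotide-level heterozygosity estimate described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' calculation: the function names are hypothetical, the variant frequencies are invented, and the defaults simply take the values quoted in the text (l = 400 bp, F = 0.8). The code also accepts F = 1, which would correspond to perfect detection.

```python
def site_heterozygosity(f):
    """h_i = 2 f_i (1 - f_i) for a variant nucleotide at frequency f_i."""
    return 2.0 * f * (1.0 - f)


def nucleotide_diversity(variant_freqs, length=400, efficiency=0.8):
    """H_n = 2 * sum(f_i (1 - f_i)) / (l * F) for one amplified region."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency F must be in (0, 1]")
    return sum(site_heterozygosity(f) for f in variant_freqs) / (length * efficiency)


def average_diversity(per_locus_freqs, length=400, efficiency=0.8):
    """Mean H_n over loci; loci with no detected variant contribute zero."""
    values = [nucleotide_diversity(f, length, efficiency) for f in per_locus_freqs]
    return sum(values) / len(values)


# Hypothetical variant frequencies for four loci (two of them monomorphic).
loci = [[0.1], [0.25, 0.05], [], []]
print(f"H_n,av = {100 * average_diversity(loci):.3f}%")
```

Because F divides every per-locus value, changing it rescales the averages for Alu and non-Alu regions by the same factor and leaves their ratio unchanged.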
The abundance of polymorphisms in Alu repeats is advantageous when chromosome-specific polymorphic markers are required. Such markers can be isolated from chromosome-specific human genome libraries, most of which are constructed from DNA of human-rodent hybrid cells. One obvious way of selecting human sequences against a rodent sequence background starting from these libraries is to pick up Alu repeats, because these repeats are known to be present only in the human genome. Since polymorphisms are frequent in Alu repeats, searching for polymorphic markers in these repeats is a feasible approach.

In construction of a high-resolution linkage map, the genotypes of many polymorphisms must be determined in many DNA samples obtained either from individuals of defined families, such as those collected by Centre d'Etude du Polymorphisme Humain, or from single sperms, as reported by Li et al. (1988). The PCR-mediated method for detection of polymorphisms is simple and requires only small amounts of DNA. Therefore, it may be better suited to such purposes than conventional methods that involve restriction enzyme digestion, blotting, or hybridization to probes.

Several repeated sequences other than Alu repeats have been shown to be suitable for polymorphism analysis using the PCR because of their sequence variability. Polymorphisms in CA dinucleotide repeats and VNTR are especially useful when markers of high PIC values are needed (Weber and May, 1989; Litt and Luty, 1989; Jeffreys et al., 1990). These repeats are, however, less abundant than Alu repeats. Perhaps analyses of polymorphisms in these three repeats will complement each other and become very useful in mapping the total human genome.

ACKNOWLEDGMENTS

We thank Masahiko Shiraishi and Ryuichi Sakai for providing DNA samples, and Martin Brinkworth for help in preparing this manuscript. We also thank Dr. M. Hasegawa for discussion on the estimation of heterozygosity. This work was supported in part by a Grant-in-Aid from the Ministry of Health and Welfare for a Comprehensive 10-Year Strategy for Cancer Control, Japan, and Grants-in-Aid from the Ministry of Science, Education and Culture of Japan and from the Special Coordination Fund of the Science and Technology Agency of Japan. M. Orita was a recipient of a Research Resident Fellowship from the Foundation for Promotion of Cancer Research.
REFERENCES

1. ALBERTS, B. M. (Chairman) (1988). Report of the Committee on Mapping and Sequencing the Human Genome, National Academy Press, Washington, DC.
2. ANDERSSON, G., LARHAMMAR, D., WIDMARK, E., SERVENIUS, B., PETERSON, P. A., AND RASK, L. (1987). Class II genes of the human major histocompatibility complex. J. Biol. Chem. 262: 8748-8758.
3. BARKER, D., SHAFER, M., AND WHITE, R. (1984). Restriction sites containing CpG show a higher frequency of polymorphism in human DNA. Cell 36: 131-138.
4. BOTSTEIN, D., WHITE, R. L., SKOLNICK, M., AND DAVIS, R. W. (1980). Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Amer. J. Hum. Genet. 32: 314-331.
5. BRITTEN, R. J., BARON, W. F., STOUT, D. B., AND DAVIDSON, E. H. (1988). Source and evolution of human Alu repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4770-4774.
6. COOPER, D. N., AND SCHMIDTKE, J. (1984). DNA restriction fragment length polymorphisms and heterozygosity in the human genome. Hum. Genet. 66: 1-16.
7. COOPER, D. N., SMITH, B. A., COOKE, H. J., NIEMANN, S., AND SCHMIDTKE, J. (1985). An estimate of unique DNA sequence heterozygosity in the human genome. Hum. Genet. 69: 201-205.
8. DONIS-KELLER, H., GREEN, P., HELMS, C., CARTINHOUR, S., WEIFFENBACH, B., STEPHENS, K., KEITH, T. P., BOWDEN, D., AKOTS, G., REDIKER, K. S., GRAVIUS, T., BROWN, B. A., RISING, M. B., PARKER, C., POWERS, J. A., WATT, D. E., KAUFFMAN, E. R., BRICKER, A., PHIPPS, P., MULLER-KAHLE, H., FULTON, T. R., NG, S., SCHUMM, J. W., BRAMAN, J. C., KNOWLTON, R. G., BARKER, D. F., CROOKS, S. M., LINCOLN, S. E., DALY, M. J., AND ABRAHAMSON, J. (1987). A genetic linkage map of the human genome. Cell 51: 319-337.
9. ECONOMOU, E. P., BERGEN, A. W., WARREN, A. C., AND ANTONARAKIS, S. E. (1990). The polydeoxyadenylate tract of Alu repetitive elements is polymorphic in the human genome. Proc. Natl. Acad. Sci. USA 87: 2951-2954.
10. EFSTRATIADIS, A., POSAKONY, J. W., MANIATIS, T., LAWN, R. M., O'CONNELL, C., SPRITZ, R. A., DERIEL, J. K., FORGET, B. G., WEISSMAN, S. M., SLIGHTOM, J. L., BLECHL, A. E., SMITHIES, O., BARALLE, F. E., SHOULDERS, C. C., AND PROUDFOOT, N. J. (1980). The structure and evolution of the human β-globin gene family. Cell 21: 653-668.
11. FRIEND, S. H., BERNARDS, R., ROGELJ, S., WEINBERG, R. A., RAPAPORT, J. M., ALBERT, D. M., AND DRYJA, T. P. (1986). A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature (London) 323: 643-646.
12. GYLLENSTEN, U. B., AND ERLICH, H. A. (1988). Generation of single-stranded DNA by the polymerase chain reaction and its application to direct sequencing of the HLA-DQA locus. Proc. Natl. Acad. Sci. USA 85: 7652-7656.
13. HAYASHI, K., ORITA, M., SUZUKI, Y., AND SEKIYA, T. (1989). Use of labeled primers in polymerase chain reaction (LP-PCR) for a rapid detection of the products. Nucleic Acids Res. 17: 3605.
14. HOBBS, H. H., LEHRMAN, M. A., YAMAMOTO, T., AND RUSSELL, D. W. (1985). Polymorphism and evolution of Alu sequences in the human low density lipoprotein receptor gene. Proc. Natl. Acad. Sci. USA 82: 7651-7655.
15. Human Gene Mapping 9.5 (1988). Cytogenet. Cell Genet. 49: Nos. 1-3.
16. JEFFREYS, A. J., NEUMANN, R., AND WILSON, V. (1990). Repeat unit sequence variation in minisatellites: A novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell 60: 473-485.
17. JELINEK, W. R., AND SCHMID, C. W. (1982). Repetitive sequences in eukaryotic DNA and their expression. Annu. Rev. Biochem. 51: 813-844.
18. JURKA, J., AND SMITH, T. (1988). A fundamental division in the Alu family of repeated sequences. Proc. Natl. Acad. Sci. USA 85: 4775-4778.
19. KORENBERG, J. R., AND RYKOWSKI, M. C. (1988). Human genome organization: Alu, Lines, and the molecular structure of metaphase chromosome bands. Cell 53: 391-400.
20. LARHAMMAR, D., SERVENIUS, B., RASK, L., AND PETERSON, P. A. (1985). Characterization of an HLA DR pseudogene. Proc. Natl. Acad. Sci. USA 82: 1475-1479.
21. LI, H., GYLLENSTEN, U. B., CUI, X., SAIKI, R. K., ERLICH, H. A., AND ARNHEIM, N. (1988). Amplification and analysis of DNA sequences in single human sperm and diploid cells. Nature (London) 335: 414-417.
22. LITT, M., AND LUTY, J. A. (1989). A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Amer. J. Hum. Genet. 44: 397-401.
23. MCKUSICK, V. A. (1988). "Mendelian Inheritance in Man," 8th ed., Johns Hopkins Univ. Press, Baltimore.
24. MOYZIS, R. K., TORNEY, D. C., MEYNE, J., BUCKINGHAM, J. D., WU, J. R., BURKS, C., SIROTKIN, K. M., AND GOAD, W. B. (1989). The distribution of interspersed repetitive DNA sequences in the human genome. Genomics 4: 273-289.
25. NEI, M. (1975). "Molecular Population Genetics and Evolution," North-Holland, Amsterdam.
26. ORITA, M., SUZUKI, Y., SEKIYA, T., AND HAYASHI, K. (1989a). Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain reaction. Genomics 5: 874-879.
27. ORITA, M., IWAHANA, H., KANAZAWA, H., HAYASHI, K., AND SEKIYA, T. (1989b). Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphisms. Proc. Natl. Acad. Sci. USA 86: 2766-2770.
28. ROMMENS, J. M., IANNUZZI, M. C., KEREM, B., DRUMM, M. L., MELMER, G., DEAN, M., ROZMAHEL, R., COLE, J. L., KENNEDY, D., HIDAKA, N., ZSIGA, M., BUCHWALD, M., RIORDAN, J. R., TSUI, L., AND COLLINS, F. (1989). Identification of the cystic fibrosis gene: Chromosome walking and jumping. Science 245: 1059-1065.
29. SAIKI, R., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B., AND ERLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491.
30. SUZUKI, Y., ORITA, M., SHIRAISHI, M., HAYASHI, K., AND SEKIYA, T. (1990). Detection of ras gene mutations in human lung cancers by single-strand conformation polymorphism analysis of polymerase chain reaction products. Oncogene, in press.
31. WEBER, J. L., AND MAY, P. E. (1989). Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Amer. J. Hum. Genet. 44: 388-396.
32. WEINER, A. M., DEININGER, P. L., AND EFSTRATIADIS, A. (1986). Nonviral retrotransposons: Genes, pseudogenes and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55: 631-661.
33. YAMAMOTO, T., DAVIS, C. G., BROWN, M. S., SCHNEIDER, W. J., CASEY, M. L., GOLDSTEIN, J. L., AND RUSSELL, D. W. (1984). The human LDL receptor: A cysteine-rich protein with multiple Alu sequences in its mRNA. Cell 39: 27-38.
34. YANDELL, D. W., AND DRYJA, T. P. (1989). Detection of DNA sequence polymorphisms by enzymatic amplification and direct genomic sequencing. Amer. J. Hum. Genet. 45: 547-555.