Proc. Natl. Acad. Sci. USA Vol. 76, No. 3, pp. 1179-1183, March 1979 Biochemistry

BK virus DNA sequence: Extent of homology with simian virus 40 DNA*
(tumor virus/DNA sequence analysis/viral proteins)

ROBERT C. A. YANG AND RAY WU†
Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, New York 14853

Communicated by Leon A. Heppel, December 18, 1978

ABSTRACT The primary nucleotide sequence of three regions of BK virus (BKV) variant (MM) DNA has been determined. The region between map positions 0.715 and 0.900 includes the initiation points and partial coding sequences of the putative VP2 and VP3 proteins of BKV(MM), the amino acid sequences of which show over 80% homology with those of VP2 and VP3 of simian virus 40. The sequence of a potential leader protein X, 66 amino acids long for BKV(MM) and 62 long for simian virus 40, is also deduced. The regions between 0.595 and 0.398 and between 0.310 and 0.175 include the coding sequence for the entire small t antigen and most of the large T antigen of BKV(MM). The DNA sequence within these regions comprises over 50% of the complete BKV(MM) genome and shows a 70% sequence homology with the corresponding regions of simian virus 40 DNA. This high degree of homology is at variance with the reported homology values of 11-20% estimated by hybridization measurements or heteroduplex analyses. Possible explanations for the discrepancy are discussed.

BK virus (BKV) is a papovavirus of human origin discovered by Gardner et al. (1). A variant of BKV, BKV(MM), was later isolated from the urine of a patient with Wiskott-Aldrich syndrome, an X-linked recessive disorder characterized by defects of the immune system (2). This disease has been associated with a high incidence of malignancies of the reticuloendothelial system (3). BKV and BKV(MM) DNA are essentially identical in hybridization analysis and have similar physical maps as determined by Howley et al. (4). A more detailed physical map of the two strains of BKV DNA has also shown close similarity except in the region of 0.52-0.58 map positions (unpublished observation).

Several hybridization techniques have been used to determine the DNA sequence homology between the genomes of BKV and simian virus 40 (SV40). An overall homology of 11-20% between these viruses has been reported by several investigators who used stringent hybridization conditions (4-6). Localization of BKV DNA sequence homology with respect to the physical map of SV40 led Khoury et al. (5) to conclude that homology is primarily within the late region of the SV40 genome (e.g., Hind fragments D, G, J, and K). Since essentially no reassociation occurred when BKV DNA was incubated in the presence of SV40 fragments (e.g., Hind A, H, and I) of the early region, the reason for the relatively strong crossreaction between BKV T antigen and antiserum against SV40 T antigen was unknown (5). Newell et al. (7) reported values of 20%, 42%, 83%, and 92% homology when the heteroduplexes were mounted for microscopy in 60%, 50%, 40%, and 30% formamide, respectively. Since the hybridization conditions chosen by the investigators were arbitrary, it appears that the exact DNA homology can be obtained only by direct nucleotide sequence analysis.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

In this communication, we present information on the sequence of 2600 nucleotides of BKV(MM) DNA determined by direct DNA sequence analysis. We have found extensive DNA sequence homology (about 70%) between BKV(MM) and SV40 DNA in both the early and late regions of the SV40 genome. In addition, we have located the DNA sequence on the BKV(MM) genome that corresponds to the initiation codons and an extensive coding region of the VP2 and VP3 proteins and the large T antigen, as well as of the entire small t antigen and a late leader protein X.

MATERIALS AND METHODS

BKV(MM) and BKV were grown in human embryonic kidney cells, and the DNA was purified as described (8, 9). The BKV(MM) and BKV DNA were cleaved by various restriction enzymes (9, 10) and labeled with 32P in vitro, and sequence analyses were carried out according to the procedure of Maxam and Gilbert (11) except that thinner gels (0.4 mm or 0.6 mm) were used (12).

RESULTS AND DISCUSSION

DNA sequence analysis of BKV(MM)

The entire genome of BKV(MM) has been extensively analyzed by means of physical mapping with restriction endonucleases (8-10). The 90 specific DNA fragments obtained by digestion with 13 different restriction enzymes are all within workable lengths for direct DNA sequence determination by the chemical degradation technique (11) coupled with the use of thin gels so that about 400 nucleotides can be determined from a single labeled terminus (12). For DNA sequence analysis, double-stranded DNA fragments generated by restriction enzymes were terminally labeled at either 5' ends or 3' ends (8, 10). The two labeled ends of each fragment were then separated either by a secondary cleavage with another restriction enzyme or by strand separation (8). Each single end-labeled fragment was subjected to four sets of base-specific chemical reactions (11), and the DNA digests were fractionated by gel electrophoresis.

The DNA sequence at map positions 0.31-0.21 is shown in Fig. 1. The sequence of only the late strand of BKV(MM) DNA, the strand that corresponds to the early mRNA sequence, is shown here together with the SV40 DNA sequence reported by Reddy et al. (13) and Fiers et al. (14). The nucleotides that are identical in BKV(MM) and SV40 DNA are underlined. Out of the 485 nucleotides shown here, approximately 70% are homologous. The sequence is written as triplets in the only

Abbreviations: BKV, BK virus; SV40, simian virus 40; tm, melting temperature.

* This is paper 7 in a series on "Nucleotide sequence analysis of tumor virus DNA." Paper 6 is ref. 10.

† To whom correspondence should be sent.


Biochemistry: Yang and Wu

Proc. Natl. Acad. Sci. USA 76 (1979)

[Figure: BKV(MM) and SV40 nucleotide sequences aligned in triplets, beginning at map position 0.31; the sequence rows are not recoverable from the extracted text.]

FIG. 1. Nucleotide sequence of BKV(MM) DNA between map positions 0.219 and 0.310. Only the late strand (corresponding to the early mRNA) of the BKV(MM) genome is given. The sequence is presented in triplets according to the only unique reading frame. The SV40 DNA sequence (13, 14) is also given for comparison. The nucleotides in homology between BKV(MM) and SV40 DNA are underlined. The SV40 nucleotide numbers in this and other figures are adopted from the published data of Reddy et al. (13).
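The homology figure quoted for Fig. 1 is a count of identical positions over the total number of aligned positions. A minimal sketch of that calculation follows, assuming the two sequences are already aligned to equal length with no gaps. The function name `percent_identity` and the two 18-nucleotide strings are hypothetical illustrations, not the Fig. 1 data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count position-by-position matches (True counts as 1).
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical strings for illustration only (not the published sequences).
toy_bkv = "GATTTGCCTTCAGGACAT"
toy_sv40 = "GATTTGCCTTCAGGTCAG"

print(f"{percent_identity(toy_bkv, toy_sv40):.1f}% identical")
```

An alignment containing insertions or deletions would need a gap convention before this count is meaningful; the sketch deliberately leaves that out.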

[Figure: BKV(MM) and SV40 nucleotide sequences in triplets with the deduced amino acid sequences of small t antigen and T2 written above them; the sequence rows are not recoverable from the extracted text.]

FIG. 2. Nucleotide sequence of BKV(MM) DNA between map positions 0.490 and 0.556. The nucleotide sequence is written in triplets from the late strand (corresponding to the early mRNA) of the BKV(MM) genome. Nucleotides 1-251, which have been published elsewhere (10), are not included. Two putative proteins are deduced and placed above the nucleotide sequence. BKV(MM) t antigen gene terminates at nucleotide 347, and T2 antigen gene (second part of large T antigen) starts at nucleotide 299. Amino acids 1-39 of BKV(MM) T2 antigen are not written in this figure. In the SV40 DNA sequence, bracket [I] corresponds to nucleotides 4778-4560 and bracket [II] corresponds to nucleotides 4494-4440 (not shown in this figure). Thus, SV40 DNA contains about 260 nucleotides more than BKV(MM) DNA in this region.

[Figure: map of the large T and small t antigen genes on the BKV(MM) and SV40 genomes, with HindIII, HaeIII, and MboI sites marked; the diagram is not recoverable from the extracted text.]

FIG. 3. Relative location of the large T and small t antigen genes on the BKV(MM) and SV40 genomes. T1 and T2 are the first part and the continuous part, respectively, of the large T antigen gene in both viral genomes. The shaded regions refer to the regions of homology in both nucleotide and amino acid sequences. XXX, the intervening sequence in SV40 DNA.

possible reading frame. The derived protein sequence, presumably a part of the T antigen sequence near the carboxyl end of the protein, exhibits a 75% homology (details of this and the remaining coding sequence for T antigen will be reported elsewhere).

Another region of the T antigen coding sequence near the amino terminus, a small portion of which has been reported (10, 15), is shown in Fig. 2. Again, a high degree of both DNA and protein sequence homology is found between BKV(MM) and SV40. The high degree of amino acid sequence homology between the SV40 T antigen (13, 14) and the predicted sequence for BKV(MM) T antigen may explain the reported strong crossreactivity between BKV-infected cells and antiserum against SV40 T antigen (5, 16).

The relative locations on the SV40 genome for large T and small t antigen genes (13, 14) and those of the putative BKV(MM) large T and small t antigen genes are shown in Fig. 3. In both viruses, the large T antigen can be coded for by two noncontiguous regions of DNA sequence. On BKV(MM) DNA, the coding sequence for the T2 fraction (the continuous part of the large T antigen) starts at nucleotide 299 and overlaps 49 nucleotides (map positions 0.547-0.537) with a possible coding sequence of small t antigen. The reading frames for T2 and small t antigen are different in this region. However, the actual point of splicing for BKV(MM) large T antigen coding sequence remains to be determined. The small t antigen of SV40 is predicted to be 174 amino acids in length, whereas that of BKV(MM) DNA is only 100 amino acids in length.

The DNA sequence of a third region of BKV(MM) DNA has been analyzed. This region spans map locations 0.715-0.90. The DNA sequence, written in Fig. 4 as triplets, has the capacity to code for two different proteins. Nucleotides 1-198 of BKV(MM) DNA can code for a small protein of 66 amino acids, designated leader protein X, before reaching the termination codon TAG. The corresponding region of the SV40 genome as reported by Reddy et al. (13) can code for a protein of 62 amino acids. These two putative proteins share a number of common amino acids. If such proteins indeed are synthesized in vivo, it would be of interest to isolate them and determine their biological functions. Starting from nucleotide 239 of the BKV(MM) DNA sequence, a protein sequence can be written which shows a high degree of homology with the known VP2 protein of SV40 (14, 17, 18).

Comparison of DNA homology between BKV(MM) and SV40 as determined by sequence analysis versus hybridization methods

The sequence homology between BKV DNA and SV40 DNA determined by various stringent hybridization methods gave values ranging from 11% to 20% (4-6), whereas heteroduplex

[Figure: BKV(MM) and SV40 nucleotide sequences in triplets with the deduced amino acid sequences of leader protein X (Px) and the start of VP2; the sequence rows are not recoverable from the extracted text.]

FIG. 4. Nucleotide sequence of BKV(MM) DNA between map positions 0.715 and 0.780. Homology in amino acid sequences between a putative BKV(MM) and an SV40 late leader protein X (Px) is marked. Nucleotides 1-198 have recently been reported (8). The initiation (ATG) codon of VP2 is indicated.
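Deducing a putative protein such as leader protein X amounts to reading the sequence in triplets from the initiation codon until a termination codon is reached; 198 nucleotides then give 198/3 = 66 amino acids. The sketch below illustrates this with the standard genetic code. The helper `translate` is hypothetical, and the nine codons are the opening of the SV40 leader sequence as printed in Fig. 4 (they may carry transcription errors). The trailing TAG is appended here only to demonstrate termination and is an assumption, not the position of the actual stop codon.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first nucleotide, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":  # TAA, TAG, or TGA ends the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First nine codons of the SV40 leader row; "TAG" is added for illustration.
leader_start = "ATGGTGCTGCGCCGGCTGTCACGCCAG"
print(translate(leader_start + "TAG"))

# Length check for the BKV(MM) leader protein X: 198 nucleotides in triplets.
print(198 // 3)
```

The one-letter output corresponds to Met-Val-Leu-Arg-Arg-Leu-Ser-Arg-Gln, the residues written above those codons in the figure.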


analysis gave a value of 20% homology (7). These values agree reasonably well, and one is tempted to conclude that all of these methods are reliable in quantitatively estimating sequence homology between two species of DNA. However, the percentage homology, as revealed by direct DNA sequence analysis of BKV(MM) DNA (Figs. 1, 2, and 4 and unpublished data) and of SV40 DNA (13, 14), is an average of 70% in most regions so far examined. The regions not shown here (e.g., map positions 0.92-0.16 clockwise) have an even higher degree of homology (7). Table 1 shows that the homology estimates from heteroduplex analysis under stringent conditions (column b) do not correspond at all to values obtained by direct DNA sequence analysis (column a). For example, the sequence shown in Fig. 1 between nucleotides 40 and 430 (map positions 0.225-0.300) showed no homology at all by heteroduplex analysis (7) at melting temperature (tm) -13°C (column b). However, direct DNA sequence analysis revealed a 73% homology (average of 65%, 74%, and 80%). In examining the DNA sequence of BKV(MM) and SV40 in this region, we found that the longest stretch of sequence with perfect homology is only 22 nucleotides in length (221-242), and the next longest stretches are three 14-nucleotide-long segments; two of them are interrupted by a single nucleotide. The majority of the sequence showed very short stretches of perfect homology (e.g., 2-8 nucleotides in any

Table 1. Comparison of homology between BKV(MM) DNA and SV40 DNA

                    Percent homology     Homology by heteroduplex
                    by direct DNA        analysis† at
Map position        sequence analysis*   tm-13°C   tm-20°C   tm-28°C
on SV40 DNA         (a)                  (b)       (c)       (d)

0.175-0.200         23                   12        10          0
0.200-0.225         28                    0        10         10
0.225-0.250         65                    0        15         35
0.250-0.275         74                    0        54         91
0.275-0.300         80                    0        81        100
0.300-0.325         74                    0        90        100
0.325-0.350         75                    0        90        100
0.400-0.425         76                    0        66        100
0.425-0.450         76                    0        71        100
0.450-0.475         78                    6        95        100
0.475-0.500         79                    0        84        100
0.500-0.525         70                    6        70        100
0.600-0.625         68                    0        42         91
0.625-0.650         76                    0        42         91
0.725-0.750         78                   23        72         82
0.750-0.775         50                   23        76         91
0.775-0.800         84                   23        81         91
0.800-0.825         78                   14        76         91
0.825-0.850         81                    6        72         91
0.850-0.875         70                    0        36         91
0.875-0.900         73                    6        40         91

Average % homology  ~70                   6        60         83

Over 50% of the genome of BKV(MM) and SV40 is compared. Not shown here is the report by Dhar et al. (15) of 60-80% DNA sequence homology in the region of map positions 0.64-0.68. It is expected that when the entire genome is compared the data will be similar to that shown here.
* Percentage homology is calculated by comparing the DNA sequence of BKV(MM) shown in this paper with that of the published SV40 sequence of Reddy et al. (13) and Fiers et al. (14).
† Percentage homology is calculated from the heteroduplex data presented by Newell et al. (7).
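The worked example in the text (map positions 0.225-0.300) can be checked with a few lines of arithmetic. The sketch below averages the three 0.025-unit intervals for each column of Table 1. The values for columns (a) and (b) are those stated in the text; the values for columns (c) and (d) are read from the table as extracted and should be treated as assumptions about its row alignment. The dictionary name `homology_by_column` is hypothetical.

```python
from statistics import mean

# Percent homology for map positions 0.225-0.250, 0.250-0.275, 0.275-0.300.
homology_by_column = {
    "a (direct sequence analysis)": [65, 74, 80],
    "b (heteroduplex, tm-13C)": [0, 0, 0],
    "c (heteroduplex, tm-20C)": [15, 54, 81],   # assumed row alignment
    "d (heteroduplex, tm-28C)": [35, 91, 100],  # assumed row alignment
}

averages = {name: mean(values) for name, values in homology_by_column.items()}

for name, value in averages.items():
    print(f"{name}: {value:.1f}%")
```

The first line reproduces the 73% quoted in the text, against 0% for the stringent heteroduplex condition over the same interval.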

a   BKV   NNNNNNNNNNNNNNNNNNNNOOOOOOOOOO
    SV40  NNNNNNNNNNNNNNNNNNNNOOOOOOOOOO
          (N)20               (O)10

b   BKV   NNNNNNNNONNNNNNNNNNNNOOOOOOOOO
    SV40  NNNNNNNNONNNNNNNNNNNNOOOOOOOOO
          (N)8    O(N)12       (O)9

FIG. 5. Examples of two types of sequence homology. Structure a, perfect homology within the N region; structure b, hyphenated homology. Only a 30-nucleotide-long segment is shown; the actual DNA sequence is assumed to consist of multiple repeats of the types shown in structure a or b. The choice of the length (20 long) of the homologous sequence is arbitrary. N, homologous nucleotides; O, nonhomologous nucleotides. N/(N+O) = 20/30 = 66% DNA sequence homology between BKV and SV40.
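The two structures of Fig. 5 have the same overall percent homology but differ in the length of their longest uninterrupted matched stretch, which is the quantity that governs duplex stability. A minimal sketch of both measures follows, using the figure's own N/O notation. The function names are hypothetical.

```python
from itertools import groupby


def percent_homology(pattern: str) -> float:
    """Share of positions marked N (homologous) in an N/O pattern."""
    return 100.0 * pattern.count("N") / len(pattern)


def longest_perfect_run(pattern: str) -> int:
    """Length of the longest uninterrupted stretch of N positions."""
    return max(
        (len(list(group)) for symbol, group in groupby(pattern) if symbol == "N"),
        default=0,
    )


# The 30-nucleotide segments drawn in Fig. 5.
structure_a = "N" * 20 + "O" * 10
structure_b = "N" * 8 + "O" + "N" * 12 + "O" * 9

for name, pattern in (("a", structure_a), ("b", structure_b)):
    print(name, f"{percent_homology(pattern):.1f}%", longest_perfect_run(pattern))
```

Both patterns contain 20 homologous positions out of 30, yet the longest perfect run drops from 20 nucleotides in structure a to 12 in structure b.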

given stretch). It appears that all these short stretches of perfect homology, including the stretch 22 nucleotides in length, will not appear as duplex DNA by heteroduplex analysis at tm-13°C or by the hydroxyapatite method of homology analysis (5). Under less stringent conditions (columns c and d), the correspondence between heteroduplex analysis and DNA sequence analysis is better, but far from exact. For example, the correspondence is fairly good at map positions 0.275-0.300 but poor at map positions 0.225-0.250. Thus, one must conclude that homology studies that utilize either the well established hybridization procedures or heteroduplex analysis can give only qualitative or comparative data. Quantitative data must depend on direct DNA sequence analysis or an improved hybridization procedure.

The difficulties in using the hybridization and heteroduplex analyses for determination of the exact extent of homology between two species of DNA molecules are inherent in the methods. One of two procedures for determination of the relative proportions of duplex DNA and single-stranded DNA is usually used in hybridization methods. Hydroxyapatite columns at 60-65°C separate single-stranded DNA from duplex DNA (19). Alternatively, single-strand-specific nuclease S1 may be used to degrade single-stranded DNA, and the radioactivity of the remaining duplex DNA can be measured. The methods are probably quantitative only under conditions in which the homology within long stretches of DNA is either nearly perfect or zero. When homology is partial (e.g., between 10 and 90%) and perfectly matched regions are therefore relatively short, these methods are not likely to give reliable quantitative data. Two hypothetical examples given in Fig. 5 illustrate this point. In Fig. 5 structure a, if a stretch of perfect homology is 20 nucleotides in length, the tm of this segment at 0.14 M salt (NaCl or sodium phosphate) may be near 65°C. Then, under stringent hybridization conditions (7) [e.g., tm of DNA - 13°C = 77°C assuming a tm of 90°C (20)], the above sequence or a multiple unit of structure a would show no homology at all. On the other hand, under less stringent hybridization conditions (e.g., at 60°C), structure a or a multiple unit of it may be entirely retained on a hydroxyapatite column, showing 100% homology; similarly, a multiple unit of structure a may appear as a perfect duplex under the electron microscope if the 20-nucleotide-long homologous regions can hold the adjacent 10-nucleotide-long nonhomologous regions together. Thus, an actual sequence homology of 66% may appear as 100% under less stringent conditions and as 0% under stringent hybridization conditions. If the hybridization condition is between stringent and nonstringent, the homology can be anywhere between 0 and 100%, and depends on the conditions chosen and the specific DNA molecules under consideration.

Structure b in Fig. 5 shows a hyphenated homology with an unmatched base pair between completely homologous segments. This duplex is less stable than that of structure a. Just how much less stable depends on whether the unmatched base pair (O:O) is G-T, A-C, T-C, or G-A and on the location of the mismatched base pair(s) within the 20-nucleotide-long homologous region. The stability of the above structure will also depend on the neighboring sequence and the G+C content. Without knowing these facts for given pairs of DNA molecules, one cannot predict the stability of such structures under any given hybridization conditions. Thus, although the method of hybridization is a very useful technique in determining the relative homology of different species of DNA (19), the percentage homology obtained should not be considered quantitative.

The use of single-strand-specific nuclease S1 in place of the hydroxyapatite column (19) in distinguishing between double-stranded and single-stranded DNA segments also has its problems. For example, a double-stranded 20-nucleotide-long duplex remaining after nuclease S1 digestion may be only partially precipitable with trichloroacetic acid or partly retained on a filter. Structure b (or much longer DNA segments with a similar type of hyphenated homology) is even less likely to be detected as nuclease S1-resistant double-stranded DNA. Thus the method employing nuclease S1 will tend to give a low estimate of the percentage homology. One possible improvement involves the use of paper chromatography or homochromatography to determine the amount of mononucleotides instead of acid-soluble nucleotides liberated by nuclease S1.

In summary, extensive DNA sequence analysis of BKV(MM) DNA has been carried out. This sequence shows a 70% homology with the corresponding regions of SV40 DNA. This value is much higher than the reported homology values of 11-20% estimated by hybridization analyses. An explanation is offered for resolving this discrepancy.

We thank A. Young and S. Hanson for valuable assistance. This work was supported by Research Grant CA-14989 awarded by the National Cancer Institute and Grant VC-216 from the American Cancer Society.

1. Gardner, S. D., Field, A. M., Coleman, D. V. & Hulme, B. (1971) Lancet i, 1253-1257.
2. Takemoto, K. K., Rabson, A. S., Mullarkey, M. F., Blaese, R. M., Garon, C. F. & Nelson, D. (1974) J. Natl. Cancer Inst. 53, 1205-1207.
3. Gatti, R. A. & Good, R. A. (1971) Cancer 28, 89-98.
4. Howley, P. M., Khoury, G., Byrne, J. C., Takemoto, K. K. & Martin, M. A. (1975) Virology 16, 959-973.
5. Khoury, G., Howley, P. M., Garon, C., Mullarkey, M. F., Takemoto, K. K. & Martin, M. A. (1975) Proc. Natl. Acad. Sci. USA 72, 2563-2567.
6. Wold, W. S. M., Mackey, J. K., Brackmann, K. H., Takemori, N., Rigden, P. & Green, M. (1978) Proc. Natl. Acad. Sci. USA 75, 454-458.
7. Newell, N., Lai, C.-J., Khoury, G. & Kelly, T. J., Jr. (1978) J. Virol. 25, 193-201.
8. Yang, R. C. A. & Wu, R. (1978) Proc. Natl. Acad. Sci. USA 75, 2150-2154.
9. Yang, R. C. A. & Wu, R. (1978) J. Virol. 28, 851-864.
10. Yang, R. C. A. & Wu, R. (1979) Virology, in press.
11. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
12. Sanger, F. & Coulson, A. R. (1978) FEBS Lett. 87, 107-110.
13. Reddy, V. B., Thimmappaya, B., Dhar, R., Subramanian, K. N., Zain, B. S., Pan, J., Ghosh, P. K., Celma, M. L. & Weissman, S. M. (1978) Science 200, 494-502.
14. Fiers, W., Contreras, R., Haegeman, R., Rogiers, R., Van de Voorde, A., Van Heuverswyn, H., Van Herreweghe, J., Volckaert, G. & Ysebaert, M. (1978) Nature (London) 273, 113-120.
15. Dhar, R., Lai, C. & Khoury, G. (1978) Cell 13, 345-358.
16. Takemoto, K. K. & Mullarkey, M. F. (1973) J. Virol. 12, 625-631.
17. Ghosh, P. K., Reddy, V. B., Swinscoe, J., Choudary, P. V., Lebowitz, P. & Weissman, S. M. (1978) J. Biol. Chem. 253, 3643-3647.
18. Contreras, R., Rogiers, R., Van de Voorde, A. & Fiers, W. (1977) Cell 12, 529-538.
19. Britten, R. J., Graham, D. E. & Neufeld, B. R. (1974) Methods Enzymol. 29, 363-418.
20. Doty, P. (1962) Biochem. Soc. Symp. 21, 8-28.
