VIROLOGY 186, 280-285 (1992)
The Sequence of the Genome of Adenovirus Type 5 and Its Comparison with the Genome of Adenovirus Type 2
JADWIGA CHROBOCZEK, FRANK BIEBER, AND BERNARD JACROT
CNRS, URA 1333, Institut Max Von Laue-Paul Langevin, and European Molecular Biology Laboratory, Grenoble Outstation, 156X, 38042 Grenoble Cedex, France
Received June 19, 1991; accepted September 25, 1991
We report the sequence of 7558 nucleotides of the adenovirus type 5 genome. With this sequence and previously published data, the complete sequence of this genome is now available and can be compared with the already known sequence of the adenovirus type 2 genome. These two serotypes belong to the same subgroup, and sequence comparison shows 94.7% homology between the two genomes. The differences are not randomly distributed. Transitions between C and T and between A and G account in total for 58.3% of the differences, and for 68.6% in the genome devoid of the fiber and hexon genes (instead of the 33% expected for an equal probability of changes). In the fiber gene the transitions account for 47% of the differences. The detailed analysis of the nucleotide substitutions between the two genomes suggests that the Ad2 genome could derive from that of Ad5, with the exception of the fiber gene, which is likely to be present in the Ad2 genome as a result of genetic recombination. The homology between the amino acid sequences of the structural proteins varies from 100% (proteins pVII and IX) to only 69.2% for the fiber. © 1992 Academic Press, Inc.
INTRODUCTION
The human adenoviruses constitute a large family with 42 serotypes identified so far. These serotypes are usually classified in seven subgroups, essentially on the basis of DNA homology (Green et al., 1979) determined by methods like restriction mapping or hybridization. The homology is above 50% within a subgroup but less than 25% between subgroups. The complete sequence of the genome of serotype 2 has been determined (Roberts et al., 1984), and partial sequences of biologically important parts of the genome of several other serotypes can be found in the data banks. Adenovirus type 5 (Ad5) is a serotype belonging to the same subgroup as adenovirus type 2 (Ad2). The parts of the Ad5 genome that were previously sequenced include the left 32% (see Van Ormondt and Galibert, 1984), the region of the hexon gene (Kinloch et al., 1984), the early E3 transcription unit (Cladaras and Wold, 1985), the regions of the fiber gene (Chroboczek and Jacrot, 1987) and of the penton base gene (Neumann et al., 1988), the region between coordinates 60 and 72% (Kruijer et al., 1980, 1981), and finally part of the right-hand extremity (see Van Ormondt and Galibert, 1984). We have completed the sequence of the 35,935 bp which constitute the genome of Ad5, and a comparison can be made with the 35,937 bp of Ad2.
MATERIAL AND METHODS
Nucleotide sequencing
The DNA used for sequencing was isolated from bacteria carrying either the plasmid Bluescribe M13+, into which we inserted the appropriate Ad5 DNA fragments (map coordinates 27.3-46.6 and 45.1-52.6, respectively), or the plasmid pFG23 (map coordinates 60-100), a kind gift of Frank Graham. Nucleotide sequencing was carried out according to the method of Sanger (1977) by direct plasmid sequencing; sometimes subcloning into M13tg130 and M13tg131 (Amersham) was used. The enzyme used for sequencing was Sequenase from USB. The appropriate oligonucleotide primers were synthesized at EMBL (Heidelberg) and at CENG (Grenoble). The complete sequence was determined on both strands.
Sequence comparison
The DNA and protein sequences were analyzed and compared with the help of the program library of the University of Wisconsin Genetics Computer Group.
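As an illustration of the kind of tally that a pairwise comparison produces, the following minimal Python sketch counts mismatches and gap positions in two sequences that have already been aligned. It is not the Genetics Computer Group software; the function name `tally_alignment`, the use of "-" as the gap symbol, and the toy sequences are assumptions made for this example only.

```python
def tally_alignment(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Count mismatches and gap positions in two pre-aligned sequences.

    Hypothetical helper for illustration; both inputs must already be
    aligned (equal length, gaps marked with `gap`).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    mismatches = 0   # positions where both sequences have a base, but different ones
    gaps_in_a = 0    # positions where only seq_a carries a gap
    gaps_in_b = 0    # positions where only seq_b carries a gap

    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap and y == gap:
            continue              # column that is a gap in both: ignored
        if x == gap:
            gaps_in_a += 1
        elif y == gap:
            gaps_in_b += 1
        elif x != y:
            mismatches += 1

    return {
        "mismatches": mismatches,
        "gap_positions_a": gaps_in_a,
        "gap_positions_b": gaps_in_b,
        "differences": mismatches + gaps_in_a + gaps_in_b,
    }


# Toy example (invented sequences, not adenovirus data).
toy = tally_alignment("ACGT-ACGTTA", "ACGTTAC-TCA")
print(toy)
```

Here the total number of differences is simply the sum of mismatches and gapped positions; other conventions (for example, counting each gap once regardless of its length) would give different totals.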
RESULTS AND DISCUSSION
Nucleotide sequence
[Figure 1: amino acid sequence panels for protein 100K and protein IIIa.]
FIG. 1. Sequences of adenovirus type 5 structural proteins. The sequence of the unpublished Ad5 structural proteins is given using the one-letter code. Amino acids which are different in Ad2 are given below the corresponding Ad5 amino acid. A star indicates a missing amino acid in one or the other serotype. Protein pVII, which is identical in the two serotypes, is not shown.

We have sequenced Ad5 DNA between coordinates 32 and 38% (2158 bp), 45 and 52% (2470 bp), 72 and
76% (1512 bp), and 92 and 96% (1264 bp). We have also sequenced 154 nucleotides which were missing in the sequence on the right-hand side. Altogether we have sequenced some 7564 nucleotides to obtain the complete sequence of the Ad5 genome. More precisely, we have sequenced the following fragments: 11,565 to 13,722, between two HindIII sites; 16,286 to 18,919, between an SfiI site and a SmaI site (the sequence determination was stopped at 18,765, as the previously published sequence started at 18,618); 25,819 to 27,331, between a site defined by a primer synthesized using the published sequence and the EcoRI site; and 33,096 to the right-hand DNA end, starting from a SmaI site (the sequence was determined between 33,096 and 34,359 and between 34,700 and 34,859 to cover the missing gaps). The complete sequence is available from GenBank (Accession Number M73260). Ad5 DNA comprises 35,935 nucleotides, two fewer than Ad2 DNA. The two sequences can be easily aligned without ambiguity except in the E3 region. In
this region an alignment slightly different from the one proposed by Cladaras and Wold (1985) can be used. In all, not including the gaps, there are 1,688 mismatches between the two genomes. The best alignment is obtained with 38 gaps (with lengths ranging between 1 and 36 nucleotides and containing in total 213 nucleotides) in the Ad5 sequence and 31 gaps (with lengths between 1 and 69 nucleotides and a total length of 211) in the Ad2 sequence. Adding the mismatches and the gaps, there are altogether 2112 differences between the genomes of these two serotypes. This corresponds to 5.34% of the genome, or to a homology of 94.7%, which is lower than that (99%) estimated from DNA hybridization (Green et al., 1979). This discrepancy is not surprising, as hybridization followed by digestion with the S1 endonuclease is not sensitive to point mutations or one-nucleotide gaps, which constitute a large part of the differences between the two genomes. It may be worth noting that a previous estimate using the available restriction maps (Chroboczek and Jacrot, 1987), namely, 96% homology, was not far from the true value. This indicates that, in the absence of
sequence information, this second method (comparison of restriction maps) should be preferred to the first one (hybridization) to estimate DNA homology.

[Figure 1, continued: amino acid sequence panels for proteins V, pVI, 33K, pVIII, and 52/55K.]
FIG. 1-Continued

Comparison of the protein sequences
The newly determined sequences provide the complete amino acid sequence of several late gene products, so far unknown or partially known. This applies to the following proteins: IIIa, V, pVI, pVII, 33K, 100K, and the 52/55K protein. These sequences are given in Fig. 1 together with the corresponding Ad2 products. The sequences of the structural proteins and of the proteins involved in the morphogenesis of the virion of both serotypes can be compared (Table 1a). It was shown previously (Kinloch et al., 1984; Chroboczek and Jacrot, 1987) that there are quite large differences for the two structural proteins exposed at the surface of the virion, namely, the hexon and the fiber. For all the other ones there are altogether 52 amino acids different out of 3,853 (1.35%) between the two serotypes. It is difficult to make a similar analysis with the early nonstructural proteins since not all early products are well identified. Table 1b shows the comparison of a few well-characterized early proteins. The main feature is that for the gene products of the families E1, E2, and E4 the differences between the two serotypes are of the order of 1%, as for the majority of late proteins. The differences are much larger for the E3 products, which have already been analyzed by Cladaras and Wold (1985). Although the E3 family may play a role in modulating the host response to the infection, as this is well
TABLE 1a
DIFFERENCES BETWEEN THE GENES OF THE MAIN PROTEINS INVOLVED IN THE ARCHITECTURE OR IN THE MORPHOGENESIS OF ADENOVIRUS TYPES 2 AND 5

Protein        Differences in DNA     Differences in amino acid
               sequences (%)          sequences (%)
Hexon          17                     13.7
Fiber          27                     30.8
Penton base    1.3                    1.4
IIIa           0.9                    0.17
IX             0.24                   0
pVI            1.2                    1.2
pVIII          3.5                    0.88
pVII           0.67                   0
V              2.2                    1.63
100K           3.13                   2.35
23K            0.65                   0.49
33K            3.22                   3.95
52/55K         1.4                    0.72

Note. The first nine proteins are structural proteins and the last four are involved in the morphogenesis of the virion. The proteins pVI, pVII, and pVIII are present in the virion in their mature form after a proteolytic cleavage made by the virus-coded 23K protease.
established for the 19K protein (Wold et al., 1985), the E3 family is nonessential, as pointed out by Cladaras and Wold (1985), since the virus can be grown even when the corresponding DNA is deleted. In general, the proteins produced by Ad2 and Ad5, with the exception of the fiber, the hexon, and the E3 products, differ by about 1.5%. If this percentage is the one which exists between the two serotypes in the absence of selective immunological and functional pressure, the probability of differences should follow a Poisson distribution. This does not seem to be the case; the existence of two proteins (pVII and IX) with identical amino acid sequences in the two serotypes is highly improbable with such a distribution. Nor do the differences at the DNA level follow a Poisson distribution. The differences in the 100K protein amino acid sequence are somewhat above average (see Table 1a). This protein attaches to the hexon and takes part in its folding and transport. The hexon protein shows large differences between these two serotypes (Kinloch et al., 1984). One may speculate that the differences in the 100K protein from Ad2 and Ad5 are necessary to accommodate the variations in the hexons. More generally, it is probable that, due to their function, some proteins can tolerate more amino acid changes.
Comparison of the genomes
There are no significant differences in the frequency of pairs or triplets of nucleotides. The nucleotide composition of the two genomes is remarkably similar (Table 2); the number of G and C differs by only four nucleotides out of a total of nearly 20,000. This is a striking fact when one considers that 80% of the differences between the two genomes involve a G or a C. A rough calculation shows that, on the basis of random mutations, one expects a difference larger than that observed by at least one order of magnitude. That mutations between the two genomes are not random is confirmed by the analysis of the 1688 changes of nucleotides as shown in Table 3. The most frequent differences are transitions (mutations between purines or between pyrimidines) between C and T and between A and G, which account for 58.3% of the differences between the two genomes instead of the 33% expected if mutations between bases were at random. Indeed, this high proportion of transitions corresponds to what is observed (59.2%) for the mutations in pseudogenes (Li et al., 1984), which are supposed to evolve without selective pressure. In that case the most frequent transitions are C to T and G to A. If one does not include the genes for the hexon and the fiber, transitions represent 68.6% of the differences. A chemical mechanism which favors transitions at the expense of transversions in point mutations has been described by Topal and Fresco (1976). This mechanism is based on intermediate states with noncanonical base pairing. Another mechanism increases the frequency of transitions from C to T, namely, the conversion of methylated cytosine to thymine upon deamination (Coulondre et al., 1978; Razin and Riggs,
TABLE 1b
DIFFERENCES IN SOME OF THE GENE PRODUCTS OF EARLY GENES

Gene product                     Differences in amino acid sequences (%)
E1B products
  21K                            1.7
  55K                            2.42
E2B products
  105K (polymerase)              0.29
  87K (terminal protein)         1.38
  IVa2                           1.11
E2A products
  72K (DNA binding protein)      1.7
E3A products
  gp 19K                         17.6
  10.5K                          3.5
E4 products
  11K                            0
TABLE 2
BASE COMPOSITION OF AD2 AND AD5 GENOMES

                Total genome                        Fiber gene
                Ad2              Ad5                Ad2             Ad5
A               8342 (23.2%)     8367 (23.3%)       568 (32.5%)     542 (31.0%)
C               10045 (28.0%)    10073 (28.0%)      433 (24.8%)     455 (26.1%)
G               9793 (27.3%)     9761 (27.2%)       311 (17.8%)     317 (18.2%)
T               7757 (21.6%)     7734 (21.5%)       437 (25.0%)     432 (24.7%)
G + C/total     0.55205          0.55197            0.425           0.442

Note. In the last two columns the base composition is given for the r-strand of the fiber gene.
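The statement in the text that the G + C content of the two genomes differs by only four nucleotides can be rechecked directly from the total-genome counts of Table 2. The short Python sketch below does this; the dictionary and function names are introduced here for illustration only.

```python
# Total-genome base counts transcribed from Table 2.
BASE_COUNTS = {
    "Ad2": {"A": 8342, "C": 10045, "G": 9793, "T": 7757},
    "Ad5": {"A": 8367, "C": 10073, "G": 9761, "T": 7734},
}


def gc_summary(counts: dict) -> tuple:
    """Return (genome length, number of G + C, G + C fraction)."""
    total = sum(counts.values())
    gc = counts["G"] + counts["C"]
    return total, gc, gc / total


summaries = {name: gc_summary(c) for name, c in BASE_COUNTS.items()}
for name, (total, gc, fraction) in summaries.items():
    print(f"{name}: length {total}, G + C {gc}, fraction {fraction:.4f}")

# Difference in the absolute number of G + C between the two genomes.
gc_difference = summaries["Ad2"][1] - summaries["Ad5"][1]
print("G + C difference:", gc_difference)
```

The base counts add up to the genome lengths given in the text (35,937 bp for Ad2 and 35,935 bp for Ad5), which is a useful consistency check on the transcription of the table.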
1980). This conversion also increases the frequency of transitions from G to A in one strand as a result of a C to T transition in the other strand. In the absence of selective pressure, these mechanisms should account for the ratio of transitions to transversions. If there is a selective pressure, this pressure will favor the silent or conservative mutations. Silent mutations are largely associated with transitions on the third base of codons, and they do not permit escaping from the immunological pressure. Most of the conservative mutations are due to transversions. For instance, the conservative substitution between aspartic and glutamic acid or between leucine and isoleucine can result only from transversions. So one expects, and one observes (Nei, 1987), that for genes subject to selective pressure the proportion of transitions is not as large as in genes which mutate without this pressure. However, for globin genes this proportion is still higher
TABLE 3
DIFFERENT TYPES OF BASE "MUTATIONS" BETWEEN AD2 AND AD5

Ad2     Ad5     I       II      III
T       C       284     65      54
T       A       114     40      45
T       G       68      20      24
C       T       257     70      59
C       A       117     36      32
C       G       55      11      11
A       T       111     45      44
A       C       120     22      56
A       G       212     40      54
G       T       61      18      14
G       C       58      25      13
G       A       231     49      52

Note. This table should be read as follows: the first line means that 284 T of Ad2 are replaced by C in Ad5; among them 65 are in the hexon gene and 54 are in the fiber gene. I, mutations in the complete genome; II, mutations in the hexon gene; III, mutations in the fiber gene.
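The transition percentages quoted in the text can be recomputed from the counts of Table 3. The following Python sketch is one way to do so; the variable and function names are chosen here for illustration, and the "rest of genome" column is obtained by subtracting the hexon and fiber counts from the whole-genome counts.

```python
# (Ad2 base, Ad5 base) -> (I: whole genome, II: hexon gene, III: fiber gene),
# transcribed from Table 3.
TABLE3 = {
    ("T", "C"): (284, 65, 54), ("T", "A"): (114, 40, 45), ("T", "G"): (68, 20, 24),
    ("C", "T"): (257, 70, 59), ("C", "A"): (117, 36, 32), ("C", "G"): (55, 11, 11),
    ("A", "T"): (111, 45, 44), ("A", "C"): (120, 22, 56), ("A", "G"): (212, 40, 54),
    ("G", "T"): (61, 18, 14),  ("G", "C"): (58, 25, 13),  ("G", "A"): (231, 49, 52),
}

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def transition_percent(counts: dict) -> float:
    """Percentage of substitutions in `counts` that are transitions."""
    total = sum(counts.values())
    transitions = sum(n for pair, n in counts.items() if pair in TRANSITIONS)
    return 100.0 * transitions / total


genome = {pair: c[0] for pair, c in TABLE3.items()}
hexon = {pair: c[1] for pair, c in TABLE3.items()}
fiber = {pair: c[2] for pair, c in TABLE3.items()}
# Genome with the hexon and fiber genes excluded.
rest = {pair: c[0] - c[1] - c[2] for pair, c in TABLE3.items()}

for label, counts in [("genome", genome), ("hexon", hexon),
                      ("fiber", fiber), ("rest", rest)]:
    print(f"{label}: {sum(counts.values())} substitutions, "
          f"{transition_percent(counts):.1f}% transitions")
```

With these counts the sketch gives 58.3% for the whole genome, 50.8% for the hexon gene, 47.8% for the fiber gene, and 68.6% for the genome without these two genes, in agreement with the figures in the text.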
Table 2, fiber gene (r-strand), Ad5 column: A 542 (31.0%); C 455 (26.1%); G 317 (18.2%); T 432 (24.7%); G + C/total 0.442.
than the 33% expected for random mutations. The fiber and the hexon carry most, if not all, of the antigenic determinants of the virion. There must exist a strong selective pressure on the genes of these two proteins. Indeed, the nucleotide substitutions in these genes are rather different from the ones observed in the rest of the adenovirus genome; the transitions account for only 50.8% of the substitutions in the hexon gene and 47.8% in the fiber gene. This last figure is close to the 45.5% observed for globin genes (Gojobori et al., 1982). One may wonder whether one of the two serotypes resulted from mutations in the other one or whether both derive from a common ancestor. To try to answer this question, a possible approach is to consider the relative rate of transitions from C or G toward T or A compared to the opposite transitions (T or A toward C or G). Li et al. (1984) have shown that, in the absence of selective pressure, the first type is more frequent than the second. To make such a comparison one must adjust the percentages of mutations to take into account the
TABLE 4
PERCENTAGE OF THE 12 TYPES OF NUCLEOTIDE SUBSTITUTIONS BETWEEN AD2 AND AD5 GENOMES

                Ad5
Ad2     A       T       C       G
A       -       6.94    7.51    13.26
T       7.67    -       19.11   4.57
C       6.08    13.35   -       2.86
G       12.31   3.25    3.09    -

Note. These percentages have been corrected to take into account the base composition of the genome (Gojobori et al., 1982; see text). This table indicates, for instance, that 19.11% of the mutations going from the Ad2 to the Ad5 genome are a replacement of a T by a C.
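The correction for base composition can be sketched as follows. This is a minimal Python example under a stated assumption: each count from column I of Table 3 is divided by the number of occurrences of the original base in the Ad2 genome (Table 2), and the twelve resulting rates are rescaled to sum to 100%. This procedure is an assumption made here, not a description taken from the text, but it reproduces the values of Table 4 to within rounding. The names used are illustrative only.

```python
# Ad2 total-genome base counts (Table 2).
AD2_BASE_COUNTS = {"A": 8342, "C": 10045, "G": 9793, "T": 7757}

# (Ad2 base, Ad5 base) -> number of substitutions, column I of Table 3.
SUBSTITUTIONS = {
    ("T", "C"): 284, ("T", "A"): 114, ("T", "G"): 68,
    ("C", "T"): 257, ("C", "A"): 117, ("C", "G"): 55,
    ("A", "T"): 111, ("A", "C"): 120, ("A", "G"): 212,
    ("G", "T"): 61,  ("G", "C"): 58,  ("G", "A"): 231,
}


def normalized_percentages(substitutions: dict, base_counts: dict) -> dict:
    """Rescale substitution counts by the abundance of the original base.

    Assumed scheme: rate = count / (occurrences of the original base),
    then all rates are expressed as percentages of their sum.
    """
    rates = {pair: n / base_counts[pair[0]] for pair, n in substitutions.items()}
    total = sum(rates.values())
    return {pair: 100.0 * rate / total for pair, rate in rates.items()}


table4 = normalized_percentages(SUBSTITUTIONS, AD2_BASE_COUNTS)
for (ad2_base, ad5_base), percent in sorted(table4.items()):
    print(f"Ad2 {ad2_base} -> Ad5 {ad5_base}: {percent:.2f}%")
```

Applying the same function to the hexon-free and fiber-free counts would require the base composition of the corresponding part of the genome, which is not tabulated here.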
TABLE 5
PERCENTAGE OF THE 12 TYPES OF NUCLEOTIDE SUBSTITUTIONS BETWEEN AD2 AND AD5 GENOMES EXCLUDING THE HEXON AND FIBER GENES

                Ad5
Ad2     A       T       C       G
A       -       3.04    5.80    16.3
T       4.25    -       24.20   3.52
C       5.47    14.35   -       3.69
G       14.11   3.15    2.17    -
fact that the four bases are not in equal number in the genome. This was done according to Gojobori et al. (1982). In Table 4, these normalized percentages are given for the total genome. The most frequent mutation is the one in which a C in Ad5 is replaced by a T in Ad2. Considering what was said before, this suggests that the evolution has been from Ad5 to Ad2. The point is even stronger if one considers the mutations in the genome while excluding the hexon and fiber genes (Table 5). Then, not only are the C to T transitions (going from Ad5 to Ad2) more numerous, but there are also more G to A transitions going from Ad5 to Ad2, as expected if the evolution has been in that direction. The percentages of the different types of nucleotide substitutions are very similar to those found by Wu and Maeda (1987) for a fragment of primate DNA which has no known function and is mildly constrained by selection. However, one must be careful before drawing firm conclusions. The mechanism which favors C to T transitions is a consequence of DNA methylation. Wienhues and Doerfler (1985) found no evidence for DNA methylation during productive infections in cell culture. However, the situation might be different in the normal life cycle of the virus.

A similar comparison made for the fiber gene of the two serotypes shows very different features (Table 6). As already mentioned, the transitions are not so frequent as in the rest of the genome. In particular, the C (from Ad5) to T (from Ad2) transition is not the most frequent substitution. This certainly reflects the effects of selective pressure. It also suggests that this gene has evolved independently from the rest of the genome and that the final genome of Ad2 results from an evolution of the Ad5 genome with a recombination with a fiber and possibly a hexon gene from a yet unknown origin.

TABLE 6
PERCENTAGE OF THE 12 TYPES OF NUCLEOTIDE SUBSTITUTIONS BETWEEN AD2 AND AD5 GENOMES IN THE FIBER GENE

                Ad5
Ad2     A       T       C       G
A       -       7.79    9.91    9.56
T       9.99    -       12.01   5.33
C       6.75    12.44   -       2.32
G       15.74   4.23    3.93    -

REFERENCES
CHROBOCZEK, J., and JACROT, B. (1987). The sequence of adenovirus fiber: similarities and difference between serotypes 2 and 5. Virology 161, 549-554.
CLADARAS, C., and WOLD, W. S. (1985). DNA sequence of the early E3 transcription unit of adenovirus 5. Virology 140, 28-43.
COULONDRE, C., MILLER, J. H., FARABAUGH, P. J., and GILBERT, W. (1978). Molecular basis of base substitution hotspots in Escherichia coli. Nature 274, 775-780.
GOJOBORI, T., LI, W.-H., and GRAUR, D. (1982). Patterns of nucleotide substitutions in pseudogenes and functional genes. J. Mol. Evol. 18, 360-369.
GREEN, M., MACKEY, J. K., WOLD, W. S. M., and RIGDEN, P. (1979). Thirty-one adenovirus serotypes (Ad1-Ad31) form five groups (A-E) based upon DNA genome homologies. Virology 93, 481-492.
KINLOCH, R., MACKAY, N., and MAUTNER, V. (1984). Adenovirus hexon. Sequence comparison of subgroup C serotypes 2 and 5. J. Biol. Chem. 259, 6431-6436.
KRUIJER, W., VAN SCHAIK, F. M. A., and SUSSENBACH, J. S. (1980). Nucleotide sequence of a region of adenovirus 5 DNA encoding a hitherto unidentified gene. Nucleic Acids Res. 9, 4439-4457.
KRUIJER, W., VAN SCHAIK, F. M. A., and SUSSENBACH, J. S. (1981). Structure and organisation of the gene coding for the DNA binding protein of adenovirus type 5. Nucleic Acids Res. 10, 4493-4500.
LI, W. H., WU, C. I., and LUO, C. C. (1984). Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21, 58-71.
NEI, M. (1987). "Molecular evolutionary genetics." Columbia Univ. Press, New York.
NEUMANN, R., CHROBOCZEK, J., and JACROT, B. (1988). Determination of the nucleotide sequence for the penton base gene of human adenovirus type 5. Gene 69, 153-157.
RAZIN, A., and RIGGS, A. D. (1980). DNA methylation and gene function. Science 210, 604-610.
ROBERTS, R. J., O'NEILL, K. E., and YEN, C. T. (1984). J. Biol. Chem. 259, 13,968-13,985.
TOPAL, M. D., and FRESCO, J. R. (1976). Complementary base pairing and the origin of substitution mutations. Nature 263, 285-289.
VAN ORMONDT, H., and GALIBERT, F. (1984). Nucleotide sequences of adenovirus DNA. In "The molecular biology of adenoviruses" (W. Doerfler, Ed.), Springer-Verlag, Berlin/New York.
WIENHUES, U., and DOERFLER, W. (1985). Lack of evidence for methylation of parental and newly synthesized adenovirus type 2 DNA in productive infections. J. Virol. 56, 320-324.
WOLD, W. S. M., CLADARAS, C., DEUTSCHER, S. L., and KAPOOR, Q. S. (1985). The 19-kDa glycoprotein coded by region E3 of adenovirus. J. Biol. Chem. 260, 2424-2431.
WU, C.-I., and MAEDA, N. (1987). Inequality in mutation rates of the two strands of DNA. Nature 327, 169-170.