VIROLOGY 179, 339-346 (1990)

Sequence and Transcription of Raji Epstein-Barr Virus DNA Spanning the B95-8 Deletion Region

BRUCE D. PARKER,* ALAN BANKIER,† SANDRA SATCHWELL,† BART BARRELL,† AND PAUL J. FARRELL*,1

*Ludwig Institute for Cancer Research, St Mary’s Hospital Medical School, Norfolk Place, London W2 1PG; and †MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH

Received April 30, 1990; accepted July 3, 1990
The DNA sequence of Raji DNA spanning the deletion found in B95-8 cells has been determined. Three open reading frames and a region of homology with the BamHI-H fragment are found within the deletion. The deletion contains a region of 102-bp repeats which is transcribed into an mRNA. The Raji sequence reported here varies slightly from a smaller M-ABA sequence reported previously. This paper completes the sequence of all parts of the wild-type Epstein-Barr virus genome. © 1990 Academic Press, Inc.
1 To whom requests for reprints should be addressed.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. M35547.

INTRODUCTION

Epstein-Barr virus (EBV) is a human herpesvirus which infects human B lymphocytes and particular epithelial cells; EBV can immortalize human B lymphocytes at high efficiency. The virus is linked with the development of Burkitt’s lymphoma and nasopharyngeal carcinoma and EBV is the principal cause of infectious mononucleosis (for a review see Klein, 1989). The genome of the most-studied strain of EBV (B95-8) has been completely sequenced (Baer et al., 1984). However, the B95-8 genome contains a deletion of approximately 12 kb relative to other EBV strains (Raab-Traub et al., 1980). The deletion removes one of the origins of lytic replication (oriLyt) (Hammerschmidt and Sugden, 1988) and part or all of at least three potential coding regions. We have now determined the DNA sequence of the region of the genome of the Raji strain of EBV which spans this deletion. In this paper we report this sequence and the potential genetic organization of this portion of the viral genome.

MATERIALS AND METHODS

DNA sequencing

A library of M13 clones was prepared from sonicated DNA (Bankier and Barrell, 1983) of the cloned EcoRI-C fragment of Raji EBV DNA. M13 clones were picked at random and sequenced using the dideoxynucleotide chain termination method (Sanger et al., 1977). Subclones of EcoRI-C were also made by inserting restriction endonuclease fragments into pUC19. Regions of ambiguity obtained during M13 sequencing were confirmed by double-stranded DNA sequencing (Chen and Seeburg, 1985) of appropriate parts of the pUC clones using oligonucleotide primers flanking the ambiguous regions. Thus 99.5% of the sequence was determined on both strands of the Raji DNA with an average gel reading density of 6.97. Computer analysis and compilation of the DNA sequence were performed using the programs of Staden (1982, 1984).

Isolation of RNA

Raji and P3HR1 (Pulvertaft, 1965; Hinuma et al., 1967) cell lines are derived from Burkitt’s lymphomas. AG876-CR is a lymphoblastoid cell line made by immortalizing B lymphocytes from an EBV-negative British donor with the AG876 strain (Pizzo et al., 1978) of EBV. Cell lines AG876-CR, Raji, and P3HR1 (which all contain EBV) were grown to a density of 5 × 10^5 cells/ml and induced with TPA at 30 ng/ml for 3 days. Some TPA-treated cells were also treated with PAA at 125 μg/ml. Control cells were grown under identical conditions without TPA or PAA. P3HR1-superinfected Raji cells were prepared as described previously (Biggin et al., 1987). RNA was prepared either as described previously (Biggin et al., 1984) or by the acidic phenol/guanidine thiocyanate/chloroform extraction method (Chomczynski and Sacchi, 1987). With the exception of P3HR1-superinfected Raji RNA, all RNA samples were poly(A)-selected.
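The two shotgun statistics quoted under DNA sequencing above (the fraction of bases read on both strands, and the average gel reading density) can be illustrated with a small sketch. It assumes that "gel reading density" means the mean number of independent gel readings covering each base; the function name `coverage_stats`, the read tuples, and the toy data are hypothetical and are not taken from the Staden programs.

```python
from typing import List, Tuple

# A read is (first_base, last_base, strand), 1-based and inclusive.
Read = Tuple[int, int, str]


def coverage_stats(reads: List[Read], length: int) -> Tuple[float, float]:
    """Return (mean readings per base, fraction of bases read on both strands)."""
    depth = [0] * (length + 1)       # index 0 is unused
    plus = [False] * (length + 1)    # base covered by a '+' strand read
    minus = [False] * (length + 1)   # base covered by a '-' strand read
    for first, last, strand in reads:
        for pos in range(first, last + 1):
            depth[pos] += 1
            if strand == "+":
                plus[pos] = True
            else:
                minus[pos] = True
    mean_depth = sum(depth[1:]) / length
    both = sum(1 for p in range(1, length + 1) if plus[p] and minus[p])
    return mean_depth, both / length


if __name__ == "__main__":
    # Toy 10-base target with three made-up reads.
    toy_reads = [(1, 6, "+"), (4, 10, "-"), (1, 10, "+")]
    density, both_strands = coverage_stats(toy_reads, 10)
    print(f"reading density {density:.2f}, both strands {both_strands:.0%}")
```

For a real project the same bookkeeping would be applied to the aligned gel readings of the assembled contig rather than to hand-written intervals.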
0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
[Fig. 1 sequence listing: Raji DNA, bases 1 to 11,835, printed in blocks of 10 bases, with the middle of the repeat region marked "CONTINUES WITH 20 MORE REPEAT UNITS".]
FIG. 1. The nucleotide sequence of Raji DNA covering the B95-8 deletion. The sequence is numbered from the first base of the 4-base redundancy (GGCC) found at the beginning and end of the sequence. The PstI repeats are found from bases 1046 to 3562. Horizontal arrows delineate the boundaries of the complete repeats shown. The left boundary of the repeats containing the incomplete repeat unit is indicated with a vertical arrow. The sequence is shown without 20 copies of the repeat in the middle of the repeat region. DR is found between bases 3554 and 4609. Reading frame LF1 begins 140 bp beyond the right end of the sequence shown (B95-8 position 152,152) and ends at base 10,566. Reading frame LF2 begins at base 10,605 and stops at base 9311. Reading frame LF3, which spans the PstI repeats, starts at base 3625 and ends at 851. This sequence has been deposited with the EMBL sequence database.
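LF3 is listed with a start coordinate larger than its end coordinate, so it is read leftward, that is, from the strand complementary to the one numbered in Fig. 1. The sketch below illustrates that bookkeeping on a single 102-bp PstI repeat unit. The unit is transcribed here from the printed repeats and should be treated as an assumption, as should the convention that the quoted coordinates are inclusive; the helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One complete 102-bp repeat unit, top strand (assumed transcription).
REPEAT_UNIT = (
    "GGTGGCCGGCTGCAGCCGGGTCCGGGGTTCCGGCCCTGGAGCTCGGGGGG"
    "CGGCCGGGTGGCCCACCGGGTCCGCTGGGTCCGCTGCCCCGCTCCGGCGGGG"
)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def orf_span(start: int, end: int) -> int:
    """Length in bases of a frame given inclusive coordinates, either strand."""
    return abs(start - end) + 1


if __name__ == "__main__":
    leftward = reverse_complement(REPEAT_UNIT)
    print(translate(leftward))        # peptide encoded by one repeat unit
    print(orf_span(3625, 851))        # LF3 span in bases
    # Stop codons in each frame of two tandem units on the leftward strand.
    for frame in range(3):
        print(frame, translate((leftward * 2)[frame:]).count("*"))
```

With this unit, one frame of the leftward strand gives a 34-residue peptide beginning PRRS and ending QPAT, matching the repeat peptide described for LF3, and the LF3 span of 2775 bases is a whole number of codons.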
Northern blot analysis

RNA samples (2 μg per track) were denatured using glyoxal and electrophoresed and blotted as described previously (Biggin et al., 1984). Probes were nick-translated plasmids (Rigby et al., 1977) containing restriction fragments of Raji DNA covering the entire deleted region of B95-8 or small gel-purified fragments covering just the predicted open reading frames.

Determination of repeat copy number

A subclone of the Raji EcoRI-C fragment spanning the PstI repeats was digested with BanI and DdeI. This digestion removes all but 30 bp of the DNA flanking the repeats. The remaining 2.5-kb fragment containing the PstI repeats was purified by electroelution from an agarose gel slice. The DNA was subjected to a partial digestion using PstI and electrophoresed on a 2% agarose gel. HindIII-digested λ DNA and φX174 DNA digested with HaeIII were included as size markers. The gel was stained with ethidium bromide and photographed. The repeat copy number was determined by counting the bands in the resulting ladder relative to the size markers.

[Fig. 2 gel. Lanes, left to right: D C B A A B C D. Marker sizes (bp): 2322, 2027, 1353, 1078, 872, 603, 564, 310, 281, 234, 194.]

FIG. 2. Determination of the repeat copy number of the PstI repeats in Raji DNA. The ladder produced by the partial digest is shown in lane A. Lane B shows the complete digest of the same DNA. Lanes C and D contain size markers. The sizes of the marker fragments are shown on the side of the figure.

RESULTS

DNA sequence

The DNA sequence of the M13 library of Raji DNA covering the deletion in B95-8 was compiled to give a completely contiguous sequence and is shown in Fig. 1. The sequence in Fig. 1 has a G+C composition of 60.7%, which is very close to that of 60% determined for the entire B95-8 genome (Baer et al., 1984). Although Fig. 1 shows only the sequence deleted from B95-8, several kilobases of Raji sequence flanking this were also determined to a slightly lower degree of accuracy (data not shown). The flanking sequences were very similar to the equivalent B95-8 sequence (less than 0.5% variation, 30 of 6670 bases), as was found earlier when comparing most of the adjacent EcoRI Dhet region from both Raji and B95-8 (Hatfull et al., 1988).

There is a redundancy of 4 bases in the Raji sequence around the point of deletion in B95-8. One copy of the redundancy (GGCC) is present in B95-8 at the point of the deletion (nucleotides 152,009-152,012). However, the sequence GGCC is found at both ends of the Raji DNA spanning the deletion. Therefore, the precise position of the missing sequence in B95-8 and the deletion of Raji sequences has been fixed arbitrarily as the first base in the redundancy. The 4-base redundancy (GGCC) was used to start and finish the sequence in Fig. 1.

The Raji DSR includes a region of 102-bp repeats, each repeat containing a PstI site. It was not possible to sequence directly through all of these repeats, so the copy number was determined by partial restriction digestion of the cloned DNA. This Raji DNA clone proved to contain 24.7 copies of the 102-bp PstI repeat (Fig. 2) and this number of repeat copies was inserted into the sequence data given in Fig. 1 to provide a complete copy of the DNA deleted from B95-8. The repeat region is found between bases 1046 and 3562. In order to identify the maximum number of whole repeat units, the right boundary of the repeats was fixed at the sequence CCGGCGGGG, which occurs at the left end of the DR sequence and in all copies of the repeat. This fixed boundary permitted identification of 24 complete repeat units moving right to left.
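The figures quoted in this subsection are mutually consistent, as a short calculation shows. The sketch below is only a consistency check under stated assumptions: the repeat coordinates are taken as inclusive, the ladder is idealized as exact multiples of 102 bp (ignoring the roughly 30 bp of flanking DNA and the partial unit), and the function names are hypothetical. The copy number itself was determined from the gel, not from this arithmetic.

```python
REPEAT_UNIT_BP = 102  # repeat length stated in the text


def region_length(first_base: int, last_base: int) -> int:
    """Number of bases in a region given inclusive coordinates."""
    return last_base - first_base + 1


def copy_number(first_base: int, last_base: int, unit: int = REPEAT_UNIT_BP) -> float:
    """Repeat copies implied by the region length."""
    return region_length(first_base, last_base) / unit


def ideal_ladder(n_units: int, unit: int = REPEAT_UNIT_BP) -> list:
    """Idealized partial-digest ladder: one band per whole number of units."""
    return [unit * k for k in range(1, n_units + 1)]


def percent_variation(differences: int, bases_compared: int) -> float:
    """Percentage of compared bases that differ."""
    return 100.0 * differences / bases_compared


if __name__ == "__main__":
    copies = copy_number(1046, 3562)
    print(region_length(1046, 3562), round(copies, 1), int(copies))
    print(ideal_ladder(int(copies))[:3], "...", ideal_ladder(int(copies))[-1])
    print(round(percent_variation(30, 6670), 2))
```

Under these assumptions the repeat region spans 2517 bases, which corresponds to 24 whole units plus a fraction of about 0.7, and 30 differences in 6670 bases is about 0.45%.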
The most leftward repeat unit is incomplete when using this method of determining the repeat units. It is missing a G at nucleotide 1058 in Fig. 1 and the repeat stops at nucleotide 1046, resulting in 0.7 of a repeat unit. It was not possible to establish the sequence precisely through all of the repeats and some minor heterogeneity of the PstI repeats was evident from the sequence database. It is clear that the outer two complete copies of the repeat on each end of the cluster are as indicated in Fig. 1 but thereafter it was not possible to assign the point variations to particular repeat copies. A summary of the point variations detected is shown in Fig. 3, where for simplicity all the variations detected are shown against a single DNA repeat copy. Some of these changes will cause minor alterations in the protein sequence as shown in Fig. 3. So far as we could tell there was no evidence for insertions or deletions in the repeat copies but this cannot be absolutely excluded. A previous report (Laux et al., 1985) indicated the presence of an ApaI restriction site within two of the repeats in M-ABA DNA but an ApaI digest of the clones used in this study yielded no fragments indicative of ApaI sites (data not shown).

[Fig. 3 diagram: the 102-bp PstI repeat, GGTGGCCGGC TGCAGCCGGG TCCGGGGTTC CGGCCCTGGA GCTCGGGGGG CGGCCGGGTG GCCCACCGGG TCCGCTGGGT CCGCTGCCCC GCTCCGGCGG GG (positions 1-102), with its complementary strand and the encoded amino acids.]

FIG. 3. Variations found in the sequence of the PstI repeat. Individual point variations are indicated above the base changed. The putative amino acid sequence, with the resulting changes, is shown below the parent DNA repeat sequence. Only one of the changes involved more than one base, the GC to AG at positions 97-98 in the diagram.

The sequence was translated in all possible reading frames on the computer and the long open reading frames are shown in Fig. 4. Although it represents a departure from the systematic nomenclature of B95-8 reading frames, to avoid confusion we have used the same names for these reading frames that other workers (Hitt et al., 1989) who have interpreted this sequence data have used. The putative amino acid sequences of the three reading frames are shown in Fig. 5. A FASTA search (Pearson and Lipman, 1988) of the NBRF-PIR library (release 24) revealed no significant homologies between these sequences and those in the database. FASTA searches of just the reading frames found in CMV and HSV also failed to detect any homologies.

[Fig. 4 map: oriLyt (DR), LF3, LF2, LF1.]

FIG. 4. A diagrammatic representation of the sequence found in Fig. 1. The predicted open reading frames are shown as boxed arrows. The vertical arrows indicate potential polyadenylation sites. The promoter for the LF3 reading frame is indicated by the angled arrow. The probes used for Northern blot analysis are shown as horizontal arrows labeled A through F. Probes A, B, and C were gel-purified restriction fragments. The other probes were plasmids containing the regions indicated. The DR element containing the second oriLyt is also indicated. The mRNA corresponding to LF3 is indicated beneath the boxed arrow.

The sequence contains a duplicated region known as DSR which is similar to a region in BamHI-H containing oriLyt and a series of NotI repeats (Dambaugh and Kieff, 1982). This left copy of the duplication is known as DSL. These duplicated regions contain sequences of 1055 bp which are almost identical and have been called DL and DR (Laux et al., 1985). DL and DR seem to be equivalent to the two copies of oriLyt (Hammerschmidt and Sugden, 1988). DL is found between bases 52642 and 53697 in B95-8 and the Raji DR between bases 3554 and 4609 in Fig. 1.

Northern blotting analysis
RNA was obtained from four cell lines containing EBV and analyzed for the presence of mRNA transcribed from the sequence shown in Fig. 1. A 2.8-kb message which was recognized by a probe covering the PstI repeats was found in P3HR-1 cells induced with TPA and in P3HR-1-superinfected Raji cells (Fig. 6). The same probe recognized a 3.7-kb message in AG876-CR cells. The larger size of this RNA in AG876 most likely results from a higher copy number of PstI repeats in AG876 virus (Heller et al., 1981; Dambaugh and Kieff, 1982). The RNA from the PstI repeats has been mapped and reported previously (Laux et al., 1985) and the presence of the RNA in PAA-treated induced cells confirms that it is an early mRNA. No other RNA was detected in any of the cell lines tested with probes covering the entire deleted region (probes C-F in Fig. 4) or shorter probes that correspond just to the open reading frames (probes A and B in Fig. 4). Total RNA (not poly(A)-selected) was also tested with each probe but no other RNA was detected (data not shown).

[Fig. 5 sequence listing. Reading frame LF1, molecular weight = 50498. Reading frame LF2, molecular weight = 48693. Reading frame LF3, molecular weight = 94306.]

FIG. 5. The putative amino acid sequences of the reading frames found in the Raji sequence of the B95-8 deletion. The reading frames were translated on the computer using the first methionine as the start of the sequence. The predicted molecular weights are also indicated. The first repeat peptide in LF3 corresponding to the PstI nucleotide repeat begins with PRRS at residue 22 and ends with QPAT at residue 55. The putative LF3 protein contains all of the predicted repeat units.

[Fig. 6 blot. Lanes: P3HR1 (U, TPA, TPA + PAA); AG876-CR (U, TPA); M (size markers).]

FIG. 6. Northern blotting analysis of RNA hybridizing to probes spanning the B95-8 deletion. Probe F in Fig. 4 was hybridized to RNA from P3HR1 cells uninduced, induced with TPA, or induced with TPA in the presence of PAA. RNA from uninduced and induced AG876-CR cells is also shown. Radioactive λ HindIII size markers were also included.

DISCUSSION

This report of the DNA sequence of a region of EBV deleted from the prototype B95-8 strain completes the sequence of all parts of the virus genome found in wild-type EBV. Both B95-8 and Raji are subtype A (Zimber et al., 1986). This region has been mapped (Dambaugh and Kieff, 1982) for the AG876 strain (subtype B) and part of the sequence reported here has been determined previously (Laux et al., 1985) in the M-ABA strain (subtype A). The reported M-ABA sequence which spans the PstI repeats has a single nucleotide deletion close to the beginning of the LF3 reading frame relative to our sequence at base 3559. This change makes a substantial difference in the resultant hypothetical protein sequence (because of the unusual sequence composition of the repeats, all three potential reading frames are open). A poly(A)+ RNA has been demonstrated to traverse the LF3 frame (Laux et al., 1985) (Fig. 6) and a protein product has been identified for the equivalent NotI repeats in DSL (Nuebling and Mueller-Lantzsch, 1989), so it is likely that a protein will be expressed from the LF3 frame despite its unusual structure. Other minor differences between the M-ABA and Raji sequences occur in the regions containing no major open reading frames.

The remainder of the region contains only two major reading frames (Fig. 4). Northern blotting analysis revealed no corresponding RNA for these reading frames in any of the B-lymphocyte cell lines tested. The reading frames may be expressed at extremely low levels in these cells, or may be expressed only in epithelial cells. Novel rightward-spliced transcripts have been described in the C15 epithelial tumour line (Hitt et al., 1989) and it is possible that some exons of these RNAs come from the B95-8 deletion region.

The region from bases 4610 to 9310 in Fig. 1 contains no long open reading frames and no easily predicted pattern of splice sites. No RNA was detected with probes from this region. Most of the rest of the EBV genome is very tightly packed with viral genetic functions (Farrell, 1989) so it would be surprising if this unique sequence had no function. If there is intricate splicing of small exons or if there are nontranscribed genetic elements within this region, it is difficult to identify them from the sequence alone.

The DSR and DSL regions show an extraordinary degree of sequence conservation (Dambaugh and Kieff, 1982; Laux et al., 1985) and it seems that one must have evolved from the other. Both can function as origins of viral DNA replication in the replicative cycle (Hammerschmidt and Sugden, 1988) but clearly both origins are not required for replication, since B95-8 lacks one and in cell culture replicates as well as other strains. The requirement for two lytic origins might be manifested only when replication is occurring very efficiently.

The significance of the transcription unit to the left of each oriLyt (through the NotI and PstI repeats) is still mysterious. There is no gap between the DR and the PstI repeats, but there is a gap of 525 bp between the DL and the NotI repeats. Although the LF3 and BHLF1 frames, the PstI and NotI repeats, and the relevant viral RNAs appear superficially equivalent, the sequences of the LF3 and BHLF1 proteins are only moderately related; both have a very high proline and arginine content within the repeats.

The three open reading frames in the deletion from B95-8 are evidently not vital for either transformation or replication of the virus in B-cells, as B95-8 is able to do both rather efficiently. Although there are no open reading frames clearly homologous to LF1 and LF2 in the remainder of the EBV genome, there could be proteins encoded by the virus which can substitute their function or they may be important only in vivo. Although their sequences are different, the proteins encoded by LF3 and BHLF1 seem likely to have similar functions because of their related genomic organization. Perhaps B95-8 can function without LF3 because it retains BHLF1, and P3HR-1 (which has a deletion of BHLF1; Bornkamm et al., 1982) can substitute LF3 for the missing BHLF1.

ACKNOWLEDGMENTS

We thank Beverley Griffin, John Arrand, and Lars Rymo for supplying clones of Raji EBV DNA and Cliona Rooney for AG876-CR cells.
REFERENCES
BAER, R., BANKIER, A. T., BIGGIN, M. D., DEININGER, P. L., FARRELL, P. J., GIBSON, T. J., HATFULL, G., et al. (1984). DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature (London) 310, 207-211.
BANKIER, A. T., and BARRELL, B. G. (1983). “Techniques in the Life Sciences.” Elsevier, Ireland.
BIGGIN, M., BODESCOT, M., PERRICAUDET, M., and FARRELL, P. (1987). Epstein-Barr virus gene expression in P3HR1-superinfected Raji cells. J. Virol. 61, 3120-3132.
BIGGIN, M., FARRELL, P. J., and BARRELL, B. G. (1984). Transcription and DNA sequence of the BamHI L fragment of B95-8 Epstein-Barr virus. EMBO J. 3, 1083-1090.
BORNKAMM, G. W., HUDEWITZ, J., FREESE, U., and ZIMBER, U. (1982). Deletion of the nontransforming EBV strain P3HR-1 causes fusion of the large internal repeat to the DSL region. J. Virol. 43, 952-68.
CHEN, E. Y., and SEEBURG, P. H. (1985). Supercoil sequencing: A fast and simple method for sequencing plasmid DNA. DNA 4, 165-170.
CHOMCZYNSKI, P., and SACCHI, N. (1987). Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol chloroform extraction. Anal. Biochem. 162, 156-159.
DAMBAUGH, T. R., and KIEFF, E. (1982). Identification and nucleotide sequences of two similar tandem direct repeats in Epstein-Barr virus DNA. J. Virol. 44, 823-833.
FARRELL, P. J. (1989). “Epstein-Barr Virus in Genetic Maps” (S. O’Brien, Ed.). Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
HAMMERSCHMIDT, W., and SUGDEN, B. (1988). Identification and characterization of oriLyt, a lytic origin of DNA replication of Epstein-Barr virus. Cell 55, 427-433.
HATFULL, G., BANKIER, T., BARRELL, B. G., and FARRELL, P. J. (1988). Sequence analysis of Raji Epstein-Barr virus DNA. Virology 164, 334-341.
HELLER, M., DAMBAUGH, T., and KIEFF, E. (1981). Epstein-Barr virus DNA. IX. Variation among viral DNAs from producer and nonproducer infected cells. J. Virol. 38, 632-648.
HINUMA, Y., KONN, M., YAMAGUCHI, J., WUDARSKI, D., BLAKESLEE, J., and GRACE, J. (1967). Immunofluorescence and herpes-type virus particles in the P3HR-1 Burkitt lymphoma cell line. J. Virol. 1, 1045-1051.
HITT, M. M., ALLDAY, M. J., HARA, T., KARRAN, L., JONES, M. D., BUSSON, P., TURSZ, T., ERNBERG, I., and GRIFFIN, B. E. (1989). EBV gene expression in an NPC related tumour. EMBO J. 8, 2639-2651.
KLEIN, G. (1989). “Advances in Viral Oncology,” p. 273. Raven Press, New York.
LAUX, G., FREESE, U. K., and BORNKAMM, G. W. (1985). Structure and evolution of two related transcription units of Epstein-Barr virus carrying small tandem repeats. J. Virol. 56, 987-995.
NUEBLING, C. M., and MUELLER-LANTZSCH, N. (1989). Identification and characterization of an Epstein-Barr virus early antigen that is encoded by the NotI repeats. J. Virol. 63, 4609-4615.
PEARSON, W. R., and LIPMAN, D. J. (1988). Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. USA 85, 2444-2448.
PIZZO, P. A., MAGRATH, I. T., CHATTOPADHYAY, S. K., BIGGAR, R. J., and GERBER, P. (1978). A new tumor-derived transforming strain of Epstein-Barr virus. Nature (London) 272, 629-631.
PULVERTAFT, R. J. V. (1965). A study of malignant tumours in Nigeria by short term tissue culture. J. Clin. Pathol. 18, 261-273.
RAAB-TRAUB, N., DAMBAUGH, T., and KIEFF, E. (1980). DNA of Epstein-Barr virus. VIII. B95-8, the previous prototype is an unusual deletion derivative. Cell 22, 257-267.
RIGBY, P., DIECKMANN, M., RHODES, C., and BERG, P. (1977). Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113, 237-251.
SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10, 4731-4751.
STADEN, R. (1984). Graphic methods to determine the function of nucleic acid sequences. Nucleic Acids Res. 12, 521-538.
ZIMBER, U., ALDINGER, H. K., LENOIR, G. M., VUILLAUME, M., KNEBEL-DOEBERITZ, L. G., DESRANGES, C., WITTMANN, P., et al. (1986). Geographical prevalence of two Epstein-Barr virus types. Virology 154, 56-66.