Sequence and transcription of Raji Epstein-Barr virus DNA spanning the B95-8 deletion region.

VIROLOGY 179, 339-346 (1990)

BRUCE D. PARKER,* ALAN BANKIER,† SANDRA SATCHWELL,† BART BARRELL,† AND PAUL J. FARRELL*,¹

*Ludwig Institute for Cancer Research, St Mary's Hospital Medical School, Norfolk Place, London W2 1PG; and †MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH

Received April 30, 1990; accepted July 3, 1990

The DNA sequence of Raji DNA spanning the deletion found in B95-8 cells has been determined. Three open reading frames and a region of homology with the BamHI-H fragment are found within the deletion. The deletion contains a region of 102-bp repeats which is transcribed into an mRNA. The Raji sequence reported here differs slightly from a shorter M-ABA sequence reported previously. This paper completes the sequence of all parts of the wild-type Epstein-Barr virus genome. © 1990 Academic Press, Inc.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. M35547.

¹ To whom requests for reprints should be addressed.

0042-6822/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

INTRODUCTION

Epstein-Barr virus (EBV) is a human herpesvirus that infects human B lymphocytes and particular epithelial cells, and it can immortalize human B lymphocytes at high efficiency. The virus is linked with the development of Burkitt's lymphoma and nasopharyngeal carcinoma, and EBV is the principal cause of infectious mononucleosis (for a review see Klein, 1989). The genome of the most-studied strain of EBV (B95-8) has been completely sequenced (Baer et al., 1984). However, the B95-8 genome contains a deletion of approximately 12 kb relative to other EBV strains (Raab-Traub et al., 1980). The deletion removes one of the origins of lytic replication (oriLyt) (Hammerschmidt and Sugden, 1988) and part or all of at least three potential coding regions. We have now determined the DNA sequence of the region of the Raji strain EBV genome that spans this deletion. In this paper we report this sequence and the potential genetic organization of this portion of the viral genome.

MATERIALS AND METHODS

Isolation of RNA

Raji and P3HR1 (Pulvertaft, 1965; Hinuma et al., 1967) cell lines are derived from Burkitt's lymphomas. AG876-CR is a lymphoblastoid cell line made by immortalizing B lymphocytes from an EBV-negative British donor with the AG876 strain of EBV (Pizzo et al., 1978). Cell lines AG876-CR, Raji, and P3HR1 (which all contain EBV) were grown to a density of 5 × 10⁵ cells/ml and induced with TPA at 30 ng/ml for 3 days. Some TPA-treated cells were also treated with PAA at 125 μg/ml. Control cells were grown under identical conditions without TPA or PAA. P3HR1-superinfected Raji cells were prepared as described previously (Biggin et al., 1987). RNA was prepared either as described previously (Biggin et al., 1984) or by the acid phenol/guanidine thiocyanate/chloroform extraction method (Chomczynski and Sacchi, 1987). With the exception of P3HR1-superinfected Raji RNA, all RNA samples were poly(A)-selected.

DNA sequencing

A library of M13 clones was prepared from sonicated DNA (Bankier and Barrell, 1983) of the cloned EcoRI-C fragment of Raji EBV DNA. M13 clones were picked at random and sequenced using the dideoxynucleotide chain termination method (Sanger et al., 1977). Subclones of EcoRI-C were also made by inserting restriction endonuclease fragments into pUC19. Regions of ambiguity in the M13 sequencing were resolved by double-stranded DNA sequencing (Chen and Seeburg, 1985) of the relevant parts of the pUC clones, using oligonucleotide primers flanking the ambiguous regions. In this way, 99.5% of the sequence was determined on both strands of the Raji DNA, with an average gel reading density of 6.97. Computer analysis and compilation of the DNA sequence were performed using the programs of Staden (1982, 1984).
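The two coverage measures quoted for the sequencing (the fraction of bases read on both strands and the average gel reading density) can be illustrated with a toy calculation. This sketch assumes that gel reading density means the mean number of gel readings covering each base of the consensus, and that each reading is stored as a 0-based, half-open interval with a strand label. The `Reading` class and the example readings are hypothetical; they are not data from this study and not code from the Staden programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One gel reading (hypothetical representation): 0-based, half-open."""
    start: int
    end: int
    strand: str  # "+" or "-"


def coverage_stats(readings: list[Reading], length: int) -> tuple[float, float]:
    """Return (mean readings per base, fraction of bases read on both strands)."""
    depth = [0] * length
    plus = [False] * length
    minus = [False] * length
    for r in readings:
        # Clip each reading to the consensus so overhanging ends are ignored.
        for i in range(max(0, r.start), min(length, r.end)):
            depth[i] += 1
            if r.strand == "+":
                plus[i] = True
            else:
                minus[i] = True
    mean_depth = sum(depth) / length
    both = sum(1 for p, m in zip(plus, minus) if p and m) / length
    return mean_depth, both


# Toy example: three readings over a 10-base consensus.
reads = [Reading(0, 6, "+"), Reading(4, 10, "-"), Reading(0, 10, "+")]
print(coverage_stats(reads, 10))  # (2.2, 0.6)
```

In the toy example every base is covered twice or three times (mean 2.2), but only bases 4 to 9 are covered by a reading on each strand, giving a both-strand fraction of 0.6.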

[Fig. 1: nucleotide sequence of Raji EBV DNA spanning the B95-8 deletion, bases 1 to 11,835; the sequence is available from the EMBL/GenBank Data Libraries under Accession No. M35547.]

FIG. 1. The nucleotide sequence of Raji DNA covering the B95-8 deletion. The sequence is numbered from the first base of the 4-base redundancy (GGCC) found at the beginning and end of the sequence. The PstI repeats are found from bases 1046 to 3562. Horizontal arrows delineate the boundaries of the complete repeats shown. The left boundary of the repeat region, which contains the incomplete repeat unit, is indicated with a vertical arrow. The sequence is shown without 20 copies of the repeat from the middle of the repeat region. DR is found between bases 3554 and 4609. Reading frame LF1 begins 140 bp beyond the right end of the sequence shown (B95-8 position 152,152) and ends at base 10,566. Reading frame LF2 begins at base 10,605 and stops at base 9311. Reading frame LF3, which spans the PstI repeats, starts at base 3625 and ends at base 851. This sequence has been deposited with the EMBL sequence database.
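The LF3 coordinates in this legend can be converted into a codon count with simple arithmetic. The sketch below assumes that both coordinates are inclusive and that the end coordinate is the last base of the stop codon. Under those assumptions LF3 spans 2,775 bp, or 925 codons, which is 924 amino acids plus the terminator. The same division gives 34 residues per 102-bp PstI repeat, which matches the repeat peptide running from residue 22 to residue 55 described in the Fig. 5 legend. The function names are illustrative only.

```python
def orf_length_bp(start: int, end: int) -> int:
    """Length of a reading frame from inclusive, 1-based coordinates.

    Works in either orientation; leftward frames such as LF3 have start > end.
    """
    return abs(start - end) + 1


def residues(span_bp: int) -> int:
    """Amino acids encoded, assuming the span ends with a stop codon."""
    if span_bp % 3 != 0:
        raise ValueError(f"{span_bp} bp is not a whole number of codons")
    return span_bp // 3 - 1


lf3_bp = orf_length_bp(3625, 851)  # LF3 start and end taken from the legend
print(lf3_bp, residues(lf3_bp))    # 2775 924
```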

Northern blot analysis

RNA samples (2 μg per track) were denatured using glyoxal, electrophoresed, and blotted as described previously (Biggin et al., 1984). Probes were nick-translated plasmids (Rigby et al., 1977) containing restriction fragments of Raji DNA covering the entire deleted region of B95-8, or small gel-purified fragments covering just the predicted open reading frames.

Determination of repeat copy number

A subclone of the Raji EcoRI-C fragment spanning the PstI repeats was digested with BanI and DdeI. This digestion removes all but 30 bp of the DNA flanking the repeats. The remaining 2.5-kb fragment containing the PstI repeats was purified by electroelution from an agarose gel slice. The DNA was partially digested with PstI and electrophoresed on a 2% agarose gel. HindIII-digested λ DNA and φX174 DNA digested with HaeIII were included as size markers. The gel was stained with ethidium bromide and photographed. The repeat copy number was determined by counting the bands in the resulting ladder relative to the size markers.

FIG. 2. Determination of the repeat copy number of the PstI repeats in Raji DNA. The ladder produced by the partial digest is shown in lane A. Lane B shows the complete digest of the same DNA. Lanes C and D contain size markers. The sizes of the marker fragments are shown on the side of the figure.

RESULTS

DNA sequence

The DNA sequence of the M13 library of Raji DNA covering the deletion in B95-8 was compiled to give a completely contiguous sequence, which is shown in Fig. 1. The sequence in Fig. 1 has a G+C composition of 60.7%, very close to the 60% determined for the entire B95-8 genome (Baer et al., 1984). Although Fig. 1 shows only the sequence deleted from B95-8, several kilobases of the flanking Raji sequence were also determined, to a slightly lower degree of accuracy (data not shown). The flanking sequences were very similar to the equivalent B95-8 sequence (less than 0.5% variation, 30 of 6670 bases), as was found earlier when comparing most of the adjacent EcoRI Dhet region from Raji and B95-8 (Hatfull et al., 1988). There is a redundancy of 4 bases in the Raji sequence around the point of deletion in B95-8. One copy of the redundancy (GGCC) is present in B95-8 at the point of the deletion (nucleotides 152,009-152,012), whereas the sequence GGCC is found at both ends of the Raji DNA spanning the deletion. Therefore, the precise position of the deletion in B95-8, and the corresponding boundaries of the Raji sequence, have been fixed arbitrarily at the first base of the redundancy. The 4-base redundancy (GGCC) was used to start and finish the sequence in Fig. 1. The Raji DSR includes a region of 102-bp repeats, each containing a PstI site. It was not possible to sequence directly through all of these repeats, so the copy number was determined by partial restriction digestion of the cloned DNA. This Raji DNA clone proved to contain 24.7 copies of the 102-bp PstI repeat (Fig. 2), and this number of repeat copies was inserted into the sequence data given in Fig. 1 to provide a complete copy of the DNA deleted from B95-8. The repeat region is found between bases 1046 and 3562. To identify the maximum number of whole repeat units, the right boundary of the repeats was fixed at the sequence CCGGCGGGG, which occurs at the left end of the DR sequence and in all copies of the repeat. Counting leftward from this fixed boundary identified 24 complete repeat units.
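The copy-number arithmetic and two properties of the repeat used in this paper (one PstI site, CTGCAG, per 102-bp unit, and open reading frames through the repeat, as noted in the Discussion) can be checked with a short script. The `REPEAT` string is one period of the repeat as read from the published Fig. 1 sequence, written so that it ends at the CCGGCGGGG boundary motif. Because individual copies carry point variations (Fig. 3), it should be treated as a representative copy rather than a consensus of all 24.7 copies. The helper names are illustrative, and the codon table is the standard genetic code.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# One 102-bp period of the PstI repeat (representative copy, see lead-in).
REPEAT = (
    "GGTGGCCGGCTGCAGCCGGGTCCGGGGTTCCGGCCCTGGA"
    "GCTCGGGGGGCGGCCGGGTGGCCCACCGGGTCCGCTGGGT"
    "CCGCTGCCCCGCTCCGGCGGGG"
)


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate from offset `frame`, ignoring a trailing partial codon."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def copy_number(first: int, last: int, unit: int = len(REPEAT)) -> float:
    """Repeat copies in an inclusive, 1-based interval."""
    return (last - first + 1) / unit


copies = copy_number(1046, 3562)  # repeat region given in the text
whole, rest = divmod(3562 - 1046 + 1, len(REPEAT))
print(f"{copies:.2f} copies = {whole} whole units + {rest} bp")
# 24.68 copies = 24 whole units + 69 bp

# Translate three tandem copies so codons spanning each junction are included.
array = REPEAT * 3
frames = {
    (strand, f): translate(s, f)
    for strand, s in (("+", array), ("-", revcomp(array)))
    for f in range(3)
}
for key, protein in frames.items():
    print(key, "stop codons:", protein.count("*"))
```

With this copy of the repeat, the script reports 24 whole units plus 69 bp (about 0.7 of a unit) and no stop codons in any of the three frames on either strand. One minus-strand frame contains the repeat peptide PRRSGAADPADPVGHPAAPRAPGPEPRTRLQPAT described for LF3 in the Fig. 5 legend. Because 102 is a multiple of 3, each frame stays in register from one repeat copy to the next, so checking a short tandem array is enough.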
The most leftward repeat unit is incomplete when using this method of determining the repeat units. It is missing a G at nucleo-

i

GGTGGCCGGCTGCAGCCGGGTCCGGGGTTCCGGCCCTGGAGCTC~GGGGCGG~CGGGTGGCCCACCGGGTCCGCTGGGTCCGCTGCCCCGCTCCGGCGGGG 10 20 30 40 50 60 70 80 90 CCACCGGCCGACGTCGGCCCAGGCCCCAAGGCCGGCCGGGACCTCGAGCCCCCCGCCGGCCCACCGGGTGGCCCAGGCGACCCAGGCGACGGGGCGAGGCCGCCCC T I

A

P

Q

LRTRPEP S

GPARPAAPHGVPDAPDAAGSRRP W

FIG. 3. Variations found in the sequence of the fstl repeat. Individual point amino acid sequence, with the resulting changes, is shown below the parent than one base, the GC to AG at positions 97-98 in the diagram.

H variations are indicated DNA repeat sequence.

100

P R above the base changed. Only one of the changes

The putative involved more

SEQUENCE

oriLy1 I

OF

R&I

(D R ) I

ti

LF3

LF2

LFI

FIG. 4. A diagrammatic representation of the sequence found in Fig. 1. The predicted open reading frames are shown as boxed arrows The vertical arrows indicate potential polyadenylation sites. The promoter for the LF3 reading frame IS indicated by the angled arrow. The probes used for Northern blot analysis are shown as horizontal arrows labeled A through F. Probes A, B, and C were gelpurified restriction fragments. The other probes were plasmids containing the regions indrcated. The DR element containing the second orilyt is also indicated. The mRNA correspondrng to LF3 is Indicated beneath the boxed arrow.

tide 1058 in Fig. 1 and the repeat stops at nucleotide 1046, resulting in 0.7 of a repeat unit. It was not possible to establish the sequence precisely through all of the repeats and some minor heterogeneity of the Pstl repeats was evident from the sequence database. It is clear that the outer two complete copies of the repeat on each end of the cluster are as indicated in Fig. 1 but thereafter it was not possible to assign the point variations to particular repeat copies. A summary of the point variations detected is shown in Fig. 3, where for simplicity all the variations detected are shown against a single DNA repeat copy. Some of these changes will cause minor alterations in the protein sequence as shown in Fig. 3. So far as we could tell there was no evidence for insertions or deletions in the repeat copies but this cannot be absolutely excluded. A previous report (Laux et a/., 1985) indicated the presence of an Apal restriction site within two of the repeats in M-ABA DNA but an Apal digest of the clones used in this study yielded no fragments indicative of Apal sites (data not shown). The sequence was translated in all possible reading frames on the computer and the long open reading frames are shown in Fig. 4. Although it represents a departure from the systematic nomenclature of 895-8

EBV

DNA

343

DELETION

reading frames, to avoid confusion we have used the same names for these reading frames that other workers (Hitt et a/., 1989) who have interpreted this sequence data have used. The putative amino acid sequences of the three reading frames are shown in Fig. 5. A FASTA search (Pearson and Lipman, 1988) of the NBRF-PIR library (release 24) revealed no significant homologies between these sequences and those in the database. FASTA searches of just the reading frames found in CMV and HSV also failed to detect any homologies. The sequence contains a duplicated region known as DSR which is similar to a region in BarnHI-H containing oriLyt and a series of /Votl repeats (Dambaugh and Kieff, 1982). This left copy of the duplication is known as DS. These duplicated regions contain sequences of 1055 bp which are almost identical and have been called DL and DR (Laux et al., 1985). DL and DR seem to be equivalent to the two copies of oriLyt (HammerSchmidt and Sugden, 1988). Dc is found between bases 52642 and 53697 in B95-8 and the Raji DR between bases 3554 and 4609 in Fig. 1. Northern

blotting

analysis

RNA was obtained from four cell lines containing EBV and analyzed for the presence of mRNA transcribed from the sequence shown in Fig. 1. A 2.8-kb message which was recognized by a probe covering the Pstl repeats was found in P3HR-1 cells induced with TPA and in P3HR-I-superinfected Raji cells (Fig. 6). The same probe recognized a 3.7-kb message in AG876-CR cells. The larger size of this RNA in AG876 most likely results from a higher copy number of fstl repeats in AG876 virus (Heller et al., 198 1; Dambaugh and Kieff, 1982). The RNA from the Pstl repeats has been mapped and reported previously (Laux et a/., 1985) and the presence of the RNA in PAA-treated induced cells confirms that it is an early mRNA. No other RNA was detected in any of the cell lines tested with probes covering the entire deleted region (probes C-F in Fig. 4) or shorter probes that correspond just to the open reading frames (probes A and B in Fig. 4). Total RNA (not poly(A)-selected) was also tested with each probe but no other RNA was detected (data not shown). DISCUSSION This report of the DNA sequence of a region of EBV deleted from the prototype B95-8 strain completes the sequence of all parts of the virus genome found in wildtype EBV. Both 895-8 and Raji are subtype A (Zimber et al., 1986). This region has been mapped (Dambaugh and Kieff, 1982) for the AG876 strain (subtype B) and

PARKER

344 READING

FRAME LFl

MOLECULAR

10 MALPTDTHAU

20 RVEIGTRGLM

110 SLRCLNPHDL

120

210

220 HASSAADQAG

310 410

READING

420 ALFUPAERSD

20 AALASRRSSF

110

310 PNFITELEYN

320

410 IVVESSSLPT

READING

420 FERINKTFNG

10 110 PVGHPAAPRA 210 ADPVGHPAAP 310 DPADPVGHPA 410

20

30 130 PATPRRSGAA

220

230

RAPGPEPRTR

LQPATPRRSG

320 APRAPGPEPR

370 SLUGFTFQEA

280 GLAMSRDAAS 380 ACDQUVLRPR

190 RHSINLTRSE

200 GVGIGKDCAQ

290 GSVHLDIQVD

300 RAEEGUVCOV

390 VUTAHSPIKM

400 TVYNCGHKPL

470 LTPEEPMQH

60 160

250 RVSVDTPDLK

260 REGPLNVKVG

340

350 UKPGQTLKLT

360 VRNISNNPIT

70 LDAQCQELPP

80 CPSVGQILSF

170 ILPGQSDIQL

180 TRSCTQSGDK

270 MTLLDDVIIA

280 FRYNPYPKSH

370 IVTGQSMAQA

90 KLPSFSFNTT

380 FFIYAGDPSI

190 LNTSEPQIFL

100 TYGSRYFTVA 200 SGSPVTSQDE

290 URUDGESTDI

300 RYFGSPVIIP

390 STIMRRYIQR

400 QGCALTLPGN

TL

HPRRSGAADP

120 PGPEPRTRLQ

460

LVKNPYLYLQ

NPVFYVYPQE

MOLECULAR

APGSGLGAHP

360

180

100 HAPMFVTIKT

430 NIVASEGTLG

FRAME LF3

MKRVARGPCL

270

90 KIYPSCIFPV

YAGPPPVAUY

WDTGRGRLAP

VRVTNKQVLQ

150

240

330 ITAVVVSHSS

260

50

KFTUYIVPIR

NAIRRGKGYV

170 SPTSLSLAFP

SRALLURGLG

80 LAAAAPCKDV

= 50498

140

230

160 LRVUGRLSPS

PVDELRPWPK

VWKYEIFPSY

NPRSLFIEKG

HTFPGKVCPV

NTYEAPLSSK

WEIGHT 40

130

220

450

70 VPAVNSALQC

AQGHLQGGTP

TVARPPTLTL

EKPDTSFMRG

SDQDIVLSVL

PPFLKSYARI

350 VPPKEEIHVM

440

30

120

210

340

LTSGELYUGR

RNSLRRLRPT

VFLKPFFVMH

CLPYLLAQHT

250 AGRGOVALSQ

MOLECULAR

10

FLFFGAEDNE

240 EGPGAGEAAS

430 NLDAGRIFYQ

FRAME LF2

MAEAYPGGAH

150 GQAAAAPGGK

FPLHPEHDAV

60 RYGLVGSLUE

140

330 LVTLKDAWTL

50 VSAYEALAVA

RDGAGARAAE 230

320 EGCSLSMDPG

HIGPSTRLGL

130

VPERGRKRAH

= 48693

40 EGPYHKLRLP

ILDIPLLCAP

DHACPVPPQG LLEPGPPTAR

30 FSNCVPLHLP

CLCLICVGAA

WEIGHT

ET AL

WEIGHT

= 94306

40 ADPVGHPAAP

50 RAPGPEPRTR

140 DPADPVGHPA

150 APRAPGPEPR

240 AADPADPVGH

60 LQPATPRRSG 160 TRLQPATPRR

250

260

PAAPRAPGPE

PRTRLQPATP

70 AAOPADPVGH 170 SGAADPADPV 270 RRSGAADPAD

330

340

350

360

370

TRLQPATPRR

SGAAOPADPV

GHPAAPRAPG

PEPRTRLQPA

TPRRSGAADP

420

430

440

180 GHPAAPRAPG 280 PVGHPAAPRA 380 ADPVGHPAAP

190 PEPRTRLQPA

100 RRSGAADPAD 200 TPRRSGAADP

290

300

PGPEPRTRLQ

PATPRRSGAA

390

400

RAPGPEPRTR

LQPATPRRSG

460

470

PAAPRAPGPE

PRTRLQPATP

RRSGAADPAD

PVGHPAAPRA

PGPEPRTRLQ

PATPRRSGAA

DPADPVGHPA

APRAPGPEPR

TRLQPATPRR

510 SGAADPADPV

520 GHPAAPRAPG

530 PEPRTRLQPA

540 TPRRSGAADP

550 ADPVGHPAAP

560 RAPGPEPRTR

570 LQPATPRRSG

580 AADPADPVGH

590 PAAPRAPGPE

600 PRTRLQPATP

610

620

630

640

PVGHPAAPRA

PGPEPRTRLQ

PATPRRSGAA

710 TPRRSGAADP 810 PATPRRSGAA 910 VVIGSEILET

720 ADPVGHPAAP

730

740

RAPGPEPRTR

LQPATPRRSG

820 DPADPVGHPA

830 APRAPGPEPR

650 DPADPVGHPA 750 AADPADPVGH

840 TRLQPATPRR

850 SGAADPADPV

660 APRAPGPEPR 760 PAAPRAPGPE 8-50 GHPAAPELQG

480

90 PRTRLQPATP

AADPADPVGH

RRSGAADPAD

450

80 PAAPRAPGPE

490

670

680

690

TRLQPATPRR

SGAADPADPV

GHPAAPRAPG

770 PRTRLQPATP 870 UVLRPKGTGG

780 RRSGAADPAD 880 DFRGIGVTIN

790 PVGHPAAPRA 890 ULNLHHVYVV

500

700 PEPRTRLQPA 800 PGPEPRTRLQ 900 FHAAYRLEGQ

920 RVPLKQGALL

PLLF

FIG. 5. The putative amino acid sequences of the reading frames found in the Raji sequence of the B95-8 deletion. The reading frames were translated on the computer using the first methionine as the start of the sequence. The predicted molecular weights are also indicated. The first repeat peptide in LF3 corresponding to the Pstl nucleotide repeat begins with PRRS at residue 22 and ends with QPAT at residue 55. The putative LF3 protein contains all of the predicted repeat units.

part of the sequence reported here has been determined previously (Laux et al., 1985) in the M-ABA strain (subtype A). The reported M-ABA sequence which spans the Pstl repeats has a single nucleotide deletion close to the beginning of the LF3 reading frame relative

to our sequence at base 3559. This change makes a substantial difference to the predicted protein sequence because, owing to the unusual base composition of the repeats, all three potential reading frames are open.

FIG. 6. Northern blotting analysis of RNA hybridizing to probes spanning the B95-8 deletion. Probe F in Fig. 4 was hybridized to RNA from P3HR1 cells that were uninduced (U), induced with TPA, or induced with TPA in the presence of PAA. RNA from uninduced and induced AG876-CR cells is also shown. Radioactive λ HindIII size markers (M) were also included.

A poly(A)+ RNA has been shown to traverse the LF3 frame (Laux et al., 1985) (Fig. 6), and a protein product has been identified for the equivalent NotI repeats in DL (Nuebling and Mueller-Lantzsch, 1989). It is therefore likely that a protein is expressed from the LF3 frame despite its unusual structure. Other minor differences between the M-ABA and Raji sequences occur in regions that contain no major open reading frames.

The remainder of the region contains only two major reading frames (Fig. 4). Northern blotting analysis revealed no corresponding RNA for these reading frames in any of the B-lymphocyte cell lines tested. The reading frames may be expressed at extremely low levels in these cells, or only in epithelial cells. Novel rightward-spliced transcripts have been described in the C15 epithelial tumour line (Hitt et al., 1989), and some exons of these RNAs may come from the B95-8 deletion region. The region from bases 4610 to 9310 in Fig. 1 contains no long open reading frames and no easily predicted pattern of splice sites, and no RNA was detected with probes from this region. Because most of the rest of the EBV genome is very tightly packed with viral genetic functions (Farrell, 1989), it would be surprising if this unique sequence had no function. However, if the region contains intricately spliced small exons or nontranscribed genetic elements, these are difficult to identify from the sequence alone.

The DR and DL regions show an extraordinary degree of sequence conservation (Dambaugh and Kieff,
1982; Laux et al., 1985), so it seems that one must have evolved from the other. Both can function as origins of viral DNA replication in the replicative cycle (Hammerschmidt and Sugden, 1988). Clearly, not both origins are required for replication, since B95-8 lacks one yet replicates as well as other strains in cell culture. A requirement for two lytic origins might become apparent only when replication is occurring very efficiently.

The significance of the transcription unit to the left of each oriLyt (through the NotI and PstI repeats) remains unknown. There is no gap between DR and the PstI repeats, but there is a gap of 525 bp between DL and the NotI repeats. The LF3 and BHLF1 frames, the PstI and NotI repeats, and the corresponding viral RNAs appear superficially equivalent, but the sequences of the LF3 and BHLF1 proteins are only moderately related; both, however, have a very high proline and arginine content within the repeats.

The three open reading frames in the deletion from B95-8 are evidently not vital for either transformation or replication of the virus in B cells, since B95-8 does both rather efficiently. Although there are no open reading frames clearly homologous to LF1 and LF2 elsewhere in the EBV genome, other virus-encoded proteins might substitute for their functions, or LF1 and LF2 may be important only in vivo. Although their sequences differ, the proteins encoded by LF3 and BHLF1 seem likely to have similar functions because of their related genomic organization. Perhaps B95-8 can function without LF3 because it retains BHLF1, and P3HR-1 (which has a deletion of BHLF1; Bornkamm et al., 1982) can substitute LF3 for the missing BHLF1.

ACKNOWLEDGMENTS

We thank Beverley Griffin, John Arrand, and Lars Rymo for supplying clones of Raji EBV DNA, and Cliona Rooney for AG876-CR cells.

REFERENCES

BAER, R., BANKIER, A. T., BIGGIN, M. D., DEININGER, P. L., FARRELL, P. J., GIBSON, T. J., HATFULL, G., et al. (1984). DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature (London) 310, 207-211.

BANKIER, A. T., and BARRELL, B. G. (1983). "Techniques in the Life Sciences." Elsevier, Ireland.

BIGGIN, M., BODESCOT, M., PERRICAUDET, M., and FARRELL, P. (1987). Epstein-Barr virus gene expression in P3HR1-superinfected Raji cells. J. Virol. 61, 3120-3132.

BIGGIN, M., FARRELL, P. J., and BARRELL, B. G. (1984). Transcription and DNA sequence of the BamHI L fragment of B95-8 Epstein-Barr virus. EMBO J. 3, 1083-1090.

BORNKAMM, G. W., HUDEWITZ, J., FREESE, U., and ZIMBER, U. (1982). Deletion of the nontransforming EBV strain P3HR-1 causes fusion of the large internal repeat to the DSL region. J. Virol. 43, 952-968.

CHEN, E. Y., and SEEBURG, P. H. (1985). Supercoil sequencing: A fast and simple method for sequencing plasmid DNA. DNA 4, 165-170.

CHOMCZYNSKI, P., and SACCHI, N. (1987). Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156-159.

DAMBAUGH, T. R., and KIEFF, E. (1982). Identification and nucleotide sequences of two similar tandem direct repeats in Epstein-Barr virus DNA. J. Virol. 44, 823-833.

FARRELL, P. J. (1989). "Epstein-Barr Virus in Genetic Maps" (S. O'Brien, Ed.). Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

HAMMERSCHMIDT, W., and SUGDEN, B. (1988). Identification and characterization of oriLyt, a lytic origin of DNA replication of Epstein-Barr virus. Cell 55, 427-433.

HATFULL, G., BANKIER, T., BARRELL, B. G., and FARRELL, P. J. (1988). Sequence analysis of Raji Epstein-Barr virus DNA. Virology 164, 334-341.

HELLER, M., DAMBAUGH, T., and KIEFF, E. (1981). Epstein-Barr virus DNA. IX. Variation among viral DNAs from producer and nonproducer infected cells. J. Virol. 38, 632-648.

HINUMA, Y., KONN, M., YAMAGUCHI, J., WUDARSKI, D., BLAKESLEE, J., and GRACE, J. (1967). Immunofluorescence and herpes-type virus particles in the P3HR-1 Burkitt lymphoma cell line. J. Virol. 1, 1045-1051.

HITT, M. M., ALLDAY, M. J., HARA, T., KARRAN, L., JONES, M. D., BUSSON, P., TURSZ, T., ERNBERG, I., and GRIFFIN, B. E. (1989). EBV gene expression in an NPC-related tumour. EMBO J. 8, 2639-2651.

KLEIN, G. (1989). "Advances in Viral Oncology," p. 273. Raven Press, New York.

LAUX, G., FREESE, U. K., and BORNKAMM, G. W. (1985). Structure and evolution of two related transcription units of Epstein-Barr virus carrying small tandem repeats. J. Virol. 56, 987-995.

NUEBLING, C. M., and MUELLER-LANTZSCH, N. (1989). Identification and characterization of an Epstein-Barr virus early antigen that is encoded by the NotI repeats. J. Virol. 63, 4609-4615.

PEARSON, W. R., and LIPMAN, D. J. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.

PIZZO, P. A., MAGRATH, I. T., CHATTOPADHYAY, S. K., BIGGAR, R. J., and GERBER, P. (1978). A new tumor-derived transforming strain of Epstein-Barr virus. Nature (London) 272, 629-631.

PULVERTAFT, R. J. V. (1965). A study of malignant tumours in Nigeria by short term tissue culture. J. Clin. Pathol. 18, 261-273.

RAAB-TRAUB, N., DAMBAUGH, T., and KIEFF, E. (1980). DNA of Epstein-Barr virus. VIII. B95-8, the previous prototype, is an unusual deletion derivative. Cell 22, 257-267.

RIGBY, P., DIECKMANN, M., RHODES, C., and BERG, P. (1977). Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113, 237-251.

SANGER, F., NICKLEN, S., and COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Res. 10, 4731-4751.

STADEN, R. (1984). Graphic methods to determine the function of nucleic acid sequences. Nucleic Acids Res. 12, 521-538.

ZIMBER, U., ALDINGER, H. K., LENOIR, G. M., VUILLAUME, M., KNEBEL-DOEBERITZ, L. G., DESRANGES, C., WITTMANN, P., et al. (1986). Geographical prevalence of two Epstein-Barr virus types. Virology 154, 56-66.
