Vol. 65, No. 12

JOURNAL OF VIROLOGY, Dec. 1991, p. 6516-6527

0022-538X/91/126516-12$02.00/0 Copyright © 1991, American Society for Microbiology

Identification, Cloning, and Sequencing of a Fragment of Amsacta moorei Entomopoxvirus DNA Containing the Spheroidin Gene and Three Vaccinia Virus-Related Open Reading Frames

RICHARD L. HALL AND RICHARD W. MOYER*

Department of Immunology and Medical Microbiology, University of Florida, Box 100266, JHMHC, Gainesville, Florida 32610-0266

Received 21 May 1991/Accepted 28 August 1991

Entomopoxvirus virions are frequently contained within crystalline occlusion bodies, which are composed primarily of a single protein, spheroidin, analogous to the polyhedrin protein of baculovirus. The spheroidin gene of Amsacta moorei entomopoxvirus was identified following the microsequencing of polypeptides generated from cyanogen bromide treatment of spheroidin and the subsequent synthesis of oligonucleotide hybridization probes. DNA sequencing of a 6.8-kb region of DNA containing the spheroidin gene showed that the spheroidin protein is derived from a 3.0-kb open reading frame potentially encoding a protein of 115 kDa. Three copies of the heptanucleotide TTTTTNT, a sequence associated with early gene transcription in the vertebrate poxviruses, and four in-frame translational termination signals were found within 60 bp upstream of the putative spheroidin gene promoter (TAAATG). The spheroidin gene promoter region contains the sequence TAAATG, which is found in many late promoters of the vertebrate poxviruses and which serves as the site of transcriptional initiation, as shown by primer extension. Primer extension experiments also showed that spheroidin gene transcripts contain 5' poly(A) sequences typical of vertebrate poxvirus late transcripts. The 92 bases upstream of the initiating TAAATG are unusually A+T rich and contain only 7 G or C residues. An analysis of open reading frames around the spheroidin gene suggests that the colinear core of "essential genes" typical of the vertebrate poxviruses is absent in A. moorei entomopoxvirus.
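As a rough plausibility check on the relationship between the 3.0-kb open reading frame and the predicted 115-kDa product, the arithmetic can be sketched as follows. The average residue mass of 110 Da is a common rule of thumb and an assumption here, not a value taken from this paper; the function name is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption, not from the paper).
AVG_RESIDUE_MASS_DA = 110.0

def estimate_protein_mass_kda(orf_length_bp, includes_stop=True):
    """Estimate protein mass in kDa from an ORF length in base pairs."""
    codons = orf_length_bp // 3
    # The stop codon does not encode a residue.
    residues = codons - 1 if includes_stop else codons
    return residues * AVG_RESIDUE_MASS_DA / 1000.0

# A 3.0-kb ORF, as reported for the spheroidin gene.
estimated_mass = estimate_protein_mass_kda(3000)
print(f"Estimated mass: {estimated_mass:.1f} kDa")
```

The estimate lands near 110 kDa, in the same range as the 115 kDa deduced from the sequence; the exact figure depends on amino acid composition, which this shortcut ignores.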

Entomopoxviruses (EPVs) are poxviruses of insects. A crystalline occlusion body (OB) composed primarily of a single protein, spheroidin (2), protects the virions during transmission from one insect to another. The gene for the highly expressed spheroidin of Amsacta moorei entomopoxvirus (AmEPV) is a candidate insertion site for use as an invertebrate expression vector and a model for the study of the regulation of a highly expressed invertebrate poxvirus gene. The function of spheroidin is analogous to that of the polyhedrin protein of baculovirus, another occluded insect virus. The baculovirus polyhedrin gene has been used as an insertion site during the development of this virus as a generalized viral expression vector system (23).

AmEPV was discovered in 1967 (29) and is the type species of genus B of EPVs (22). AmEPV is one of three known EPVs which will replicate in cultured insect cells (11, 17, 35). The major structural protein, spheroidin, has been reported (21) to be 110 kDa in size and to consist of a high percentage of charged and sulfur-containing amino acids. The AmEPV double-stranded DNA genome is about 225 kb long and unusually A+T rich (18.5% G+C) (20). Recently, a series of restriction maps for AmEPV were published (13).

It has been suggested that spheroidin may be a member of the same poxvirus protein family which includes the cowpox virus A-type inclusion (ATI) and its homologs (25). The ATI gene has been identified and sequenced (6), and ATI is one of the most highly expressed vertebrate poxvirus gene products (24). The A+T-rich promoter region of the ATI gene could be one factor responsible for the high level of expression (6). Virions are embedded in ATIs much as they are in EPV OBs, but the cowpox virus inclusions are not crystalline.

Recent studies of the spruce budworm, Choristoneura biennis, EPV (CbEPV) have reported the identification and sequence of the CbEPV spheroidin gene (40), the discovery of amino acid homology between CbEPV spheroidin and a baculovirus glycoprotein (39), and the demonstration that the 5'-noncoding region of a CbEPV gene functions as a late promoter when introduced into vaccinia virus (26). The spheroidin gene of AmEPV presented in this paper does not resemble the CbEPV spheroidin gene (40). This is a surprising result, since both viruses belong to genus B, and it was expected that the spheroidins of various EPVs would be conserved, as are the polyhedrins of baculoviruses (31).

MATERIALS AND METHODS

Growth of virus in caterpillars and cell cultures. The AmEPV used in this study was obtained (13) from Robert R. Granados, Boyce Thompson Institute for Plant Research, Cornell University. The OBs used in this study were purified from infected Estigmene acrea caterpillars (13). Gypsy moth cell line IPLB-LD-652 was obtained from Ed Dougherty, Insect Pathology Laboratory, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, Md. The replication of AmEPV in this cell line has been described previously (10). The cells were maintained at 26 to 28°C in EX-CELL 400 (JRH Biosciences, Lenexa, Kans.) supplemented with 10% fetal bovine serum, 100 U of penicillin, and 100 µg of streptomycin per ml. The AmEPV inoculum for cell culturing was from an AmEPV-infected, freeze-dried E. acrea larva stored at -70°C (13). The larva was crushed and macerated in 5 ml of EX-CELL 400 (with penicillin and streptomycin but without fetal bovine serum) to which 0.003 g of cysteine-HCl had been added to prevent melanization. The debris was pelleted at 200 × g for 5 min, and the

* Corresponding author.


supernatant was passed through a 0.45-µm-pore-size filter. For the preparation of viral DNA, cells were infected with AmEPV by addition of the inoculum to a preconfluent monolayer of cells (about 0.1 to 1 PFU per cell), with occasional agitation of the dish during the first day. Infected cells were harvested 5 or 6 days postinfection. For routine virus quantitation, 1 ml of an appropriate virus dilution (prepared in unsupplemented EX-CELL 400) was added to a preconfluent monolayer of cells in a 60-mm culture dish, with intermittent agitation over a 5-h adsorption period at 26 to 28°C. The virus inoculum was removed, and 5 ml of a 0.75% SeaPlaque agarose (FMC BioProducts, Rockland, Maine) overlay prepared with 2× EX-CELL 400 and equilibrated at 37°C was added to the monolayer. Plaques were visualized after 5 days of incubation at 26°C by inspection with a stereomicroscope.

SDS-PAGE, protein blotting, protein microsequencing, and glycosylation analysis. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) of proteins was performed (19) with a 4% acrylamide stacking gel and a 7.5% separating gel. The acrylamide used to prepare spheroidin for protein microsequencing was deionized with AG 501-X8 resin (Bio-Rad, Richmond, Calif.). The gels were polymerized overnight at 4°C. For sample preparation, 2× Laemmli sample buffer consisting of 125 mM Tris-HCl (pH 6.8), 4% SDS (wt/vol), 10% β-mercaptoethanol (vol/vol), and 20% glycerol (vol/vol) was used. OB suspension samples were diluted 1:1 with 2× Laemmli sample buffer and boiled for 5 min. Following electrophoretic separation, spheroidin in the unstained gel was transferred to an Immobilon (polyvinylidene difluoride [PVDF]) membrane with a Bio-Rad TransBlot apparatus at 90 V for 2 h in a buffer consisting of 10 mM morpholinepropanesulfonic acid (pH 6.0) and 20% methanol. Spheroidin was visualized on the PVDF membrane by Coomassie blue staining.
Bands were cut from the membrane, and protein microsequencing was done with an Applied Biosystems gas-phase sequencer. Cyanogen bromide cleavage was performed on samples of spheroidin eluted from the PVDF membrane. Spheroidin within SDS-polyacrylamide gels was tested for glycosylation by periodic acid-Schiff staining (42).

Preparation of AmEPV DNA, restriction enzymes, and gel electrophoresis. AmEPV DNA was prepared from infected IPLB-LD-652 cells by one of two methods. The first method used in situ digestion of infected cells embedded within agarose plugs, after which the released cellular and viral DNAs were separated by pulsed-field electrophoresis (Bio-Rad CHEF-II-DR system). IPLB-LD-652 cells were infected with first-cell-culture-passage AmEPV. Infected cells were harvested 6 days postinfection by centrifugation at 200 × g for 5 min, rinsed, and resuspended in modified Hanks phosphate-buffered saline (PBS), which contained 15 g of glucose per liter but no Ca2+ or Mg2+. For embedding of the infected cells in agarose plugs, 1% SeaPlaque GTG agarose (prepared in modified Hanks PBS and equilibrated at 37°C) was mixed 1:1 with infected cells to yield 5 × 10^6 cells per ml in 0.5% agarose. Digestion to release DNA was done by gentle shaking of the inserts in 1% Sarkosyl-0.5 M EDTA-1 mg of proteinase K per ml at 50°C for 2 days (37). The CHEF-II-DR parameters for DNA separation were 180 V, a pulse ratio of 1, 50-s initial and 90-s final pulse times, and a run time of 20 to 25 h at 4°C. The separating gel was 1% SeaKem GTG agarose in 0.5× TBE buffer (33). Viral DNA bands were visualized by ethidium bromide staining and electroeluted (1). The recovered DNA was used for plasmid cloning following ethanol precipitation. Restriction endonucleases were purchased from New England BioLabs and Promega and used as suggested by the manufacturers.

The second method of viral DNA preparation used the extracellular virus found in the infected-cell-culture supernatant. The supernatant from 10-day-postinfection cell cultures was clarified by centrifugation at 200 × g for 5 min. Virus was collected from the supernatant by centrifugation at 12,000 × g. Viral pellets were resuspended in 6 ml of 1× TE (10 mM Tris-HCl-1 mM EDTA, pH 8.0). DNase I and RNase A (10- and 20-µg/ml final concentrations, respectively) were added, and the mixture was incubated at 37°C for 30 min. The mixture was heated to 50°C for 15 min. SDS and proteinase K (1% and 200 µg/ml, respectively) were then added. The sample was incubated at 50°C overnight and extracted three times with buffer-saturated phenol and once with SEVAG (33). The DNA was ethanol precipitated and resuspended in 1× TE (pH 8).

Preparation of RNA and primer extension reactions. Six 150-mm dishes of subconfluent cells were prepared. The culture media were aspirated, and 2 ml of viral inoculum was added to each dish. The virus concentration was about 0.1 to 1 PFU per cell. The dishes were occasionally agitated during a 3-h adsorption period. At the end of this period, the cells were rinsed with 5 ml of modified PBS (described above), the media were replaced, and the infected cells were incubated for 72 h at 27°C. Total RNA from the infected cells was isolated by the guanidinium thiocyanate-cesium chloride

procedure (4). Primer extension reactions were carried out with primer RM165, a 35-base oligonucleotide (GTTCGAAACAAGTATTTTCATCTTTTAAATAAATC) beginning and ending 100 and 65 bp downstream, respectively, of the initiating methionine codon found in the TAAATG motif. The primer was end labeled with [γ-32P]ATP and T4 polynucleotide kinase and purified on a spun column (33). For annealing, 40 µg of total infected-cell RNA and 10^6 cpm of radiolabeled primer were coprecipitated with ethanol. The pellet was resuspended in 25 µl of hybridization buffer [80% formamide, 40 mM piperazine-N,N'-bis(2-ethanesulfonic acid) (pH 6.4), 400 mM NaCl, 1 mM EDTA (pH 8.0)], denatured at 72°C for 15 min, and incubated at 30°C for 18 h. For primer extension, the RNA-primer hybrids were ethanol precipitated, resuspended, and used for five individual reactions. Each reaction contained 8 µg of total infected-cell RNA, 50 mM Tris-HCl (pH 8.3), 50 mM KCl, 10 mM dithiothreitol, 10 mM MgCl2, 4 U of avian myeloblastosis virus reverse transcriptase (Life Sciences), 8 U of RNasin (Promega), 0.25 mM each deoxynucleoside triphosphate (dNTP), and the appropriate dideoxynucleoside triphosphate (ddNTP), except for a control reaction, which contained no ddNTP. The dNTP/ddNTP ratios were 4:1, 5:1, 5:1, and 2:1, for the C, T, A, and G reactions, respectively. The reactions were carried out at 42°C for 30 min. One microliter of chase buffer (4 µl of 5 mM dNTP mixture and 1 µl of 20-U/µl reverse transcriptase) was added to each reaction mixture, which was then incubated for an additional 30 min at 42°C. Reaction products were separated on a sequencing gel (8% acrylamide containing 7 M urea) and visualized by autoradiography.

DNA radiolabeling, hybridization, and autoradiography. DNA probes were radiolabeled with [α-32P]dCTP by the random oligonucleotide extension method (5). Specific oligonucleotide probes were end labeled with [γ-32P]ATP and T4 polynucleotide kinase (33). Both types of probes were purified by passage through spun columns of Sephadex G-50. Southern transfer was done with Hybond-N (Amersham); the transferred DNA was fixed to the membrane by UV


FIG. 1. AmEPV OBs. Shown is a scanning electron micrograph of OBs from sucrose gradient-purified extracts from infected E. acrea caterpillars. The long axis of the OB is about 4 µm.

cross-linking. DNA-DNA hybridization (other than with oligonucleotide probes) was done at 65°C with BLOTTO (33) and was followed by two washes at room temperature with 0.3 M NaCl-0.06 M Tris (pH 8)-2 mM EDTA for 5 min, two washes for 15 min each at 65°C but with 0.4% SDS added, and two washes at room temperature with 0.03 M NaCl-0.06 M Tris (pH 8)-0.2 mM EDTA. Hybridization with oligonucleotide probes was done at 37 or 45°C with BLOTTO and was followed by the first wash only (see above).

Bacterial strains, cloning, and electroporation. Cloning with plasmids and M13 was accomplished with Escherichia coli SURE (Stratagene, La Jolla, Calif.) and E. coli UT481, respectively. A BglII AmEPV DNA library was prepared by

cloning BglII-cut AmEPV DNA into BamHI-digested, phosphatase-treated plasmid pUC9. A DraI AmEPV DNA library was prepared by cloning into SmaI-digested, phosphatase-treated vector M13mp19. Ligation and heat shock transformation procedures were as described previously (33). Transformation by electroporation was done with a Bio-Rad Gene Pulser following the instructions provided by the manufacturer. Minipreparations of plasmids were made by an alkaline lysis procedure, and preparations of M13 virus and DNA were made by standard procedures (33).

PCRs. Inverse and regular polymerase chain reactions (PCRs) with custom-designed oligonucleotide primers were performed as described previously (18). The specific reaction conditions for 34 cycles were as follows: 30 s at 94°C for denaturation, 30 s at 37°C for annealing, and 1.5 min at 72°C for extension. Finally, the samples were incubated at 72°C for 8.5 min to complete extensions. The concentration of each primer was 1 µM. The initial primer, RM58 (GA5GT7GA6CC7GA5TA6GT, where 5 represents A or G, 6 represents C or T, and 7 represents A, G, C, or T), was prepared on the basis of the results obtained from protein microsequencing of the 6.2-kDa polypeptide (see below). Inverse PCR was used in conjunction with self-ligation of ClaI-digested AmEPV DNA to prepare a probe to identify clones containing a flanking sequence or to verify the absence of an intervening sequence between adjacent clones. The primer pair used in inverse PCR was RM82-RM83. The sequence of RM82 was TTTCAAATTAACTGGCAACC, and that of RM83 was GGGATGGATTTTAGATTGCG. The resulting 2.2-kb inverse PCR product was digested with ClaI, and a 1.7-kb fragment was gel purified, radiolabeled, and used as a probe to locate additional clones. Standard PCR with 400 ng of genomic AmEPV DNA as a template was used to prepare a probe to identify a 586-bp DraI clone from nitrocellulose filter replicas (plaque lifts) of an M13 shotgun library of DraI-cut AmEPV fragments. This was done to isolate a clone spanning a central unsequenced region of the spheroidin gene. The standard PCR primers used for this reaction were RM92 (GCCTGGTTGGGTAACACCTC) and RM118 (CTGCTAGATTATCTACTCCG).

FIG. 2. Purification of spheroidin from purified OBs by PAGE. A 10-µl suspension containing approximately 2.3 × 10^5 OBs was solubilized by 1:1 dilution with 2× Laemmli sample buffer as described in Materials and Methods. This sample, after being heated, was loaded in lane 1. Lanes 2 through 10 contain successive 1:1 dilutions of the original OB suspension. Each OB dilution was then individually solubilized before electrophoretic analysis. Lane 11 contains a molecular mass standard. The arrow shows the position of the spheroidin protein.

Sequencing. All DNA sequencing was done by the dideoxy chain termination method (34) with [α-35S]dATP and Sequenase (US Biochemical, Cleveland, Ohio). Double-stranded plasmid sequencing (15) was done with "miniprep" (33) DNA and 1 pmol of universal, reverse, or custom-designed oligonucleotide primer in each sequencing reaction. Standard sequencing reactions with Sequenase were carried out in accordance with the instructions of the supplier, US Biochemical.
Nested exonuclease III deletions (16) were used to sequence plasmids pRH512, pRH85, and pRH87. Plasmid pRH512 contained a 4.51-kb BglII insert, whereas plasmids pRH85 and pRH87 contained identical

FIG. 3. Location of the spheroidin gene within the AmEPV genome. The HindIII map of AmEPV shows the location of the spheroidin gene and adjacent ORFs. The sizes of relevant BglII fragments (in kilobases and base pairs) are shown under the expanded restriction map. The direction of transcription is shown by the arrows. Also shown are the sizes (in kilobases) of the relevant ORFs and restriction sites for HindIII (H) and BglII (B). Relevant ClaI sites at bases 3485 and 6165 are not shown. The numbering of base pairs corresponds to that shown in Fig. 4. [Map labels under "Amino Acid Homologies": Vaccinia ORF 17; Capripoxvirus HM3; Vaccinia NTPase I.]

1.88-kb BglII inserts in opposite orientations. Deletions were made from the universal primer end. For making these deletions, the DNA was cut with EcoRI, filled in with α-thiophosphate dNTPs (28) by use of the Klenow fragment of E. coli DNA polymerase, cut with SmaI, and treated with exonuclease III. Samples were removed every 30 s, religated, and used to transform E. coli SURE cells by electroporation. Sequencing reactions were carried out with the universal primer.
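The primers described in the methods can be checked against the sequence of Fig. 4 with a short script. The sketch below expands the numeric degeneracy codes defined for RM58 into a regular expression and confirms that the reverse complement of the primer extension primer RM165 occurs downstream of the spheroidin ATG. The two sequence fragments are short excerpts copied from Fig. 4; the function and variable names are hypothetical and chosen only for illustration.

```python
import re

# Degeneracy codes as defined in the text for RM58: 5 = A/G, 6 = C/T, 7 = A/G/C/T.
DEGENERATE = {"5": "AG", "6": "CT", "7": "AGCT"}

RM58 = "GA5GT7GA6CC7GA5TA6GT"
RM165 = "GTTCGAAACAAGTATTTTCATCTTTTAAATAAATC"

# Excerpts of the Fig. 4 sequence around the expected binding sites.
RM58_REGION = "ACATCTAGTACAGAAGTAGACCCCGAGTATGTAACTAGTAAT"
RM165_REGION = "GAAATAAAGATTTATTTAAAAGATGAAAATACTTGTTTCGAACGTGTAGTAG"

def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(DEGENERATE.get(base, base))
    return count

def to_regex(primer):
    """Translate the numeric degeneracy codes into a regular expression."""
    return "".join(
        "[" + DEGENERATE[base] + "]" if base in DEGENERATE else base
        for base in primer
    )

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

rm58_site = re.search(to_regex(RM58), RM58_REGION)
print("RM58 degeneracy:", degeneracy(RM58))
print("RM58 site:", rm58_site.group() if rm58_site else None)
print("RM165 anneals to sense strand:", reverse_complement(RM165) in RM165_REGION)
```

Because RM165 is complementary to the mRNA, it is its reverse complement that appears in the sense-strand sequence, whereas RM58 matches the sense strand directly.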

Single-stranded M13 shotgun sequencing was done on plasmid pRH512, which was later shown to contain the 5' half of the spheroidin gene. Plasmid pRH512 was sonicated to produce random fragments, repaired with bacteriophage T4 DNA polymerase, and cloned into SmaI-cut M13mp19 vector. Plaque lifts were screened with a radiolabeled probe prepared from the 4.5-kb insert found in pRH512 to identify appropriate clones for shotgun sequencing. M13 sequencing was also done on a 586-bp DraI clone covering an area to the right of pRH512. Portions of a 1.7-kb inverse PCR fragment from a ClaI fragment self-ligation were sequenced with various custom-designed oligonucleotide primers. Sequenase, 5 pmol of each primer, and 10 to 50 ng of template were used. Prior to being sequenced, PCR products were chloroform extracted and purified on spun columns (33) of Sephacryl S-400. The DNA sequence was assembled and aligned, and a consensus sequence was produced (38). Both strands were completely sequenced; the PCR product sequence was confirmed by conventional sequencing.

Data base searches. A search of the National Biomedical Research Foundation protein data base (7) (version 26) for homology with the deduced amino acid sequences of AmEPV spheroidin and other proteins was made with FASTA (27). TFASTA searches of GenBank (version 65)


[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

1

AGATCTGATGTTCTATATATAGTACAAATTTGTATGATTAATTGATATTTTAAAATTCAAGATATTAAATATTAGATTCTAAACTATTCTTCTCATTATC

101

AATATAACTATCATAATCATTTTTTATTTTACTACATACATTCATAATTCTATTACTATTTTTTTTATACATATCTATTAATTCCATAAACTTTTTATTT

201

TTTATATTAAATATTTCTAATGTATTTTTAAATTCGTCAATACTATTAATATCATATCTAGAAATAAATAATGCACCTCTATAACTACTAGCCAATAAAT

301

CACCAATAAAACTCATAGAATAATATAATTTTTTAAATTCAAATTTAGATTTTATGTTGAAATAAACTATATAATATAAAAATATTATATTAAACATACC

401

ACAATCGGGACTATCATATTGTAATTCAAAAGTATTAAAAAAGTAATAATTTACATTTTTAAATATATCATTTAAATATTCTGATAGTACATCAATGTAT

501

AAATAAGCATAATTAGTATTAGGAGTACTATTGTAGTGTTTATGGCTTTTTATAGTCATATCAGATTCAATAAACATATATTTTTTATTTTGTTTTATAA

601

GTTCTGGTATATAACCACTACTATTAAAAAAGTATGCAGCTTTTTTATCTTTATCAAAGTGTTTATCTATTACGCAACAAGTAAAATGATCATTATAAAT

701

TATAGGAAACATAAAAAATCTTTTTTTATCATTCATTAAAAAAAATTTTACTCTATCTTCAAGTTTATAGCATCTCATAGATGAAGCTACTGTAGCAATA

801

TTTTTATCAGTTTTTTCAAATAAAATCAAATGAAAATAATCATAATCTGTATTAATCATAGTTAATGGATATATACAATTATATATATCTCCCGAACTTA

901

ACCATGTAGATTTATCATGTTTTCTTGGGTAAGCTTTAGGTTTAGGATTAAATCCCAAAGGCGGTATTCCTATTTGAGCATCCAAATCATCATAAATTGT

1001

GGCAAATGTAGAAAAATCTCTTGTTTTGGATAATTCTGATTTTAGAAAAGACTTTCTCATATATACTAATGGAATGCCTTTATATTTTTTAGATGTAATA

1101

AAAGTATTAATATTTATATTTTTATCTTGTAAATATTTTTTTATAGTCCAAAATAGAAAAAATTTTCTTTTAATATTATTTTCAAAATTAATATTATTAA

1201

TATGATTTGGATCTAAAACTAATTCATTATATAATATTTCCAAGTATTTTATAGGTATAAATGTTACTTTACCTCTTGTTTCATCATCATCATCTATTTT

1301

TTCTAATATAGCTATATTTGCATTAGTATTATATTTAATAGGATTTATAAAATATACCATATTATCTATTTTACTAAAAAATAACATAGACATAAAATTA

[Fig. 4: deduced amino acid letters for this region, including the ORF labels "G2R >" and "< G1L"; their alignment to the nucleotide sequence was lost in extraction.]

1401

ATACCAGATTCTGGCATTTTTAAATTTTTATTTGGAAATCTTCTAATTTTATTATTCATTATTTATTTAATAAATGTTTCTAGTTTATTTCAATACATTT

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

1501

TTAATAATAATTTTATTATTTGGTATTATAGGTATTTATATATTAACATTTGTGTTTAATATAGATTTTTTAATAAATAATAATAAAATATATATATTAT

1601

CATATAACGCAACTAATATAAACAATATAAATAATTTAAATTTATACGATTATTCAGATATTATATTTTTGACAAATTTTAACATAAATAATAATCTTTT

1701

AGTAACACAAGCTAATAATTTACAAGATATACCAATATTTAATGTAAATAATATTATATCTAATCAATATAATTTTTATTCAGCGTCTAGTAATAATGTA

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

1801

AATATATTATTAGGATTAAGAAAAACATTAAATATAAATAGAAATCCATTTTTATTATTTAGAAATACATCTCTAGCTATAGTTTTCAATAATAATGAAA

1901

F H C Y I S S N Q N S D V L D I V S H I E F M K S R Y N K Y V I I CTTTTCACTGTTATATAAGTTCAAATCAAAATAGTGATGTATTAGATATAGTATCACATATAGAATTTATGAAATCTAGATATAATAAATATGTAATTAT

2001

AGGAGAAATACCCGTAAATAATAATATATCTATTAATAATATATTAAATAATTTTGCTATTATAACTAATGTGAGATTAATAGATAAATATAACTCTATA

2101

ATATCATTTTTAAATATCAACGTAGGAACACTTTTTGTCATAAATCCATAATATTTAGTAATAATCACTAACATATTTTTTATTAAAATGAATAAAATAT

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

2201

* * K K K K S K P I I K G E L R K D G W V ATATTGTTATTGTCAATATTTTATATCATTTTACAGTCTTATTTTTTTTTTTTGCTTTTAGGTATAATTTTACCTTCTAAACGTTTATCTCCCCAAACAT

FIG. 4. Sequence of a 6.8-kb fragment of AmEPV DNA showing the deduced amino acids of spheroidin and adjacent upstream and downstream ORFs. The RM58 binding site and the spheroidin peptide sequences corresponding to those derived from direct protein microsequencing are underlined.

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

2301

CTACAGTAGATGGTTTATTAGATTCTGTGTTATACACATCTGCTGGATTTGCGGCATTTGTATCCAAACCATAATATCCAGGTCTATAATTATCTTTAAA

2401

< G3L V Q S Q S V E E T K L N N F Y G L N N K K S SS M AACTTGGGATTGAGATACTTCTTCAGTTTTTAAATTATTAAAATATCCAAGATTATTTTTTTTTGATGAAGACATAATTGATATTATAATACTTTATAGA

G4R > M S I

2501

F I Y Y I F N N R F Y I Y K R M N T V Q I L V V I L I T T A TATGTCAATATTTATCTACTATATTTTCAACAATAGATTTTATATATATAAAAGAATGAATACTGTACAAATTTTAGTTGTCATATTAATAACAACAGCA

2601

L S F L V F Q L W Y Y A E N Y E Y I L R Y N D T Y S N L Q F A R S A TTATCTTTTCTAGTTTTTCAATTATGGTATTATGCCGAAAATTACGAATATATATTAAGATATAATGATACATATTCAAATTTACAATTTGCGAGAAGCG

2701

N I N F D D L T V F D P N D N V F N V E E K W R C A S T N N N I F CAAATATAAATTTTGATGATTTAACTGTTTTTGATCCCAACGATAATGTTTTTAATGTTGAAGAAAAATGGCGCTGTGCTTCAACTAATAATAATATATT

2801

Y A V S T F G F L S T E S T G I N L T Y T N S R D C I I D L F S R TTATGCAGTTTCAACTTTTGGATTTTTAAGTACAGAAAGTACTGGTATTAATTTAACATATACAAATTCTAGAGATTGTATTATAGATTTATTTTCTAGA

2901

I I K I V Y D P C T V E T S N D C R L L R L L M A N T S * ATTATAAAAATAGTATATGATCCTTGTACTGTCGAAACATCTAACGATTGTAGATTATTAAGATTATTGATGGCCAATACATCATAAATACATTATAATA

(Spheroidin)

3001

G5R > M S N V P L A TTATTATAATATCAATCATAATTTTTATATATATTTTATCTAAAAGGACTTTTTATTTTTTATATATTAATAATAATAAATGAGTAACGTACCTTTAGCA

3101

T K T I R K L S N R K Y E I K I Y L K D E N T C F E R V V D M V V P ACCAAAACAATAAGAAAATTATCAAATCGAAAATATGAAATAAAGATTTATTTAAAAGATGAAAATACTTGTTTCGAACGTGTAGTAGATATGGTAGTTC

3201

L Y D V C N E T S G V T L E S C S P N I E V I E L D N T H V R I K CATTATATGATGTGTGTAATGAAACTTCTGGTGTTACTTTAGAATCATGTAGTCCAAATATAGAAGTAATTGAATTAGACAATACTCATGTTAGAATCAA

3301

V H G D T L K E M C F E L L F P C N V N E A Q V W K Y V S R L L L AGTTCACGGCGATACATTAAAAGAAATGTGTTTTGAATTATTGTTCCCGTGTAATGTAAACGAAGCCCAAGTATGGAAATATGTAAGTCGATTATTGCTA

3401

D N V S H N D V K Y K L A N F R L T L N G K H L K L K E I D Q P L F GATAATGTATCACATAATGACGTAAAATATAAATTAGCTAATTTTAGACTGACTCTTAATGGAAAACATTTAAAATTAAAAGAAATCGATCAACCGCTAT

3501

I Y F V D D L G N Y G L I T K E N I Q N N N L Q V N K D A S F I T TTATTTATTTTGTCGATGATTTGGGAAATTATGGATTAATTACTAAGGAAAATATTCAAAATAATAATTTACAAGTTAACAAAGATGCATCATTTATTAC

3601

I F P Q Y A Y I C L G R K V Y L N E K V T F D V T T D A T N I T L TATATTTCCACAATATGCGTATATTTGTTTAGGTAGAAAAGTATATTTAAATGAAAAAGTAACTTTTGATGTAACTACAGATGCAACTAATATTACTTTA

3701

D F N K S V N I A V S F L D I Y Y E V N N N E Q K D L L K D L L K R GATTTTAATAAATCTGTTAATATCGCAGTATCATTCCTTGATATATATTACGAAGTTAATAATAATGAACAAAAAGATTTATTAAAAGATTTACTTAAGA

3801

Y G E F E V Y N A D T G L I Y A K N L S I K N Y D T V I Q V E R L GATACGGTGAATTTGAAGTCTATAACGCAGATACTGGATTAATTTATGCTAAAAATCTAAGTATTAAAAATTATGATACTGTGATTCAAGTAGAAAGGTT P

3901

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

GCCAGTTAATTTGAAAGTTAGAGCATATACTAAGGATGAAAATGGTCGCAATCTATGTTTGATGAAAATAACATCTAGTACAGAAGTAGACCCCGAGTAT RM58 site

4001

T S N N A L L G T L R V Y K K F D K S H L K I V M H N R G S G N V GTAACTAGTAATAATGCTTTATTGGGTACGCTCAGAGTATATAAAAAGTTTGATAAATCTCATTTAAAAATTGTAATGCATAACAGAGGAAGTGGTAATG

4101

F P L R S L Y L E L S N V K G Y P V K A S D T S R L D V G I Y K L TATTTCCATTAAGATCATTATATCTGGAATTGTCTAATGTAAAAGGATATCCAGTTAAAGCATCTGATACTTCGAGATTAGATGTTGGTATTTACAAATT

4201

N K I Y V D N D E N K I I L E E I E A E Y R C G R Q V F H E R V K AAATAAAATTTATGTAGATAACGACGAAAATAAAATTATATTGGAAGAAATTGAAGCAGAATATAGATGCGGAAGACAAGTATTCCACGAACGTGTAAAA

4301

L N K H Q C K Y T P K C P F Q F V V N S P D T T I H L Y G I S N V C CTTAATAAACACCAATGTAAATATACTCCCAAATGTCCATTCCAATTTGTTGTAAACAGCCCAGATACTACGATTCACTTATATGGTATTTCTAATGTTT

4401

L K P K V P K N L R L W G W I L D C D T S R F I K H M A D G S D D GTTTAAAACCTAAAGTACCCAAAAATTTAAGACTTTGGGGATGGATTTTAGATTGCGATACTTCTAGATTTATTAAACATATGGCTGATGGATCTGATGA

4501

L D L D V R L N R N D I C L K Q A I K Q H Y T N V I I L E Y A N T TTTAGATCTTGACGTTAGGCTTAATAGAAATGATATATGTTTAAAACAAGCCATAAAACAACATTATACTAATGTAATTATATTAGAGTACGCAAATACA

4601

P N C T L S L G N N R F N N V F D M N D N K T I S E Y T N F T K S TATCCAAATTGCACATTATCATTGGGTAATAATAGATTTAATAATGTATTTGATATGAATGATAACAAAACTATATCTGAGTATACTAACTTTACAAAAA

4701

R Q D L N N M S C I L G I N I G N S V N I S S L P G W V T P H E A GTAGACAAGACCTTAATAACATGTCATGTATATTAGGAATAAACATAGGTAATTCCGTAAATATTAGTAGTTTGCCTGGTTGGGTAACACCTCACGAAGC

4801

K I L R S G C A R V R E F C K S F C D L S N K R F Y A M A R D L V TAAAATTCTAAGATCTGGTTGTGCTAGAGTTAGAGAATTTTGTAAATCATTCTGTGATCTTTCTAATAAGAGATTCTATGCTATGGCTAGAGATCTCGTA

4901

S L L F M C N Y V N I E I N E A V C E Y P G Y V I L F A R A I K V I AGTTTACTATTTATGTGTAACTATGTTAATATTGAAATTAACGAAGCAGTATGCGAATATCCTGGATATGTCATATTATTCGCAAGAGCTATTAAAGTAA

5001

TTAATGATTTATTATTAATTAACGGAGTAGATAATCTAGCAGGATATTCAATTTCCTTACCTATACATTATGGATCTACTGAAAAGACTCTACCAAATGA

5101

K Y G G V D K K F K Y L F L K N K L K D L M R D A D F V Q P P L Y AAAGTATGGTGGTGTTGATAAGAAATTTAAATATCTATTCTTAAAGAATAAACTAAAAGATTTAATGCGTGATGCTGATTTTGTCCAACCTCCATTATAT

5201

I S T Y F R T L L D A P P T D N Y E K Y L V D S S V Q S Q D V L Q G ATTTCTACTTACTTTAGAACTTTATTGGATGCTCCACCAACTGATAATTATGAAAAATATTTGGTTGATTCGTCCGTACAATCACAAGATGTTCTACAGG

5301

GTCTGTTGAATACATGTAATACTATTGATACTAATGCTAGAGTTGCATCAAGTGTTATTGGATATGTTTATGAACCATGCGGAACATCAGAACATAAAAT

5401

TGGTTCAGAAGCATTGTGTAAAATGGCTAAAGAAGCATCTAGATTAGGAAATCTAGGTTTAGTAAATCGTATTAATGAAAGTAATTACAACAAATGTAAT

5501

K Y G Y R G V Y E N N K L K T K Y Y R E I F D C N P N N N N E L I S AAATATGGTTATAGAGGAGTATACGAAAATAACAAACTAAAAACAAAATATTATAGAGAAATATTTGATTGTAATCCTAATAATAATAATGAATTAATAT

5601

CCAGATATGGATATAGAATAATGGATTTACATAAAATTGGAGAAATTTTTGCAAATTACGATGAAAGTGAATCTCCTTGCGAACGAAGATGTCATTACTT

5701

GGAAGATAGAGGTCTTTTATATGGTCCTGAATATGTACATCACAGATATCAAGAATCATGTACGCCTAATACGTTTGGAAATAACACAAATTGTGTAACA

5801

AGAAATGGTGAACAACACGTATACGAAAATAGTTGTGGAGATAATGCAACATGTGGAAGAAGAACAGGATATGGAAGAAGAAGTAGGGATGAATGGAATG

5901

ACTATAGAAAACCCCACGTTTATGACAATTGTGCCGATGCAAATAGTTCATCTTCAGATAGCTGTTCAGACAGTAGTAGTAGTAGTGAATCTGAATCTGA

6001

TTCAGATGGATGTTGCGACACAGATGCTAGTTTAGATTCTGATATTGAAAATTGTTATCAAAATCCATCAAAATGTGATGCAGGATGCTAAATGAAATTT

6101

AATATTATATAATATTAACTTACAAGTTATAAAAATCATTAAAATGATTTTTTAAAATGATATTATCGATAGTTGTGATAATGTGCTCTTTTATTTTATT

6201

* E Y Y D D A I AATTGCGATGATTATAATATTATCTTTTAGATATATTTAATATTAATTATAAATCGACTGACAATAATATTTATTCCTATTCATAATAATCATCTGCTAT

6301

ATATATTAATGTATCATTCTCTATTATAAATATAGGTATATTGTCTTTATCAATCATTAATTTTGCTACAGCTGTATTATCTTTATATACTATATTTGTG

6401

D K N L L G K L I T A R D Y D K C Y S I P Y L K I N I I V N S V N TCTTTGTTTAATAAACCTTTTAATATAGTGGCTCTATCATAATCTTTACAATATGATATGGGATATAATTTTATATTAATAATAACATTAGATACGTTCA

[Fig. 4: deduced amino acid letters for this region; their alignment to the nucleotide sequence was lost in extraction.]

6501

TTTCTTTCATTCTAGTTTTACGTATTGTGTCAAAAATTATTTCATTTTCTGCTGGTTCTATATATTTATATGTGTTATGAATAGATTCGATAGATGATGA

6601

TTTTAATAAATCAAATATAACATTTATTTTACCTTGTTTATCTTTTATAATATCTAATATTTCTTTATCTACAGATTTTCTGTTGTTGGTATATGATATT

6701

AAAAAATGAACGTTAACATATCTATATTCTTGTGGTAAATCTTTATGAGAATTTAATCTTATAGATCT

< G6L

[Deduced amino acid residues of the leftward-reading G6L ORF; column order not recoverable.]

FIG. 4-Continued.


[FIG. 5 sequence panel: the 92-bp region upstream of the spheroidin ATG, annotated with the labels "Translational Stop (Upstream ORF)", "Early Transcription Termination Signals" (TTTTTNT motifs, e.g., TTTTTAT), "Translational Stops", and "Consensus Poxvirus Late Transcriptional Start Signal" (TAAATG).]

FIG. 5. Ninety-two-base-pair sequence upstream of the predicted spheroidin gene promoter. Both potential translation and early poxvirus transcription termination signals are indicated. Only 7 G or C residues in the 92-bp sequence upstream of the spheroidin initiating ATG are present. The intercistronic region between the putative translation termination codon of the upstream ORF (G4R) and the initiating ATG of spheroidin is shown. Only potential translational stops nearest the ATG of spheroidin have been underlined.
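The features annotated in Fig. 5 can be located with a simple motif scan. The following is a minimal sketch, not the method used in the study: it reports positions of the TTTTTNT early termination motif and the TAAATG late start motif and counts G and C residues. The function names and the `TOY` sequence are hypothetical and are not the AmEPV upstream region; only the two motifs and the figure of 7 G or C residues in 92 bases come from the text.

```python
import re

# Motifs named in the text; N stands for any nucleotide.
# A lookahead is used so that overlapping matches would also be reported.
EARLY_TERMINATION = re.compile(r"(?=(TTTTT[ACGT]T))")
LATE_START = "TAAATG"


def early_termination_sites(seq):
    """Return 0-based start positions of every TTTTTNT match."""
    return [m.start() for m in EARLY_TERMINATION.finditer(seq.upper())]


def late_start_sites(seq):
    """Return 0-based start positions of every TAAATG match."""
    return [m.start() for m in re.finditer(LATE_START, seq.upper())]


def gc_count(seq):
    """Count G and C residues in a sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


# Hypothetical toy sequence for illustration only; NOT the AmEPV sequence.
TOY = "ATTTTTATAAGTTTTTCTAATTAAATG"

if __name__ == "__main__":
    print("TTTTTNT starts:", early_termination_sites(TOY))
    print("TAAATG starts:", late_start_sites(TOY))
    print(gc_count(TOY), "G or C residues in", len(TOY), "bases")
    # Value quoted in the text: 7 G or C residues in the 92 upstream bases.
    print(f"{7 / 92:.1%} G+C upstream of the spheroidin ATG")
```

Applied to the 92-bp region of Fig. 4, a scan of this kind would be expected to report the three TTTTTNT motifs and the single TAAATG described in the legend.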

and a local custom data base containing the entire vaccinia virus (Copenhagen strain) sequence (9) were also performed.

Nucleotide sequence accession number. The sequence of the 6,768-bp AmEPV fragment reported in this article will appear in the GenBank data base under accession number M77182.

RESULTS

Mapping of the spheroidin gene and sequencing strategy. To localize the spheroidin gene, we solubilized a purified preparation of OBs from infected caterpillars (Fig. 1) and subjected it to SDS-PAGE. The spheroidin protein (~113 kDa) was the predominant protein of the purified OBs (Fig. 2). Several lanes of an OB protein preparation (equivalent to lane 3 in Fig. 2) were electrophoretically separated and blotted to a PVDF membrane. The region of the PVDF membrane containing spheroidin was excised, and direct protein microsequencing was attempted. Microsequencing of the intact protein was unsuccessful, presumably because the N terminus of the protein was blocked. The protein was then treated with cyanogen bromide (CNBr) to generate internal peptide fragments for sequencing. Major polypeptides of 15, 9, 8, and 6.2 kDa were produced. A reliable amino acid sequence was obtained from the 9-, 8-, and 6.2-kDa polypeptides. The 8- and 9-kDa polypeptides represented overlapping partial CNBr cleavage products which together yielded the longest continuous amino acid sequence: Met-Ala-(Asn or Arg)-Asp-Leu-Val-Ser-Leu-Leu-Phe-Met-(Asn or Arg)-(?)-Tyr-Val-(Asn?)-Ile-Glu-Ile-Asn-Glu-Ala-Val-(?)-(Glu?). The amino acid sequence obtained from the 6.2-kDa fragment was Met-Lys-Ile-Thr-Ser-Ser-Thr-Glu-Val-Asp-Pro-Glu-Tyr-Val-(Thr or Ile)-Ser-(Asn?). A partial sequence for the 15-kDa fragment was also obtained: (Asn?)-Ala-Leu-Phe-(Phe?)-(Asn?)-Val-Phe. All sequences were ultimately located within the spheroidin gene sequence (see Fig. 4, underlining). The sequence derived from the 6.2-kDa CNBr fragment was used to design a degenerate oligonucleotide to use as a hybridization probe to locate the spheroidin gene. The sequence of this probe (RM58) was GA5GT7GA6CC7GA5TA6GT, where 5 represents A or G, 6 represents C or T, and 7 represents A, G, C, or T. This probe hybridized to a 4.4-kb BglII fragment and the EcoRI-D fragment of AmEPV DNA.
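The design of the degenerate probe can be checked with a short script. The sketch below is illustrative and uses hypothetical helper names: it expands RM58 using the digit codes defined in the text, counts the distinct oligonucleotides in the mixture, and confirms that each one encodes Glu-Val-Asp-Pro-Glu-Tyr followed by the first two bases of a Val codon, the stretch of the 6.2-kDa peptide the probe appears to have been derived from. The codon sets are the relevant entries of the standard genetic code.

```python
from itertools import product

# Digit codes exactly as defined in the text for probe RM58.
CODES = {"5": "AG", "6": "CT", "7": "ACGT"}
RM58 = "GA5GT7GA6CC7GA5TA6GT"

# Standard genetic code, restricted to the residues needed here.
CODONS = {
    "E": {"GAA", "GAG"},
    "V": {"GTA", "GTC", "GTG", "GTT"},
    "D": {"GAT", "GAC"},
    "P": {"CCA", "CCC", "CCG", "CCT"},
    "Y": {"TAT", "TAC"},
}


def choices(probe):
    """Allowed bases at each probe position."""
    return [CODES.get(ch, ch) for ch in probe]


def degeneracy(probe):
    """Number of distinct oligonucleotides in the degenerate mixture."""
    n = 1
    for allowed in choices(probe):
        n *= len(allowed)
    return n


def expand(probe):
    """Enumerate every oligonucleotide represented by the probe."""
    return ["".join(bases) for bases in product(*choices(probe))]


def encodes_peptide(oligo, peptide):
    """True if successive codons of oligo encode the one-letter peptide."""
    return all(oligo[3 * i:3 * i + 3] in CODONS[aa] for i, aa in enumerate(peptide))


if __name__ == "__main__":
    oligos = expand(RM58)
    print(len(RM58), "nt probe,", degeneracy(RM58), "-fold degenerate")
    print(all(encodes_peptide(o, "EVDPEY") for o in oligos))
```

Two positions with two choices appear twice each (codes 5 and 6) and the fourfold code 7 appears twice, so the mixture contains 2 × 2 × 4 × 4 × 2 × 2 = 256 sequences.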
FIG. 6. Primer extension analysis of the spheroidin transcript. The primer, RM165, was derived from bases 3179 to 3145 (Fig. 4) located 65 bp downstream of the initiating methionine codon found in the TAAATG motif at the 5' terminus of the spheroidin gene. Lanes G, A, T, and C show primer extension products from reactions which contained the appropriate ddNTP. Lane N is a control primer extension reaction in which no ddNTPs were present. Shown on the right is the deduced sequence. The beginning of the poly(A) tract is indicated.

The probe was then used to select a BglII clone from a BglII library of AmEPV DNA cloned into BamHI-digested plasmid pUC9. A 4.51-kb clone, pRH512 (bases 0 to 4504; Fig. 3), was isolated, radiolabeled, and hybridized back to various AmEPV genomic digests. Hybridization to the BamHI-A, EcoRI-D, HindIII-G and -J, PstI-A, and XhoI-B fragments of AmEPV DNA was observed (14). Oligonucleotide RM58 was then used as a primer for double-stranded plasmid sequencing of pRH512 to generate some initial DNA sequence. When a primer complementary to that sequence was prepared and used to sequence back through the RM58 binding site (bases 3983 to 4002), the sequence generated, when translated, yielded the amino acid sequence generated from microsequencing the 6.2-kDa CNBr polypeptide fragment. These results led us to conclude that clone pRH512 contained at least part of the spheroidin gene. Additional sequencing of pRH512 revealed that there was a single HindIII site at base 931 (Fig. 3) and that the 3' end of the spheroidin open reading frame (ORF) was truncated. The technique of inverse PCR was then used to isolate adjacent 3' DNA clones. A 1.7-kb PCR fragment generated by inverse PCR amplification of ClaI-digested, self-ligated DNA was sequenced with RM83 (see Materials and Methods) as a primer. The relevant ClaI sites are at positions 3485 and 6165 (Fig. 3 and 4). This fragment was used as a probe to isolate pRH827 (307 bp), pRH85 (1.88 kb), and pRH87 (1.88 kb) from the BglII fragment library. Clones pRH85 and pRH87 represent the same cloned DNA fragment in opposite orientations. Sequencing of the inverse PCR product with custom-designed primers confirmed the expected sequence relationship but also revealed a missing 80 bp between pRH827 and pRH85. This 80-bp DNA fragment was provided by a 586-bp DraI

G1L    MNNKIRRFPNKNLKMPESGINFMSMLFFSKIDNMVYFINPIKYNTNANIAILEKIDDDDETRGKVTFIPIKYLEILYNELVLDPNHINNINFENNIKRKF
VV I7  MERYTDLVISKIPELGFTNLLCHIYSLAGLCSNIDVSKFLTNCNGYVVEKYDKSTTAGKVSCIPIGMMLELVESGHLSRPNSSD

G1L    FLFWTIKKYLQDKNININTFITSKKYKGIPLVYMRKSFLKSELSKTRDFSTFATIYDDLDAQIGIPPLGFNPKPKAYPRKHDKSTWLSSGDIYNCIYPLT
VV I7  ELDQKKELTDELKTRYHSIYDVFELPTSIPLAYFFKPRLREKVSKAIDFSQMDLKIDDL-SRKGIHT-GENPKVVKMKIEPERGAWMSNRSIKNLVSQFA

G1L    M-INTDY-DYFHLILFEK---TDKNIATVASSMRCYKLEDRVKFFLMNDKKRFFMFPIIYNDHFTCCVIDKHFDKDKKAAYFFNSSGYIPELIKQNKKYM
VV I7  YGSEVDYIGQFDMRFLNSLAIHEKFDAFMNKHILSYILKDKIK----SSTSRFVMFGFCYLSHWKCVI----YDKKQCLVSFYDSGGNIPTEFHHYNNFY

G1L    FIESDMTIKSHKHYNSTPNTNYAYLYIDVLSEYLNDIF-KNVNYYFFNTFELQYDSPDCGMFNIIFLYYIVYFNIKSKFEFKKLYYSMSFIGDLLASSYR
VV I7  FYSFSDGFNTNHKHSVLDNTNCD---IDVLFRFFECTFGAKIGCINVEVNQLL--ESECGMFISLFMILCTRTPPKSFKSLKKVYTFFKFLADKKMTLFK

G1L    GALFISRYDINSIDEFKNTLEIFNIKNKKFMELIDMYKKNSNRIMNVCSKIKNDYDSYIDNEKNSLESNI*
VV I7  SILF----NLHDL-----SLDITETDNAGLKEYKRMEKWTKKSINVICDKLTTKLNRIVNDDEXLC*

FIG. 7. Comparative homologies of the AmEPV G1L ORF and the I7 ORF from the vaccinia virus HindIII-I fragment (VV I7). The region of homology corresponds to 23.9% identity over 335 amino acids, as determined by TFASTA. Amino acid identities (:), conservative amino acid substitutions (.), spaces inserted for optimal alignment (-), stop codons (*), and boundaries of the initial homology recognition sequence (X) are shown.

fragment which extended from bases 4543 to 5128 (Fig. 3 and 4) cloned into M13. The orientation of the spheroidin ORF on the physical map is shown in Fig. 3. It is interesting to note that the 1.7-kb inverse PCR fragment hybridized only to the AmEPV HindIII-G fragment (14). The amino acid sequence derived from the 8- and 9-kDa overlapping cyanogen bromide-generated polypeptides is found from nucleotide positions 4883 to 4957, that derived from the 6.2-kDa polypeptide is found from nucleotides 3962 to 4012, and that derived from the 15-kDa polypeptide is found from nucleotides 4628 to 4651 (underlined in Fig. 4). Therefore, all sequences obtained from protein microsequencing were ultimately found to lie within the putative spheroidin ORF.

Spheroidin gene sequence. A combination of M13 shotgun sequencing with standard universal and reverse M13 primers as well as custom-designed primers and exonuclease III-nested deletions was used to extend the sequence 5' and 3' to the spheroidin gene (Fig. 3). The spheroidin ORF (G5R) was initially identified by sequencing back through the RM58 oligonucleotide primer binding region as described above.


Examination of the AmEPV spheroidin gene sequence (ORF G5R) revealed a potential ORF of 3.0 kb capable of encoding 1,003 amino acids or a protein of about 115 kDa (Fig. 4), a size expected from earlier studies (21) and the data in Fig. 2. The ORF consists of 29% G+C, in contrast to the 18.5% reported for the entire AmEPV genome (20). Inspection of the 92 bases upstream of the initiating ATG revealed only 7 G or C residues (Fig. 5). We also detected the presence of known vertebrate poxvirus regulatory sequences within the 92 bp 5' of the spheroidin ORF. Included are three TTTTTNT early gene termination signals and TAAATG, which presumably represents a late transcription start signal used to initiate transcription and translation of the spheroidin gene. Several adjacent translation termination codons are also present within the 92 bp upstream of the spheroidin ORF (Fig. 4 and 5).

Spheroidin gene transcription. The start site for spheroidin gene transcription was determined. A primer complementary to the spheroidin gene sequence beginning 65 bp downstream of the predicted initiating methionine was prepared and used for a series of primer extension reactions. The

G4R  MSIFIYYIFNNRFYIYKRMNTVQILVVILITTALSFLVFQLWYYAENYEYILRYNDTYSNLQFARSANINFDDLTVFDPNDNVFNVEEKWRCASTNNNIF
HM3  MNAITIFFIILSTVAVCIIIFQLYSIYLNYDNIKEFNSAHSAFEFSKSVNTLSLDRTIKDPNDDIYDPKQKWRCVKLDND-Y

G4R  YAVSTFGFLSTESTGINLTYTNSRDCIIDLFSRIIKI-VYDPCTVE---TSNDCRLLRLLMANTS*
HM3  VSVSMFGF-KSNGSEIR-KFKNLESCIDYTFSQSTHSDIKNPCILQNGIKSKECIFLKSMF*

FIG. 8. Comparative homologies of the AmEPV G4R ORF and the HM3 ORF of capripoxvirus. The region of homology consists of a 31.7% identical region over a 142-amino-acid overlap, as determined by FASTA. Symbols are defined in the legend to Fig. 7.

A

G6L        RSIRLNSHKDLPQEYRYVNVHFLIS-YTNNRKSVDKEILDIIKDKQGKINVIFDLLKSSSIESIHNTYKYIEPAENEIIFDTIRKTRMKEMNVSNVIIN
VV NTPase  RAIRLNSHVLTPPERRYVNVHFIMARLSNGMPTVDEDLFEIIQSKSKEFVQLFRVFKHTSLEWIHANEKDFSPIDNESGWKTL-VSRAIDLSSNKNITNK

G6L        -IKLYPISYCKDYDRATILKGLLNKDTNIVYKDNTAVAKLMIDKDNIPIFIIENDTLIYIADDYYE*
VV NTPase  LIEGTNIWYSNSNRLMSINRGFKGVDGR-VYDVD---GNYLHDMPDNPVIKIHDGKLIYIF*

B

G6L        RSIRLNSHKDLPQEYRYVNVHFLISYTNNRKSVDKEILDIIKDKQGKINVIFDLLKSSSIESIHNTYKYIEPAENEIIFDTIRKTRMKEMNVSNVIINIK
CbEPV NPH  RSIRLNSHEYLPINYRYVNVHFIISYSNNRKSVDKEMLDIIKNKQGKINVVFDLLKASSIETIHNMHKYIEPVDNEIIFEIIRKTRMKEMNISNVIINLK

G6L        LYPISYCKDYDRATILKGLLNKDTNIVYKDNTAVAKLMIDKDNIPIFIIENDTLIYIADDYYE*
CbEPV NPH  LYPITYCKDYDRATILKGFLNKDTNIIYDNDTPVAKLIVDNNNLPIFVIENDILIYITNDYY*

FIG. 9. (A) Comparative homologies of the AmEPV G6L ORF and the NTPase I ORF of vaccinia virus. The regions compared show 31.9% identity over a 160-amino-acid overlap, as determined by TFASTA. Symbols are defined in the legend to Fig. 7. Note that the G6L ORF of AmEPV represents a truncated, incomplete sequence. (B) Alignments of the deduced amino acids of the truncated G6L ORF and the 3' end of the NPH I ORF of CbEPV. The two sequences exhibit 78.4% identity over 162 amino acids, as determined by TFASTA.

results are shown in Fig. 6. Complementarity was observed until the AAA of the upstream TAAATG motif (Fig. 4 and 5), indicating that transcription of the gene initiates within the TAAATG element of the proposed late promoter. Immediately upstream is a 5' tract of noncoded poly(A) on the transcripts. The average length of the poly(A) is greater than 6 bp. Our results are very similar to those reported for the highly expressed cowpox virus ATI gene (24).

Analysis of ORFs adjacent to the spheroidin gene. Analysis of the sequence upstream of the spheroidin gene revealed four additional potential ORFs, G1L, G2R, G3L, and G4R (Fig. 3). No significant homologies were found for the small potential polypeptides encoded by ORF G2R or G3L. ORF G1L, however, exhibited a significant degree of homology to ORF I7 found within the HindIII-I fragment of vaccinia virus (Fig. 7), whose function is unknown (36). ORF G4R showed homology to ORF HM3 of capripoxvirus (Fig. 8). In vaccinia virus, the ORF HM3 homolog was found very near the site of an incomplete ATI gene (8). The partial G6L ORF (Fig. 3 and 9) to the right of the spheroidin gene exhibited good homology to vaccinia virus NTPase I (3, 30). This relationship is shown in Fig. 9A. Much better homology (78.4% identity over 162 amino acids) was found between the partial G6L ORF and NPH I of CbEPV (41), another insect poxvirus. The comparative alignments are shown in Fig. 9B.
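The identity values quoted for these alignments can be illustrated with a simple column count. The sketch below is a simplified, hypothetical calculation, not the TFASTA algorithm: it assumes that identity is the number of aligned columns holding the same residue divided by the total number of aligned columns, whereas the actual program may treat gaps and overlap boundaries differently. The example pair is the first ten aligned residues of G6L and CbEPV NPH I from Fig. 9B.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes a and b are rows of one alignment (equal length, gaps as '-').
    Gap columns count toward the length but never as identities.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if not a:
        raise ValueError("alignment is empty")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * identical / len(a)


if __name__ == "__main__":
    # First ten aligned columns of Fig. 9B (G6L versus CbEPV NPH I).
    print(percent_identity("RSIRLNSHKD", "RSIRLNSHEY"))
    # Under this definition, 78.4% over 162 columns corresponds to
    # about 127 identical positions.
    print(round(78.4 / 100 * 162))
```

The same function applied to the full rows of Fig. 7 through 9 would give only approximate agreement with the published percentages, since the printed alignments extend beyond the scored overlap.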

and that there was a 5' tract of poly(A), both characteristic of late transcripts (24, 32). AmEPV spheroidin gene elements responsible for such high levels of expression may have significance for the development of a generalized eukaryotic expression vector capable of functioning in either a vertebrate or an invertebrate environment (12, 26). The predicted molecular mass of 115 kDa for AmEPV spheroidin agreed well with our observations (Fig. 2) and the 110 kDa reported previously (21). The spheroidin of CbEPV has an apparent size of 50 kDa (40). We have compared the DNA sequences of the spheroidin genes of AmEPV and CbEPV and found little, if any, significant homology between the two spheroidin sequences. Why these two spheroidin proteins, which presumably serve similar functions in both viruses, should be so different is not clear. This disparity is particularly surprising in view of the striking degree of homology between the NTPase I (NPH I) gene fragments of AmEPV and CbEPV (41) (Fig. 9B).
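The relationship between the size of the ORF and the predicted protein mass can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from this article; the 115-kDa value in the text was presumably computed from the actual amino acid composition, so a small difference is expected.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is an assumption for illustration, not a value from the article.
AVERAGE_RESIDUE_DA = 110.0


def orf_length_bp(n_residues, include_stop=True):
    """Length in base pairs of an ORF encoding n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))


def approx_mass_kda(n_residues, avg_residue_da=AVERAGE_RESIDUE_DA):
    """Rough protein mass in kDa from the residue count alone."""
    return n_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    n = 1003  # residues encoded by the spheroidin ORF, from the text
    print(orf_length_bp(n), "bp")               # about 3.0 kb
    print(f"{approx_mass_kda(n):.1f} kDa")      # rough estimate only
```

Both figures are consistent with the 3.0-kb ORF and with the observed and predicted masses of roughly 110 to 115 kDa.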

