Vol. 65, No. 12
JOURNAL OF VIROLOGY, Dec. 1991, p. 6516-6527
0022-538X/91/126516-12$02.00/0 Copyright © 1991, American Society for Microbiology
Identification, Cloning, and Sequencing of a Fragment of Amsacta moorei Entomopoxvirus DNA Containing the Spheroidin Gene and Three Vaccinia Virus-Related Open Reading Frames
RICHARD L. HALL AND RICHARD W. MOYER*
Department of Immunology and Medical Microbiology, University of Florida, Box 100266, JHMHC, Gainesville, Florida 32610-0266
Received 21 May 1991/Accepted 28 August 1991
Entomopoxvirus virions are frequently contained within crystalline occlusion bodies, which are composed of primarily a single protein, spheroidin, which is analogous to the polyhedrin protein of baculovirus. The spheroidin gene of Amsacta moorei entomopoxvirus was identified following the microsequencing of polypeptides generated from cyanogen bromide treatment of spheroidin and the subsequent synthesis of oligonucleotide hybridization probes. DNA sequencing of a 6.8-kb region of DNA containing the spheroidin gene showed that the spheroidin protein is derived from a 3.0-kb open reading frame potentially encoding a protein of 115 kDa. Three copies of the heptanucleotide, TTTTTNT, a sequence associated with early gene transcription in the vertebrate poxviruses, and four in-frame translational termination signals were found within 60 bp upstream of the putative spheroidin gene promoter (TAAATG). The spheroidin gene promoter region contains the sequence TAAATG, which is found in many late promoters of the vertebrate poxviruses and which serves as the site of transcriptional initiation, as shown by primer extension. Primer extension experiments also showed that spheroidin gene transcripts contain 5' poly(A) sequences typical of vertebrate poxvirus late transcripts. The 92 bases upstream of the initiating TAAATG are unusually A+T rich and contain only 7 G or C residues. An analysis of open reading frames around the spheroidin gene suggests that the colinear core of "essential genes" typical of the vertebrate poxviruses is absent in A. moorei entomopoxvirus.
Entomopoxviruses (EPVs) are poxviruses of insects. A crystalline occlusion body (OB) composed primarily of a single protein, spheroidin (2), protects the virions during transmission from one insect to another. The gene for the highly expressed spheroidin of Amsacta moorei entomopoxvirus (AmEPV) is a candidate insertion site for use as an invertebrate expression vector and a model for the study of the regulation of a highly expressed invertebrate poxvirus gene. The function of spheroidin is analogous to that of the polyhedrin protein of baculovirus, another occluded insect virus. The baculovirus polyhedrin gene has been used as an insertion site during the development of this virus as a generalized viral expression vector system (23).

AmEPV was discovered in 1967 (29) and is the type species of genus B of EPVs (22). AmEPV is one of three known EPVs which will replicate in cultured insect cells (11, 17, 35). The major structural protein, spheroidin, has been reported (21) to be 110 kDa in size and to consist of a high percentage of charged and sulfur-containing amino acids. The AmEPV double-stranded DNA genome is about 225 kb long and unusually A+T rich (18.5% G+C) (20). Recently, a series of restriction maps for AmEPV were published (13).

It has been suggested that spheroidin may be a member of the same poxvirus protein family which includes the cowpox virus A-type inclusion (ATI) and its homologs (25). The ATI gene has been identified and sequenced (6), and ATI is one of the most highly expressed vertebrate poxvirus gene products (24). The A+T-rich promoter region of the ATI gene could be one factor responsible for the high level of expression (6). Virions are embedded in ATIs much as they are in EPV OBs, but the cowpox virus inclusions are not crystalline.

Recent studies of the spruce budworm, Choristoneura biennis, EPV (CbEPV) have reported the identification and sequence of the CbEPV spheroidin gene (40), the discovery of amino acid homology between CbEPV spheroidin and a baculovirus glycoprotein (39), and the demonstration that the 5'-noncoding region of a CbEPV gene functions as a late promoter when introduced into vaccinia virus (26). The spheroidin gene of AmEPV presented in this paper does not resemble the CbEPV spheroidin gene (40). This is a surprising result, since both viruses belong to genus B, and it was expected that the spheroidins of various EPVs would be conserved, as are the polyhedrins of baculoviruses (31).

MATERIALS AND METHODS

Growth of virus in caterpillars and cell cultures. The AmEPV used in this study was obtained (13) from Robert R. Granados, Boyce Thompson Institute for Plant Research, Cornell University. The OBs used in this study were purified from infected Estigmene acrea caterpillars (13). Gypsy moth cell line IPLB-LD-652 was obtained from Ed Dougherty, Insect Pathology Laboratory, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, Md. The replication of AmEPV in this cell line has been described previously (10). The cells were maintained at 26 to 28°C in EX-CELL 400 (JRH Biosciences, Lenexa, Kans.) supplemented with 10% fetal bovine serum, 100 U of penicillin, and 100 µg of streptomycin per ml. The AmEPV inoculum for cell culturing was from an AmEPV-infected, freeze-dried E. acrea larva stored at -70°C (13). The larva was crushed and macerated in 5 ml of EX-CELL 400 (with penicillin and streptomycin but without fetal bovine serum) to which 0.003 g of cysteine-HCl had been added to prevent melanization. The debris was pelleted at 200 × g for 5 min, and the
* Corresponding author.
supernatant was passed through a 0.45-µm-pore-size filter. For the preparation of viral DNA, cells were infected with AmEPV by addition of the inoculum to a preconfluent monolayer of cells (about 0.1 to 1 PFU per cell), with occasional agitation of the dish during the first day. Infected cells were harvested 5 or 6 days postinfection. For routine virus quantitation, 1 ml of an appropriate virus dilution (prepared in unsupplemented EX-CELL 400) was added to a preconfluent monolayer of cells in a 60-mm culture dish, with intermittent agitation over a 5-h adsorption period at 26 to 28°C. The virus inoculum was removed, and 5 ml of a 0.75% SeaPlaque agarose (FMC BioProducts, Rockland, Maine) overlay prepared with 2× EX-CELL 400 and equilibrated at 37°C was added to the monolayer. Plaques were visualized after 5 days of incubation at 26°C by inspection with a stereomicroscope.

SDS-PAGE, protein blotting, protein microsequencing, and glycosylation analysis. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) of proteins was performed (19) with a 4% acrylamide stacking gel and a 7.5% separating gel. The acrylamide used to prepare spheroidin for protein microsequencing was deionized with AG 501-X8 resin (Bio-Rad, Richmond, Calif.). The gels were polymerized overnight at 4°C. For sample preparation, 2× Laemmli sample buffer consisting of 125 mM Tris-HCl (pH 6.8), 4% SDS (wt/vol), 10% β-mercaptoethanol (vol/vol), and 20% glycerol (vol/vol) was used. OB suspension samples were diluted 1:1 with 2× Laemmli sample buffer and boiled for 5 min. Following electrophoretic separation, spheroidin in the unstained gel was transferred to an Immobilon (polyvinylidene difluoride [PVDF]) membrane with a Bio-Rad TransBlot apparatus at 90 V for 2 h in a buffer consisting of 10 mM morpholinepropanesulfonic acid (pH 6.0) and 20% methanol. Spheroidin was visualized on the PVDF membrane by Coomassie blue staining.
Bands were cut from the membrane, and protein microsequencing was done with an Applied Biosystems gas-phase sequencer. Cyanogen bromide cleavage was performed on samples of spheroidin eluted from the PVDF membrane. Spheroidin within SDS-polyacrylamide gels was tested for glycosylation by periodic acid-Schiff staining (42).

Preparation of AmEPV DNA, restriction enzymes, and gel electrophoresis. AmEPV DNA was prepared from infected IPLB-LD-652 cells by one of two methods. The first method used in situ digestion of infected cells embedded within agarose plugs, after which the released cellular and viral DNAs were separated by pulsed-field electrophoresis (Bio-Rad CHEF-II-DR system). IPLB-LD-652 cells were infected with first-cell-culture-passage AmEPV. Infected cells were harvested 6 days postinfection by centrifugation at 200 × g for 5 min, rinsed, and resuspended in modified Hanks phosphate-buffered saline (PBS), which contained 15 g of glucose per liter but no Ca2+ or Mg2+. For embedding of the infected cells in agarose plugs, 1% SeaPlaque GTG agarose (prepared in modified Hanks PBS and equilibrated at 37°C) was mixed 1:1 with infected cells to yield 5 × 10^6 cells per ml in 0.5% agarose. Digestion to release DNA was done by gentle shaking of the inserts in 1% Sarkosyl-0.5 M EDTA-1 mg of proteinase K per ml at 50°C for 2 days (37). The CHEF-II-DR parameters for DNA separation were 180 V, a pulse ratio of 1, 50-s initial and 90-s final pulse times, and a run time of 20 to 25 h at 4°C. The separating gel was 1% SeaKem GTG agarose in 0.5× TBE buffer (33). Viral DNA bands were visualized by ethidium bromide staining and electroeluted (1). The recovered DNA was used for plasmid cloning following ethanol precipitation. Restriction endonucleases were purchased from New England BioLabs and Promega and used as suggested by the manufacturers.

The second method of viral DNA preparation used the extracellular virus found in the infected-cell-culture supernatant. The supernatant from 10-day-postinfection cell cultures was clarified by centrifugation at 200 × g for 5 min. Virus was collected from the supernatant by centrifugation at 12,000 × g. Viral pellets were resuspended in 6 ml of 1× TE (10 mM Tris-HCl-1 mM EDTA, pH 8.0). DNase I and RNase A (10- and 20-µg/ml final concentrations, respectively) were added, and the mixture was incubated at 37°C for 30 min. The mixture was heated to 50°C for 15 min. SDS and proteinase K (1% and 200 µg/ml, respectively) were then added. The sample was incubated at 50°C overnight and extracted three times with buffer-saturated phenol and once with SEVAG (33). The DNA was ethanol precipitated and resuspended in 1× TE (pH 8).

Preparation of RNA and primer extension reactions. Six 150-mm dishes of subconfluent cells were prepared. The culture media were aspirated, and 2 ml of viral inoculum was added to each dish. The virus concentration was about 0.1 to 1 PFU per cell. The dishes were occasionally agitated during a 3-h adsorption period. At the end of this period, the cells were rinsed with 5 ml of modified PBS (described above), the media were replaced, and the infected cells were incubated for 72 h at 27°C. Total RNA from the infected cells was isolated by the guanidinium thiocyanate-cesium chloride
procedure (4). Primer extension reactions were carried out with primer RM165, a 35-base oligonucleotide (GTTCGAAACAAGTATTTTCATCTTTTAAATAAATC) beginning and ending 100 and 65 bp downstream, respectively, of the initiating methionine codon found in the TAAATG motif. The primer was end labeled with [γ-32P]ATP and T4 polynucleotide kinase and purified on a spun column (33). For annealing, 40 µg of total infected-cell RNA and 10^6 cpm of radiolabeled primer were coprecipitated with ethanol. The pellet was resuspended in 25 µl of hybridization buffer [80% formamide, 40 mM piperazine-N,N'-bis(2-ethanesulfonic acid) (pH 6.4), 400 mM NaCl, 1 mM EDTA (pH 8.0)], denatured at 72°C for 15 min, and incubated at 30°C for 18 h. For primer extension, the RNA-primer hybrids were ethanol precipitated, resuspended, and used for five individual reactions. Each reaction contained 8 µg of total infected-cell RNA, 50 mM Tris-HCl (pH 8.3), 50 mM KCl, 10 mM dithiothreitol, 10 mM MgCl2, 4 U of avian myeloblastosis virus reverse transcriptase (Life Sciences), 8 U of RNasin (Promega), 0.25 mM each deoxynucleoside triphosphate (dNTP), and the appropriate dideoxynucleoside triphosphate (ddNTP), except for a control reaction, which contained no ddNTP. The dNTP/ddNTP ratios were 4:1, 5:1, 5:1, and 2:1, for the C, T, A, and G reactions, respectively. The reactions were carried out at 42°C for 30 min. One microliter of chase buffer (4 µl of 5 mM dNTP mixture and 1 µl of 20-U/µl reverse transcriptase) was added to each reaction mixture, which was then incubated for an additional 30 min at 42°C. Reaction products were separated on a sequencing gel (8% acrylamide containing 7 M urea) and visualized by autoradiography.

DNA radiolabeling, hybridization, and autoradiography. DNA probes were radiolabeled with [α-32P]dCTP by the random oligonucleotide extension method (5). Specific oligonucleotide probes were end labeled with [γ-32P]ATP and T4 polynucleotide kinase (33).
FIG. 1. AmEPV OBs. Shown is a scanning electron micrograph of OBs from sucrose gradient-purified extracts from infected E. acrea caterpillars. The long axis of the OB is about 4 µm.

Both types of probes were purified by passage through spun columns of Sephadex G-50. Southern transfer was done with Hybond-N (Amersham); the transferred DNA was fixed to the membrane by UV cross-linking. DNA-DNA hybridization (other than with oligonucleotide probes) was done at 65°C with BLOTTO (33) and was followed by two washes at room temperature with 0.3 M NaCl-0.06 M Tris (pH 8)-2 mM EDTA for 5 min, two washes for 15 min each at 65°C but with 0.4% SDS added, and two washes at room temperature with 0.03 M NaCl-0.06 M Tris (pH 8)-0.2 mM EDTA. Hybridization with oligonucleotide probes was done at 37 or 45°C with BLOTTO and was followed by the first wash only (see above).

Bacterial strains, cloning, and electroporation. Cloning with plasmids and M13 was accomplished with Escherichia coli SURE (Stratagene, La Jolla, Calif.) and E. coli UT481, respectively. A BglII AmEPV DNA library was prepared by
cloning BglII-cut AmEPV DNA into BamHI-digested, phosphatase-treated plasmid pUC9. A DraI AmEPV DNA library was prepared by cloning into SmaI-digested, phosphatase-treated vector M13mp19. Ligation and heat shock transformation procedures were as described previously (33). Transformation by electroporation was done with a Bio-Rad Gene Pulser following the instructions provided by the manufacturer. Minipreparations of plasmids were made by an alkaline lysis procedure, and preparations of M13 virus and DNA were made by standard procedures (33).

PCRs. Inverse and regular polymerase chain reactions (PCRs) with custom-designed oligonucleotide primers were performed as described previously (18). The specific reaction conditions for 34 cycles were as follows: 30 s at 94°C for denaturation, 30 s at 37°C for annealing, and 1.5 min at 72°C for extension. Finally, the samples were incubated at 72°C for 8.5 min to complete extensions. The concentration of each primer was 1 µM. The initial primer, RM58 (GA5GT7GA6CC7GA5TA6GT, where 5 represents A or G, 6 represents C or T, and 7 represents A, G, C, or T), was prepared on the basis of the results obtained from protein microsequencing of the 6.2-kDa polypeptide (see below). Inverse PCR was used in conjunction with self-ligation of ClaI-digested AmEPV DNA to prepare a probe to identify clones containing a flanking sequence or to verify the absence of an intervening sequence between adjacent clones. The primer pair used in inverse PCR was RM82-RM83. The sequence of RM82 was TTTCAAATTAACTGGCAACC, and that of RM83 was GGGATGGATTTTAGATTGCG. The resulting 2.2-kb inverse PCR product was digested with ClaI, and a 1.7-kb fragment was gel purified, radiolabeled, and used as a probe to locate additional clones. Standard PCR with 400 ng of genomic AmEPV DNA as a template was used to prepare a probe to identify a 586-bp DraI clone from nitrocellulose filter replicas (plaque lifts) of an M13 shotgun library of DraI-cut AmEPV fragments. This was done to isolate a clone spanning a central unsequenced region of the spheroidin gene. The standard PCR primers used for this reaction were RM92 (GCCTGGTTGGGTAACACCTC) and RM118 (CTGCTAGATTATCTACTCCG).

[FIG. 2 gel image: lanes 1 to 11; position of the 116-kDa marker labeled.]
FIG. 2. Purification of spheroidin from purified OBs by PAGE. A 10-µl suspension containing approximately 2.3 × 10^5 OBs was solubilized by 1:1 dilution with 2× Laemmli sample buffer as described in Materials and Methods. This sample, after being heated, was loaded in lane 1. Lanes 2 through 10 contain successive 1:1 dilutions of the original OB suspension. Each OB dilution was then individually solubilized before electrophoretic analysis. Lane 11 contains a molecular mass standard. The arrow shows the position of the spheroidin protein.

[FIG. 3 map: HindIII map of the AmEPV genome (scale bar, 10 kb) and expanded map (scale bar, 0.5 kb) with BglII fragments of 4.51 and 1.88 kb; bases 931, 4811, 4891, and 6768 marked; ORFs G1L (1.4 kb), G2R, G3L, G4R, G5R (spheroidin, 3.0 kb), and G6L.]

Sequencing. All DNA sequencing was done by the dideoxy chain termination method (34) with [α-35S]dATP and Sequenase (US Biochemical, Cleveland, Ohio). Double-stranded plasmid sequencing (15) was done with "miniprep" (33) DNA and 1 pmol of universal, reverse, or custom-designed oligonucleotide primer in each sequencing reaction. Standard sequencing reactions with Sequenase were carried out in accordance with the instructions of the supplier, US Biochemical.
Nested exonuclease III deletions (16) were used to sequence plasmids pRH512, pRH85, and pRH87. Plasmid pRH512 contained a 4.51-kb BglII insert, whereas plasmids pRH85 and pRH87 contained identical 1.88-kb BglII inserts in opposite orientations. Deletions were made from the universal primer end. For making these deletions, the DNA was cut with EcoRI, filled in with α-thiophosphate dNTPs (28) by use of the Klenow fragment of E. coli DNA polymerase, cut with SmaI, and treated with exonuclease III. Samples were removed every 30 s, religated, and used to transform E. coli SURE cells by electroporation. Sequencing reactions were carried out with the universal primer.

[FIG. 3 panel labels: amino acid homologies to vaccinia ORF 17, capripoxvirus HM3, and vaccinia NTPase I.]
FIG. 3. Location of the spheroidin gene within the AmEPV genome. The HindIII map of AmEPV shows the location of the spheroidin gene and adjacent ORFs. The sizes of relevant BglII fragments (in kilobases and base pairs) are shown under the expanded restriction map. The direction of transcription is shown by the arrows. Also shown are the sizes (in kilobases) of the relevant ORFs and restriction sites for HindIII (H) and BglII (B). Relevant ClaI sites at bases 3485 and 6165 are not shown. The numbering of base pairs corresponds to that shown in Fig. 4.
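The inverse PCR step described under PCRs can be checked against the ClaI coordinates given in the Fig. 3 legend. The following is a minimal sketch for illustration, not the authors' procedure. It assumes the ClaI sites begin at bases 3485 and 6165 (from the legend), that ClaI cuts AT^CGAT, and that the 5' ends of RM82 and RM83 lie at bases 3916 and 4438; these two primer positions are inferred here by matching the primer sequences to the Fig. 4 sequence and are not stated in the text. The function and variable names are hypothetical.

```python
# Approximate fragment sizes for inverse PCR on a self-ligated ClaI fragment.
# Every coordinate below is an assumption: the ClaI sites come from the
# Fig. 3 legend, and the primer positions are inferred from Fig. 4.

CLAI_LEFT = 3485      # first base of the left ATCGAT site
CLAI_RIGHT = 6165     # first base of the right ATCGAT site
CLAI_CUT_OFFSET = 2   # ClaI cuts AT^CGAT, two bases into the site
RM82_5P = 3916        # 5' end of RM82; primer extends toward base 1
RM83_5P = 4438        # 5' end of RM83; primer extends toward base 6768


def inverse_pcr_sizes(site_left, site_right, left_primer_5p, right_primer_5p,
                      cut_offset=CLAI_CUT_OFFSET):
    """Return circle, PCR product, and the two arms released by redigestion (bp)."""
    first_base = site_left + cut_offset          # first base kept after the left cut
    last_base = site_right + cut_offset - 1      # last base kept before the right cut
    circle = last_base - first_base + 1          # length of the self-ligated circle
    left_arm = left_primer_5p - first_base + 1   # left cut up to the RM82 5' end
    right_arm = last_base - right_primer_5p + 1  # RM83 5' end up to the right cut
    product = left_arm + right_arm               # arms joined at the ligation junction
    return {"circle": circle, "product": product,
            "left_arm": left_arm, "right_arm": right_arm}


sizes = inverse_pcr_sizes(CLAI_LEFT, CLAI_RIGHT, RM82_5P, RM83_5P)
for name, bp in sizes.items():
    print(f"{name}: {bp} bp (~{bp / 1000:.1f} kb)")
```

Under these assumptions the predicted product is close to 2.2 kb and the larger arm released by ClaI redigestion is close to 1.7 kb, consistent with the fragment sizes reported for the inverse PCR probe.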
Single-stranded M13 shotgun sequencing was done on plasmid pRH512, which was later shown to contain the 5' half of the spheroidin gene. Plasmid pRH512 was sonicated to produce random fragments, repaired with bacteriophage T4 DNA polymerase, and cloned into SmaI-cut M13mp19 vector. Plaque lifts were screened with a radiolabeled probe prepared from the 4.5-kb insert found in pRH512 to identify appropriate clones for shotgun sequencing. M13 sequencing was also done on a 586-bp DraI clone covering an area to the right of pRH512. Portions of a 1.7-kb inverse PCR fragment from a ClaI fragment self-ligation were sequenced with various custom-designed oligonucleotide primers. Sequenase, 5 pmol of each primer, and 10 to 50 ng of template were used. Prior to being sequenced, PCR products were chloroform extracted and purified on spun columns (33) of Sephacryl S-400. The DNA sequence was assembled and aligned, and a consensus sequence was produced (38). Both strands were completely sequenced; the PCR product sequence was confirmed by conventional sequencing.

Data base searches. A search of the National Biomedical Research Foundation protein data base (7) (version 26) for homology with the deduced amino acid sequences of AmEPV spheroidin and other proteins was made with FASTA (27). TFASTA searches of GenBank (version 65)
1
AGATCTGATGTTCTATATATAGTACAAATTTGTATGATTAATTGATATTTTAAAATTCAAGATATTAAATATTAGATTCTAAACTATTCTTCTCATTATC
101
AATATAACTATCATAATCATTTTTTATTTTACTACATACATTCATAATTCTATTACTATTTTTTTTATACATATCTATTAATTCCATAAACTTTTTATTT
201
TTTATATTAAATATTTCTAATGTATTTTTAAATTCGTCAATACTATTAATATCATATCTAGAAATAAATAATGCACCTCTATAACTACTAGCCAATAAAT
301
CACCAATAAAACTCATAGAATAATATAATTTTTTAAATTCAAATTTAGATTTTATGTTGAAATAAACTATATAATATAAAAATATTATATTAAACATACC
401
ACAATCGGGACTATCATATTGTAATTCAAAAGTATTAAAAAAGTAATAATTTACATTTTTAAATATATCATTTAAATATTCTGATAGTACATCAATGTAT
501
AAATAAGCATAATTAGTATTAGGAGTACTATTGTAGTGTTTATGGCTTTTTATAGTCATATCAGATTCAATAAACATATATTTTTTATTTTGTTTTATAA
601
GTTCTGGTATATAACCACTACTATTAAAAAAGTATGCAGCTTTTTTATCTTTATCAAAGTGTTTATCTATTACGCAACAAGTAAAATGATCATTATAAAT
701
TATAGGAAACATAAAAAATCTTTTTTTATCATTCATTAAAAAAAATTTTACTCTATCTTCAAGTTTATAGCATCTCATAGATGAAGCTACTGTAGCAATA
801
TTTTTATCAGTTTTTTCAAATAAAATCAAATGAAAATAATCATAATCTGTATTAATCATAGTTAATGGATATATACAATTATATATATCTCCCGAACTTA
901
ACCATGTAGATTTATCATGTTTTCTTGGGTAAGCTTTAGGTTTAGGATTAAATCCCAAAGGCGGTATTCCTATTTGAGCATCCAAATCATCATAAATTGT
1001
GGCAAATGTAGAAAAATCTCTTGTTTTGGATAATTCTGATTTTAGAAAAGACTTTCTCATATATACTAATGGAATGCCTTTATATTTTTTAGATGTAATA
1101
AAAGTATTAATATTTATATTTTTATCTTGTAAATATTTTTTTATAGTCCAAAATAGAAAAAATTTTCTTTTAATATTATTTTCAAAATTAATATTATTAA
1201
TATGATTTGGATCTAAAACTAATTCATTATATAATATTTCCAAGTATTTTATAGGTATAAATGTTACTTTACCTCTTGTTTCATCATCATCATCTATTTT
1301
TTCTAATATAGCTATATTTGCATTAGTATTATATTTAATAGGATTTATAAAATATACCATATTATCTATTTTACTAAAAAATAACATAGACATAAAATTA
1401
< G1L    G2R > M F L
ATACCAGATTCTGGCATTTTTAAATTTTTATTTGGAAATCTTCTAATTTTATTATTCATTATTTATTTAATAAATGTTTCTAGTTTATTTCAATACATTT
1501
TTAATAATAATTTTATTATTTGGTATTATAGGTATTTATATATTAACATTTGTGTTTAATATAGATTTTTTAATAAATAATAATAAAATATATATATTAT
1601
CATATAACGCAACTAATATAAACAATATAAATAATTTAAATTTATACGATTATTCAGATATTATATTTTTGACAAATTTTAACATAAATAATAATCTTTT
1701
AGTAACACAAGCTAATAATTTACAAGATATACCAATATTTAATGTAAATAATATTATATCTAATCAATATAATTTTTATTCAGCGTCTAGTAATAATGTA
1801
AATATATTATTAGGATTAAGAAAAACATTAAATATAAATAGAAATCCATTTTTATTATTTAGAAATACATCTCTAGCTATAGTTTTCAATAATAATGAAA
1901
F H C Y I S S N Q N S D V L D I V S H I E F M K S R Y N K Y V I I CTTTTCACTGTTATATAAGTTCAAATCAAAATAGTGATGTATTAGATATAGTATCACATATAGAATTTATGAAATCTAGATATAATAAATATGTAATTAT
2001
AGGAGAAATACCCGTAAATAATAATATATCTATTAATAATATATTAAATAATTTTGCTATTATAACTAATGTGAGATTAATAGATAAATATAACTCTATA
2101
ATATCATTTTTAAATATCAACGTAGGAACACTTTTTGTCATAAATCCATAATATTTAGTAATAATCACTAACATATTTTTTATTAAAATGAATAAAATAT
2201
* K K K K S K P I I K G E L R K D G W V ATATTGTTATTGTCAATATTTTATATCATTTTACAGTCTTATTTTTTTTTTTTGCTTTTAGGTATAATTTTACCTTCTAAACGTTTATCTCCCCAAACAT
FIG. 4. Sequence of a 6.8-kb fragment of AmEPV DNA showing the deduced amino acids of spheroidin and adjacent upstream and downstream ORFs. The RM58 binding site and the spheroidin peptide sequences corresponding to those derived from direct protein microsequencing are underlined.
2301
CTACAGTAGATGGTTTATTAGATTCTGTGTTATACACATCTGCTGGATTTGCGGCATTTGTATCCAAACCATAATATCCAGGTCTATAATTATCTTTAAA
2401
< G3L V Q S Q S V E E T K L N N F Y G L N N K K S SS M AACTTGGGATTGAGATACTTCTTCAGTTTTTAAATTATTAAAATATCCAAGATTATTTTTTTTTGATGAAGACATAATTGATATTATAATACTTTATAGA
G4R > M S I
2501
F I Y Y I F N N R F Y I Y K R M N T V Q I L V V I L I T T A TATGTCAATATTTATCTACTATATTTTCAACAATAGATTTTATATATATAAAAGAATGAATACTGTACAAATTTTAGTTGTCATATTAATAACAACAGCA
2601
L S F L V F Q L W Y Y A E N Y E Y I L R Y N D T Y S N L Q F A R S A TTATCTTTTCTAGTTTTTCAATTATGGTATTATGCCGAAAATTACGAATATATATTAAGATATAATGATACATATTCAAATTTACAATTTGCGAGAAGCG
2701
N I N F D D L T V F D P N D N V F N V E E K W R C A S T N N N I F CAAATATAAATTTTGATGATTTAACTGTTTTTGATCCCAACGATAATGTTTTTAATGTTGAAGAAAAATGGCGCTGTGCTTCAACTAATAATAATATATT
2801
Y A V S T F G F L S T E S T G I N L T Y T N S R D C I I D L F S R TTATGCAGTTTCAACTTTTGGATTTTTAAGTACAGAAAGTACTGGTATTAATTTAACATATACAAATTCTAGAGATTGTATTATAGATTTATTTTCTAGA
2901
I I K I V Y D P C T V E T S N D C R L L R L L M A N T S * ATTATAAAAATAGTATATGATCCTTGTACTGTCGAAACATCTAACGATTGTAGATTATTAAGATTATTGATGGCCAATACATCATAAATACATTATAATA
(Spheroidin)
3001
G5R > M S N V P L A TTATTATAATATCAATCATAATTTTTATATATATTTTATCTAAAAGGACTTTTTATTTTTTATATATTAATAATAATAAATGAGTAACGTACCTTTAGCA
3101
T K T I R K L S N R K Y E I K I Y L K D E N T C F E R V V D M V V P ACCAAAACAATAAGAAAATTATCAAATCGAAAATATGAAATAAAGATTTATTTAAAAGATGAAAATACTTGTTTCGAACGTGTAGTAGATATGGTAGTTC
3201
L Y D V C N E T S G V T L E S C S P N I E V I E L D N T H V R I K CATTATATGATGTGTGTAATGAAACTTCTGGTGTTACTTTAGAATCATGTAGTCCAAATATAGAAGTAATTGAATTAGACAATACTCATGTTAGAATCAA
3301
V H G D T L K E M C F E L L F P C N V N E A Q V W K Y V S R L L L AGTTCACGGCGATACATTAAAAGAAATGTGTTTTGAATTATTGTTCCCGTGTAATGTAAACGAAGCCCAAGTATGGAAATATGTAAGTCGATTATTGCTA
3401
D N V S H N D V K Y K L A N F R L T L N G K H L K L K E I D Q P L F GATAATGTATCACATAATGACGTAAAATATAAATTAGCTAATTTTAGACTGACTCTTAATGGAAAACATTTAAAATTAAAAGAAATCGATCAACCGCTAT
3501
I Y F V D D L G N Y G L I T K E N I Q N N N L Q V N K D A S F I T TTATTTATTTTGTCGATGATTTGGGAAATTATGGATTAATTACTAAGGAAAATATTCAAAATAATAATTTACAAGTTAACAAAGATGCATCATTTATTAC
3601
I F P Q Y A Y I C L G R K V Y L N E K V T F D V T T D A T N I T L TATATTTCCACAATATGCGTATATTTGTTTAGGTAGAAAAGTATATTTAAATGAAAAAGTAACTTTTGATGTAACTACAGATGCAACTAATATTACTTTA
3701
D F N K S V N I A V S F L D I Y Y E V N N N E Q K D L L K D L L K R GATTTTAATAAATCTGTTAATATCGCAGTATCATTCCTTGATATATATTACGAAGTTAATAATAATGAACAAAAAGATTTATTAAAAGATTTACTTAAGA
3801
Y G E F E V Y N A D T G L I Y A K N L S I K N Y D T V I Q V E R L GATACGGTGAATTTGAAGTCTATAACGCAGATACTGGATTAATTTATGCTAAAAATCTAAGTATTAAAAATTATGATACTGTGATTCAAGTAGAAAGGTT
3901
P V N L K V R A Y T K D E N G R N L C L M K I T S S T E V D P E Y GCCAGTTAATTTGAAAGTTAGAGCATATACTAAGGATGAAAATGGTCGCAATCTATGTTTGATGAAAATAACATCTAGTACAGAAGTAGACCCCGAGTAT RM58 site
4001
V T S N N A L L G T L R V Y K K F D K S H L K I V M H N R G S G N V GTAACTAGTAATAATGCTTTATTGGGTACGCTCAGAGTATATAAAAAGTTTGATAAATCTCATTTAAAAATTGTAATGCATAACAGAGGAAGTGGTAATG
4101
F P L R S L Y L E L S N V K G Y P V K A S D T S R L D V G I Y K L TATTTCCATTAAGATCATTATATCTGGAATTGTCTAATGTAAAAGGATATCCAGTTAAAGCATCTGATACTTCGAGATTAGATGTTGGTATTTACAAATT
4201
N K I Y V D N D E N K I I L E E I E A E Y R C G R Q V F H E R V K AAATAAAATTTATGTAGATAACGACGAAAATAAAATTATATTGGAAGAAATTGAAGCAGAATATAGATGCGGAAGACAAGTATTCCACGAACGTGTAAAA
4301
L N K H Q C K Y T P K C P F Q F V V N S P D T T I H L Y G I S N V C CTTAATAAACACCAATGTAAATATACTCCCAAATGTCCATTCCAATTTGTTGTAAACAGCCCAGATACTACGATTCACTTATATGGTATTTCTAATGTTT
4401
L K P K V P K N L R L W G W I L D C D T S R F I K H M A D G S D D GTTTAAAACCTAAAGTACCCAAAAATTTAAGACTTTGGGGATGGATTTTAGATTGCGATACTTCTAGATTTATTAAACATATGGCTGATGGATCTGATGA
4501
L D L D V R L N R N D I C L K Q A I K Q H Y T N V I I L E Y A N T TTTAGATCTTGACGTTAGGCTTAATAGAAATGATATATGTTTAAAACAAGCCATAAAACAACATTATACTAATGTAATTATATTAGAGTACGCAAATACA
4601
P N C T L S L G N N R F N N V F D M N D N K T I S E Y T N F T K S TATCCAAATTGCACATTATCATTGGGTAATAATAGATTTAATAATGTATTTGATATGAATGATAACAAAACTATATCTGAGTATACTAACTTTACAAAAA
4701
R Q D L N N M S C I L G I N I G N S V N I S S L P G W V T P H E A GTAGACAAGACCTTAATAACATGTCATGTATATTAGGAATAAACATAGGTAATTCCGTAAATATTAGTAGTTTGCCTGGTTGGGTAACACCTCACGAAGC
4801
K I L R S G C A R V R E F C K S F C D L S N K R F Y A M A R D L V TAAAATTCTAAGATCTGGTTGTGCTAGAGTTAGAGAATTTTGTAAATCATTCTGTGATCTTTCTAATAAGAGATTCTATGCTATGGCTAGAGATCTCGTA
4901
S L L F M C N Y V N I E I N E A V C E Y P G Y V I L F A R A I K V I AGTTTACTATTTATGTGTAACTATGTTAATATTGAAATTAACGAAGCAGTATGCGAATATCCTGGATATGTCATATTATTCGCAAGAGCTATTAAAGTAA
5001
TTAATGATTTATTATTAATTAACGGAGTAGATAATCTAGCAGGATATTCAATTTCCTTACCTATACATTATGGATCTACTGAAAAGACTCTACCAAATGA
5101
K Y G G V D K K F K Y L F L K N K L K D L M R D A D F V Q P P L Y AAAGTATGGTGGTGTTGATAAGAAATTTAAATATCTATTCTTAAAGAATAAACTAAAAGATTTAATGCGTGATGCTGATTTTGTCCAACCTCCATTATAT
5201
I S T Y F R T L L D A P P T D N Y E K Y L V D S S V Q S Q D V L Q G ATTTCTACTTACTTTAGAACTTTATTGGATGCTCCACCAACTGATAATTATGAAAAATATTTGGTTGATTCGTCCGTACAATCACAAGATGTTCTACAGG
5301
GTCTGTTGAATACATGTAATACTATTGATACTAATGCTAGAGTTGCATCAAGTGTTATTGGATATGTTTATGAACCATGCGGAACATCAGAACATAAAAT
5401
TGGTTCAGAAGCATTGTGTAAAATGGCTAAAGAAGCATCTAGATTAGGAAATCTAGGTTTAGTAAATCGTATTAATGAAAGTAATTACAACAAATGTAAT
5501
K Y G Y R G V Y E N N K L K T K Y Y R E I F D C N P N N N N E L I S AAATATGGTTATAGAGGAGTATACGAAAATAACAAACTAAAAACAAAATATTATAGAGAAATATTTGATTGTAATCCTAATAATAATAATGAATTAATAT
5601
CCAGATATGGATATAGAATAATGGATTTACATAAAATTGGAGAAATTTTTGCAAATTACGATGAAAGTGAATCTCCTTGCGAACGAAGATGTCATTACTT
5701
GGAAGATAGAGGTCTTTTATATGGTCCTGAATATGTACATCACAGATATCAAGAATCATGTACGCCTAATACGTTTGGAAATAACACAAATTGTGTAACA
5801
AGAAATGGTGAACAACACGTATACGAAAATAGTTGTGGAGATAATGCAACATGTGGAAGAAGAACAGGATATGGAAGAAGAAGTAGGGATGAATGGAATG
5901
ACTATAGAAAACCCCACGTTTATGACAATTGTGCCGATGCAAATAGTTCATCTTCAGATAGCTGTTCAGACAGTAGTAGTAGTAGTGAATCTGAATCTGA
6001
TTCAGATGGATGTTGCGACACAGATGCTAGTTTAGATTCTGATATTGAAAATTGTTATCAAAATCCATCAAAATGTGATGCAGGATGCTAAATGAAATTT
6101
AATATTATATAATATTAACTTACAAGTTATAAAAATCATTAAAATGATTTTTTAAAATGATATTATCGATAGTTGTGATAATGTGCTCTTTTATTTTATT
6201
* E Y Y D D A I AATTGCGATGATTATAATATTATCTTTTAGATATATTTAATATTAATTATAAATCGACTGACAATAATATTTATTCCTATTCATAATAATCATCTGCTAT
6301
ATATATTAATGTATCATTCTCTATTATAAATATAGGTATATTGTCTTTATCAATCATTAATTTTGCTACAGCTGTATTATCTTTATATACTATATTTGTG
6401
D K N L L G K L I T A R D Y D K C Y S I P Y L K I N I I V N S V N TCTTTGTTTAATAAACCTTTTAATATAGTGGCTCTATCATAATCTTTACAATATGATATGGGATATAATTTTATATTAATAATAACATTAGATACGTTCA
6501
TTTCTTTCATTCTAGTTTTACGTATTGTGTCAAAAATTATTTCATTTTCTGCTGGTTCTATATATTTATATGTGTTATGAATAGATTCGATAGATGATGA
6601
TTTTAATAAATCAAATATAACATTTATTTTACCTTGTTTATCTTTTATAATATCTAATATTTCTTTATCTACAGATTTTCTGTTGTTGGTATATGATATT
6701
AAAAAATGAACGTTAACATATCTATATTCTTGTGGTAAATCTTTATGAGAATTTAATCTTATAGATCT
[Deduced amino acid residues of the G6L ORF (< G6L) are printed beneath these nucleotides; their alignment with the sequence was not recoverable.]
FIG. 4-Continued.
[Fig. 5 diagram: the 92-bp upstream sequence, annotated with the labels Translational Stop (Upstream ORF), Early Transcription Termination Signals (TTTTTAT), Translational Stops, and Consensus Poxvirus Late Transcriptional Start Signal.]
FIG. 5. Ninety-two-base-pair sequence upstream of the predicted spheroidin gene promoter. Both potential translation and early poxvirus transcription termination signals are indicated. Only 7 G or C residues are present in the 92-bp sequence upstream of the spheroidin initiating ATG. The intercistronic region between the putative translation termination codon of the upstream ORF (G4R) and the initiating ATG of spheroidin is shown. Only the potential translational stops nearest the ATG of spheroidin have been underlined.
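The A+T richness described in the legend can be expressed as a simple base count. The following is a minimal sketch for illustration: `gc_count` and `gc_percent` are hypothetical helper names, and the figure of 7 G or C residues in 92 bp is taken from the legend rather than recomputed from the sequence.

```python
def gc_count(seq: str) -> int:
    """Count G and C residues in a nucleotide string (case-insensitive)."""
    return sum(1 for base in seq.upper() if base in "GC")


def gc_percent(n_gc: int, length: int) -> float:
    """Return G+C content as a percentage of the sequence length."""
    return 100.0 * n_gc / length


# The late promoter motif itself contains a single G.
motif_gc = gc_count("TAAATG")

# Values reported in the legend: 7 G or C residues in the 92-bp upstream region.
upstream_gc_percent = gc_percent(7, 92)
print(f"TAAATG has {motif_gc} G/C residue; upstream region is {upstream_gc_percent:.1f}% G+C")
```

By this count the upstream region is roughly 7.6% G+C, well below the 18.5% reported for the AmEPV genome as a whole.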
and a local custom data base containing the entire vaccinia virus (Copenhagen strain) sequence (9) were also performed.
Nucleotide sequence accession number. The sequence of the 6,768-bp AmEPV fragment reported in this article will appear in the GenBank data base under accession number M77182.
RESULTS
Mapping of the spheroidin gene and sequencing strategy. To localize the spheroidin gene, we solubilized a purified preparation of OBs from infected caterpillars (Fig. 1) and subjected it to SDS-PAGE. The spheroidin protein (~113 kDa) was the predominant protein of the purified OBs (Fig. 2). Several lanes of an OB protein preparation (equivalent to lane 3 in Fig. 2) were electrophoretically separated and blotted to a PVDF membrane. The region of the PVDF membrane containing spheroidin was excised, and direct protein microsequencing was attempted. Microsequencing of the intact protein was unsuccessful, presumably because the N terminus of the protein was blocked. The protein was then treated with cyanogen bromide (CNBr) to generate internal peptide fragments for sequencing. Major polypeptides of 15, 9, 8, and 6.2 kDa were produced. A reliable amino acid sequence was obtained from the 9-, 8-, and 6.2-kDa polypeptides. The 8- and 9-kDa polypeptides represented overlapping partial CNBr cleavage products which together yielded the longest continuous amino acid sequence: Met-Ala-(Asn or Arg)-Asp-Leu-Val-Ser-Leu-Leu-Phe-Met-(Asn or Arg)-(?)-Tyr-Val-(Asn?)-Ile-Glu-Ile-Asn-Glu-Ala-Val-(?)-(Glu?). The amino acid sequence obtained from the 6.2-kDa fragment was Met-Lys-Ile-Thr-Ser-Ser-Thr-Glu-Val-Asp-Pro-Glu-Tyr-Val-(Thr or Ile)-Ser-(Asn?). A partial sequence for the 15-kDa fragment was also obtained: (Asn?)-Ala-Leu-Phe-(Phe?)-(Asn?)-Val-Phe. All sequences were ultimately located within the spheroidin gene sequence (see Fig. 4, underlining). The sequence derived from the 6.2-kDa CNBr fragment was used to design a degenerate oligonucleotide to use as a hybridization probe to locate the spheroidin gene. The sequence of this probe (RM58) was GA5GT7GA6CC7GA5TA6GT, where 5 represents A or G, 6 represents C or T, and 7 represents A, G, C, or T. This probe hybridized to a 4.4-kb BglII fragment and the EcoRI-D fragment of AmEPV DNA.
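The degeneracy of probe RM58 can be made concrete by expanding its numeric codes into every oligonucleotide they represent. The sketch below uses the code definitions given in the text (5 = A or G, 6 = C or T, 7 = any base) and the standard genetic code; the helper names `expand` and `translate` are hypothetical and are not part of the original methods.

```python
from itertools import product

# Degeneracy codes as defined in the text for probe RM58.
CODES = {"5": "AG", "6": "CT", "7": "ACGT"}
RM58 = "GA5GT7GA6CC7GA5TA6GT"

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(probe: str) -> list[str]:
    """Return every concrete oligonucleotide encoded by a degenerate probe."""
    choices = [CODES.get(ch, ch) for ch in probe]
    return ["".join(combo) for combo in product(*choices)]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


oligos = expand(RM58)
peptides = {translate(oligo) for oligo in oligos}
print(len(oligos), peptides)
```

The 20-mer expands to 256 distinct oligonucleotides, all of which encode Glu-Val-Asp-Pro-Glu-Tyr, followed by the first two bases of a Val codon. This is the Glu-Val-Asp-Pro-Glu-Tyr-Val stretch of the 6.2-kDa CNBr fragment.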
The probe was then used to select a BglII clone from a BglII library of AmEPV
FIG. 6. Primer extension analysis of the spheroidin transcript. The primer, RM165, was derived from bases 3179 to 3145 (Fig. 4) located 65 bp downstream of the initiating methionine codon found in the TAAATG motif at the 5' terminus of the spheroidin gene. Lanes G, A, T, and C show primer extension products from reactions which contained the appropriate ddNTP. Lane N is a control primer extension reaction in which no ddNTPs were present. Shown on the right is the deduced sequence. The beginning of the poly(A) tract is indicated.
DNA cloned into BamHI-digested plasmid pUC9. A 4.51-kb clone, pRH512 (bases 0 to 4504; Fig. 3), was isolated, radiolabeled, and hybridized back to various AmEPV genomic digests. Hybridization to the BamHI-A, EcoRI-D, HindIII-G and -J, PstI-A, and XhoI-B fragments of AmEPV DNA was observed (14). Oligonucleotide RM58 was then used as a primer for double-stranded plasmid sequencing of pRH512 to generate some initial DNA sequence. When a primer complementary to that sequence was prepared and used to sequence back through the RM58 binding site (bases 3983 to 4002), the sequence generated, when translated, yielded the amino acid sequence generated from microsequencing the 6.2-kDa CNBr polypeptide fragment. These results led us to conclude that clone pRH512 contained at least part of the spheroidin gene. Additional sequencing of pRH512 revealed that there was a single HindIII site at base 931 (Fig. 3) and that the 3' end of the spheroidin open reading frame (ORF) was truncated. The technique of inverse PCR was then used to isolate adjacent 3' DNA clones. A 1.7-kb PCR fragment generated by inverse PCR amplification of ClaI-digested, self-ligated DNA was sequenced with RM83 (see Materials and Methods) as a primer. The relevant ClaI sites are at positions 3485 and 6165 (Fig. 3 and 4). This fragment was used as a probe to isolate pRH827 (307 bp), pRH85 (1.88 kb), and pRH87 (1.88 kb) from the BglII fragment library. Clones pRH85 and pRH87 represent the same cloned DNA fragment in opposite orientations. Sequencing of the inverse PCR product with custom-designed primers confirmed the expected sequence relationship but also revealed a missing 80 bp between pRH827 and pRH85. This 80-bp DNA fragment was provided by a 586-bp DraI
G1L    MNNKIRRFPNKNLKMPESGINFMSMLFFSKIDNMVYFINPIKYNTNANIAILEKIDDDDETRGKVTFIPIKYLEILYNELVLDPNHINNINFENNIKRKF
VV I7  MERYTDLVISKIPELGFTNLLCHIYSLAGLCSNIDVSKFLTNCNGYVVEKYDKSTTAGKVSCIPIGMMLELVESGHLSRPNSSD

G1L    FLFWTIKKYLQDKNININTFITSKKYKGIPLVYMRKSFLKSELSKTRDFSTFATIYDDLDAQIGIPPLGFNPKPKAYPRKHDKSTWLSSGDIYNCIYPLT
VV I7  ELDQKKELTDELKTRYHSIYDVFELPTSIPLAYFFKPRLREKVSKAIDFSQMDLKIDDL-SRKGIHT-GENPKVVKMKIEPERGAWMSNRSIKNLVSQFA

G1L    M-INTDY-DYFHLILFEK---TDKNIATVASSMRCYKLEDRVKFFLMNDKKRFFMFPIIYNDHFTCCVIDKHFDKDKKAAYFFNSSGYIPELIKQNKKYM
VV I7  YGSEVDYIGQFDMRFLNSLAIHEKFDAFMNKHILSYILKDKIK----SSTSRFVMFGFCYLSHWKCVI----YDKKQCLVSFYDSGGNIPTEFHHYNNFY

G1L    FIESDMTIKSHKHYNSTPNTNYAYLYIDVLSEYLNDIF-KNVNYYFFNTFELQYDSPDCGMFNIIFLYYIVYFNIKSKFEFKKLYYSMSFIGDLLASSYR
VV I7  FYSFSDGFNTNHKHSVLDNTNCD---IDVLFRFFECTFGAKIGCINVEVNQLL--ESECGMFISLFMILCTRTPPKSFKSLKKVYTFFKFLADKKMTLFK

G1L    GALFISRYDINSIDEFKNTLEIFNIKNKKFMELIDMYKKNSNRIMNVCSKIKNDYDSYIDNEKNSLESNI*
VV I7  SILF----NLHDL-----SLDITETDNAGLKEYKRMEKWTKKSINVICDKLTTKLNRIVNDDEXLC*

[Residue position rulers and the rows of identity and substitution symbols are not reproduced.]
FIG. 7. Comparative homologies of the AmEPV G1L ORF and the I7 ORF from the vaccinia virus HindIII-I fragment (VV I7). The region of homology corresponds to 23.9% identity over 335 amino acids, as determined by TFASTA. Amino acid identities (:), conservative amino acid substitutions (.), spaces inserted for optimal alignment (-), stop codons (*), and boundaries of the initial homology recognition sequence (X) are shown.
fragment which extended from bases 4543 to 5128 (Fig. 3 and 4) cloned into M13. The orientation of the spheroidin ORF on the physical map is shown in Fig. 3. Interestingly, the 1.7-kb inverse PCR fragment hybridized only to the AmEPV HindIII-G fragment (14). The amino acid sequence derived from the 8- and 9-kDa overlapping cyanogen bromide-generated polypeptides is found from nucleotide positions 4883 to 4957, that derived from the 6.2-kDa polypeptide is found from nucleotides 3962 to 4012, and that derived from the 15-kDa polypeptide is found from nucleotides 4628 to 4651 (underlined in Fig. 4). Therefore, all sequences obtained from protein microsequencing were ultimately found to lie within the putative spheroidin ORF.
Spheroidin gene sequence. A combination of M13 shotgun sequencing with standard universal and reverse M13 primers as well as custom-designed primers and exonuclease III-nested deletions was used to extend the sequence 5' and 3' to the spheroidin gene (Fig. 3). The spheroidin ORF (G5R) was initially identified by sequencing back through the RM58 oligonucleotide primer binding region as described above.
Examination of the AmEPV spheroidin gene sequence (ORF G5R) revealed a potential ORF of 3.0 kb capable of encoding 1,003 amino acids or a protein of about 115 kDa (Fig. 4), a size expected from earlier studies (21) and the data in Fig. 2. The ORF consists of 29% G+C, in contrast to the 18.5% reported for the entire AmEPV genome (20). Inspection of the 92 bases upstream of the initiating ATG revealed only 7 G or C residues (Fig. 5). We also detected the presence of known vertebrate poxvirus regulatory sequences within the 92 bp 5' of the spheroidin ORF. Included are three TTTTTNT early gene termination signals and TAAATG, which presumably represents a late transcription start signal used to initiate transcription and translation of the spheroidin gene. Several adjacent translation termination codons are also present within the 92 bp upstream of the spheroidin ORF (Fig. 4 and 5).
Spheroidin gene transcription. The start site for spheroidin gene transcription was determined. A primer complementary to the spheroidin gene sequence beginning 65 bp downstream of the predicted initiating methionine was prepared and used for a series of primer extension reactions. The
G4R  MSIFIYYIFNNRFYIYKRMNTVQILVVILITTALSFLVFQLWYYAENYEYILRYNDTYSNLQFARSANINFDDLTVFDPNDNVFNVEEKWRCASTNNNIF
HM3  MNAITIFFIILSTVAVCIIIFQLYSIYLNYDNIKEFNSAHSAFEFSKSVNTLSLDRTIKDPNDDIYDPKQKWRCVKLDND-Y

G4R  YAVSTFGFLSTESTGINLTYTNSRDCIIDLFSRIIKI-VYDPCTVE---TSNDCRLLRLLMANTS*
HM3  VSVSMFGF-KSNGSEIR-KFKNLESCIDYTFSQSTHSDIKNPCILQNGIKSKECIFLKSMF*

[Residue position rulers and the rows of identity and substitution symbols are not reproduced.]
FIG. 8. Comparative homologies of the AmEPV G4R ORF and the HM3 ORF of capripoxvirus. The region of homology shows 31.7% identity over a 142-amino-acid overlap, as determined by FASTA. Symbols are defined in the legend to Fig. 7.
A
G6L        RSIRLNSHKDLPQEYRYVNVHFLIS-YTNNRKSVDKEILDIIKDKQGKINVIFDLLKSSSIESIHNTYKYIEPAENEIIFDTIRKTRMKEMNVSNVIIN
VV NTPase  RAIRLNSHVLTPPERRYVNVHFIMARLSNGMPTVDEDLFEIIQSKSKEFVQLFRVFKHTSLEWIHANEKDFSPIDNESGWKTL-VSRAIDLSSNKNITNK

G6L        -IKLYPISYCKDYDRATILKGLLNKDTNIVYKDNTAVAKLMIDKDNIPIFIIENDTLIYIADDYYE*
VV NTPase  LIEGTNIWYSNSNRLMSINRGFKGVDGR-VYDVD---GNYLHDMPDNPVIKIHDGKLIYIF*

B
G6L        RSIRLNSHKDLPQEYRYVNVHFLISYTNNRKSVDKEILDIIKDKQGKINVIFDLLKSSSIESIHNTYKYIEPAENEIIFDTIRKTRMKEMNVSNVIINIK
CbEPV NPH  RSIRLNSHEYLPINYRYVNVHFIISYSNNRKSVDKEMLDIIKNKQGKINVVFDLLKASSIETIHNMHKYIEPVDNEIIFEIIRKTRMKEMNISNVIINLK

G6L        LYPISYCKDYDRATILKGLLNKDTNIVYKDNTAVAKLMIDKDNIPIFIIENDTLIYIADDYYE*
CbEPV NPH  LYPITYCKDYDRATILKGFLNKDTNIIYDNDTPVAKLIVDNNNLPIFVIENDILIYITNDYY*

[Residue position rulers and the rows of identity and substitution symbols are not reproduced.]
FIG. 9. (A) Comparative homologies of the AmEPV G6L ORF and the NTPase I ORF of vaccinia virus. The regions compared show 31.9% identity over a 160-amino-acid overlap, as determined by TFASTA. Symbols are defined in the legend to Fig. 7. Note that the G6L ORF of AmEPV represents a truncated, incomplete sequence. (B) Alignments of the deduced amino acids of the truncated G6L ORF and the 3' end of the NPH I ORF of CbEPV. The two sequences exhibit 78.4% identity over 162 amino acids, as determined by TFASTA.
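The percent identity values quoted in the legends to Fig. 7 to 9 are counts of identical residues divided by the length of the aligned overlap. The sketch below illustrates that arithmetic on the first 10 aligned residues of panel B only. The function name `percent_identity` is hypothetical, and the counting convention (every column counted unless both sequences have a gap) is an assumption that may differ from what TFASTA reports.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap character.
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


# First 10 aligned residues of Fig. 9B (excerpt only, not the full alignment).
g6l_excerpt = "RSIRLNSHKD"
cbepv_excerpt = "RSIRLNSHEY"
print(percent_identity(g6l_excerpt, cbepv_excerpt))
```

Eight of the ten positions in this excerpt match. For comparison, the reported 78.4% identity over 162 amino acids corresponds to about 127 identical residues.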
results are shown in Fig. 6. Complementarity was observed until the AAA of the upstream TAAATG motif (Fig. 4 and 5), indicating that transcription of the gene initiates within the TAAATG element of the proposed late promoter element. Immediately upstream is a 5' tract of noncoded poly(A) on the transcripts. The average length of the poly(A) is greater than 6 bp. Our results are very similar to those reported for the highly expressed cowpox virus ATI gene (24).
Analysis of ORFs adjacent to the spheroidin gene. Analysis of the sequence upstream of the spheroidin gene revealed four additional potential ORFs, G1L, G2R, G3L, and G4R (Fig. 3). No significant homologies were found for the small potential polypeptides encoded by ORF G2R or G3L. ORF G1L, however, exhibited a significant degree of homology to ORF I7 found within the HindIII-I fragment of vaccinia virus (Fig. 7), whose function is unknown (36). ORF G4R showed homology to ORF HM3 of capripoxvirus (Fig. 8). In vaccinia virus, the ORF HM3 homolog was found very near the site of an incomplete ATI gene (8). The partial G6L ORF (Fig. 3 and 9) to the right of the spheroidin gene exhibited good homology to vaccinia virus NTPase I (3, 30). This relationship is shown in Fig. 9A. Much better homology (78.4% identity over 162 amino acids) was found between the partial G6L ORF and NPH I of CbEPV (41), another insect poxvirus. The comparative alignments are shown in Fig. 9B.
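The G6L ORF is marked in Fig. 4 as reading leftward, so its protein sequence is obtained by translating the reverse complement of the strand shown. As an illustrative check, the sketch below takes the last 24 nucleotides of the Fig. 4 line labeled 6201 (positions 6277 to 6300, assuming 100 bases per line) and translates the opposite strand with the standard genetic code. The helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Top-strand segment copied from Fig. 4 (end of the line labeled 6201).
segment = "CTATTCATAATAATCATCTGCTAT"
print(translate(reverse_complement(segment)))
```

The result, IADDYYE followed by a stop codon, matches the C-terminal residues of G6L shown in Fig. 9.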
and that there was a 5' tract of poly(A), both characteristic of late transcripts (24, 32). AmEPV spheroidin gene elements responsible for such high levels of expression may have significance for the development of a generalized eukaryotic expression vector capable of functioning in either a vertebrate or an invertebrate environment (12, 26). The predicted molecular mass of 115 kDa for AmEPV spheroidin agreed well with our observations (Fig. 2) and the 110 kDa reported previously (21). The spheroidin of CbEPV has an apparent size of 50 kDa (40). We have compared the DNA sequences of the spheroidin genes of AmEPV and CbEPV and found little, if any, significant homology between the two spheroidin sequences. Why these two spheroidin proteins, which presumably serve similar functions in both viruses, should be so different is not clear. This disparity is particularly surprising in view of the striking degree of homology between the NTPase I (NPH I) gene fragments of AmEPV and CbEPV (41) (Fig. 9B).
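The relationship between the size of the ORF, the number of encoded residues, and the predicted molecular mass can be approximated with back-of-envelope arithmetic. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from this article. The 115-kDa prediction in the text presumably reflects the actual amino acid composition, so the two values are not expected to agree exactly.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not a value from the article


def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally with a stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the residue count alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


n_residues = 1003  # length of the spheroidin ORF product reported in the text
print(orf_length_nt(n_residues), "nt")
print(round(approx_mass_kda(n_residues), 1), "kDa (approximate)")
```

The estimate gives an ORF of about 3.0 kb and a mass near 110 kDa, in the same range as the 115 kDa predicted from the sequence and the 110 kDa reported previously (21).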
225 kb
AmEPV C
II
B
I F
A