JOURNAL OF BACTERIOLOGY, Nov. 1991, p. 6919-6926
Vol. 173, No. 21
0021-9193/91/216919-08$02.00/0
Copyright © 1991, American Society for Microbiology

A Bacteroides ruminicola 1,4-β-D-Endoglucanase Is Encoded in Two Reading Frames

OSAMU MATSUSHITA,1 JAMES B. RUSSELL,1 AND DAVID B. WILSON2*

USDA Agricultural Research Service1 and Section of Biochemistry, Molecular and Cell Biology, Cornell University,2 Ithaca, New York 14853

Received 7 May 1991/Accepted 19 August 1991

Escherichia coli transformed with a plasmid containing a Bacteroides ruminicola endoglucanase (carboxymethyl cellulase [CMCase]) gene produced three immunologically cross-reacting CMCases which had molecular weights of 40,500, 84,000, and 88,000, while B. ruminicola produced CMCases with molecular weights of 82,000 and 88,000. The two B. ruminicola enzymes (purified from culture supernatants) had different N-terminal amino acid sequences, but each enzyme was encoded by the same gene (three independent clones had the same DNA sequence). The 88,000-molecular-weight CMCase (88K CMCase) gene appeared to contain two open reading frames which overlapped for 18 bp and were -1 out of frame, and each open reading frame contained several stop codons near the overlap region. The two 88K CMCase open reading frames had enough DNA to produce a protein of 106K, but the mobility of the enzyme in sodium dodecyl sulfate gels gave a value which was 20% lower. On the basis of the -1 frame shift and the large deviation in theoretical versus actual size, it appears that an unusual event (e.g., ribosomal hopping or RNA splicing) is involved in either the translation or the transcription of the 88K B. ruminicola CMCase gene. The 82K CMCase was completely encoded in the second reading frame, and its size was in agreement with the DNA sequence.

Bacteroides ruminicola is a noncellulolytic, cellodextrin-utilizing bacterium that is often present in high numbers in the rumen (5). In a previous paper, we reported the cloning and sequencing of a 1,4-β-D-endoglucanase (carboxymethyl cellulase [CMCase]) gene from B. ruminicola into Escherichia coli (13). E. coli containing the cloned gene produced a CMCase having a molecular weight of 40,500. However, when an antiserum prepared against homogeneous CMCase purified from E. coli was used to probe Western blots (immunoblots) run on concentrated B. ruminicola culture supernatant, the stained bands had molecular weights of 82,000 and 88,000. Both of these cross-reacting proteins possessed CMCase activity. Because B. ruminicola did not produce the 40,500-molecular-weight CMCase (40.5K CMCase), it was clear that in our initial work we had isolated only part of the B. ruminicola CMCase gene. In this report, we describe the purification and N-terminal sequences of the two B. ruminicola CMCases as well as the cloning and sequencing of the rest of the CMCase gene which was present upstream of the 6.0-kb EcoRI fragment in our original pC3 plasmid (13). The results of these experiments show that the 88K protein is encoded by two overlapping reading frames that are -1 base out of frame from each other and that the second reading frame encodes the 82K CMCase. Furthermore, the 40.5K CMCase is encoded by the 3' end of the second reading frame (see Fig. 2).

* Corresponding author.

MATERIALS AND METHODS

Bacterial strains, plasmids, and bacteriophage. B. ruminicola B14 was obtained from M. P. Bryant, University of Illinois, Urbana (5). E. coli DH5α (3, 10) was used as the host strain for all recombinant plasmids except for Western blotting experiments, in which E. coli JM109 (13) was used. Plasmid pC3 contains a 6.0-kb insert of B. ruminicola DNA in pUC18 and codes for a 40.5K CMCase in E. coli (13).

Media and culture conditions. B. ruminicola was grown in the basal medium described by Caldwell and Bryant (6) supplemented with 0.4% sugar at 39°C. Transformed E. coli cells were selected on LB plates containing 10 g of Bacto-tryptone, 5 g of Bacto-yeast extract (Difco Laboratories, Detroit, Mich.), 5 g of NaCl, and 15 g of agar per liter, supplemented with 100 μg of ampicillin (Sigma Chemical Co., St. Louis, Mo.) per ml, 0.2 mM isopropyl-β-D-thiogalactopyranoside (IPTG), and 0.005% 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal).

DNA manipulations. DNA manipulations were carried out as described earlier (13).

Southern and colony hybridization. A nonradioactive DNA labeling and detection kit (Genius; Boehringer-Mannheim Biochemicals) was used for hybridization procedures. To prepare a DNA probe, pC3 plasmid DNA was completely digested with HindIII and EcoRI and a 0.70-kb fragment (see Fig. 2) was isolated by agarose gel electrophoresis and electroelution. The DNA (approximately 1 μg) was labeled with digoxigenin-11-dUTP by random priming as described by the supplier. For Southern hybridization, B. ruminicola chromosomal DNA was digested with HindIII at 37°C overnight. The digest (0.5 μg) was applied to a 0.8% agarose gel which was electrophoresed at 110 V for 3 h. Transfer of the DNA onto a nitrocellulose membrane, hybridization, and immunological detection were performed as described before (13). For colony hybridization, E. coli DH5α cells carrying recombinant plasmids were grown on LB plates containing ampicillin (50 μg/ml) until small colonies formed; then the plates were chilled at 4°C for 1 h. A nitrocellulose membrane (BA85; Schleicher & Schuell, Inc., Keene, N.H.) was placed on each plate and left for 5 min; then the membranes were removed, placed on filter papers wet with 0.5 M NaOH containing 1.5 M NaCl, and kept for 15 min at room temperature. The membranes were dried on dry filter papers for 5 min at room temperature, neutralized with 1.0 M Tris hydrochloride (pH 8.0) containing 1.5 M NaCl in the same manner as in the denaturation step, baked at 80°C for 2 h, hybridized with the 0.7-kb probe, and developed immunologically.

Cloning of the upstream fragment for pC3. B. ruminicola B14 chromosomal DNA was prepared from cells grown in basal medium supplemented with 0.4% sucrose (13). DNA (20 μg) was completely digested with HindIII and electrophoresed on two 0.8% agarose gels. The region containing fragments of 2.5 to 2.7 kb was cut from each gel. DNA was electroeluted and ligated with pUC18 DNA linearized with HindIII. E. coli DH5α cells were transformed with the ligation mix and selected on LB plates containing ampicillin, IPTG, and X-Gal. White colonies were examined by colony hybridization as described before to find ones that contained the fragment upstream of pC3. To ensure that the same fragment was cloned from B. ruminicola B14 grown on cellobiose, another cloning experiment was performed in essentially the same way as described above, except chromosomal DNA was prepared from 20 ml of a cellobiose-grown B. ruminicola culture by using hexadecyltrimethyl ammonium bromide, as described by Wilson (17).

PCR amplification. Two synthetic oligonucleotides (5'-CCAGGTTCTTCAAATCAG-3', primer 1; 5'-CTACAGGATTAGAATGAGCGATAT-3', 24mer primer) were prepared on a DNA synthesizer (model 380B; Applied Biosystems, Foster City, Calif.). Chromosomal DNA obtained from B. ruminicola was purified by CsCl-ethidium bromide centrifugation as described by Wilson (17) and digested with HindIII. A sequencing template was prepared by polymerase chain reaction (PCR) in two steps as described by Dorit and Ohara (9). In the first step, a symmetric reaction was carried out with the HindIII-digested chromosomal DNA (500 ng) and the primers (0.5 μM each). The reaction conditions were basically the same as those described by Coen and Scharf (8), except that 25 reaction cycles were used and the annealing step was at 37°C.
The product was washed on a Centricon 30 (Amicon) to remove unincorporated primers and nucleotides, precipitated by ethanol, and dissolved in 20 μl of TE (10 mM Tris hydrochloride [pH 8.0], 1 mM EDTA). In the second step, 1 μl of this DNA solution was used in an asymmetric reaction in the presence of 0.5 μM primer 1 and 5 nM 24mer primer. The primers and nucleotides were removed on a Centricon 30, the product was precipitated with ethanol, and the DNA was dissolved in 40 μl of TE buffer. Six microliters of this sample was used for a standard Sequenase reaction, using the 24mer primer.

Western blotting. E. coli JM109(pUC18), JM109(pC3), and JM109(pC43) were grown in LB broth containing 50 μg of ampicillin per ml. B. ruminicola B14 was grown in basal medium supplemented with 0.4% cellobiose. Cells were harvested in mid-log phase by centrifugation at 4,000 × g, resuspended in 1/20 volume of 50 mM potassium phosphate (KPi; pH 6.5) buffer, and disrupted in a French pressure cell at 20,000 lb/in². The B. ruminicola culture supernatant was concentrated 20-fold by using a Centricon 10. The samples were mixed with loading dye, boiled for 3 min, and applied to a 12% sodium dodecyl sulfate (SDS)-polyacrylamide gel as described by Laemmli (12). Proteins were electrophoretically transferred to a sheet of nitrocellulose membrane (BA85; Schleicher & Schuell) (16). The CMCases were identified by indirect enzyme-linked immunological detection, using rabbit antibody raised against the CMCase produced in E. coli (13).
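The positions at which the two PCR primers anneal can be checked against the sequence in Fig. 3. The following Python sketch is illustrative only and is not part of the original methods; the helper names are hypothetical, and the two 90-nucleotide fragments are copied from Fig. 3 (positions 721 to 810 and 1171 to 1260).

```python
# Illustrative sketch: locate the two PCR primers on the Fig. 3 sequence.
# Helper names are hypothetical; sequences are copied from the text and Fig. 3.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Primers as given in the PCR amplification section.
PRIMER_1 = "CCAGGTTCTTCAAATCAG"
PRIMER_24MER = "CTACAGGATTAGAATGAGCGATAT"

# Top-strand fragments of Fig. 3 (positions 721-810 and 1171-1260).
FIG3_721_810 = (
    "AAGTACGGGAATTGAAGATGTTACAAATGAAAAATTGACTTTTGAGGGCA"
    "ATTCGCTGATAATTCCAGGTTCTTCAAATCAGATTCGTAT"
)
FIG3_1171_1260 = (
    "AAGAGTGTAACCTTTAATATCGCTCATTCTAATCCTGTAGATGTTTCAGG"
    "TGTTTTTTTGGCTGATGCTTCTGCTAACGATGCAGCAAAG"
)


def locate(primer, fragment, fragment_start):
    """Find a primer on a top-strand fragment.

    Returns (start, end, strand) in 1-based Fig. 3 coordinates, or None.
    Strand "+" means the primer sequence itself occurs in the top strand;
    "-" means its reverse complement does, so the primer anneals to the
    top strand and is extended toward lower position numbers.
    """
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        index = fragment.find(query)
        if index != -1:
            start = fragment_start + index
            return start, start + len(primer) - 1, strand
    return None


if __name__ == "__main__":
    forward = locate(PRIMER_1, FIG3_721_810, 721)
    reverse = locate(PRIMER_24MER, FIG3_1171_1260, 1171)
    print("primer 1:", forward)
    print("24mer primer:", reverse)
    print("amplified region length:", reverse[1] - forward[0] + 1)
```

With these fragments, primer 1 maps to positions 785 to 802 and the 24mer primer to positions 1187 to 1210 on the opposite strand, which agrees with the amplified region (785 to 1,210) reported in Results.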


RESULTS

Purification of the B. ruminicola endoglucanases and determination of their N-terminal amino acid sequences. An overnight B. ruminicola B14 culture grown in basal medium containing 0.4% cellobiose was harvested by centrifugation, and the culture supernatant (2 liters) was concentrated to approximately 20 ml by ultrafiltration, using a polysulfone membrane (PTTK; Millipore Co., Bedford, Mass.). The concentrate was diluted sevenfold with 1 mM KPi (pH 6.5) and reconcentrated by using the same ultrafiltration membrane; this step was repeated three times. The concentrated supernatant was applied to a hydroxyapatite column (1.5 by 10.0 cm; Hypatite C; Clarkson Chemical Co., Williamsport, Pa.). A linear gradient of 1 to 200 mM KPi buffer (pH 6.5; 12.5 bed volumes) was used to elute proteins, and the CMCase activity was eluted in a single peak at approximately 40 mM KPi. To purify the 88K CMCase, this peak was concentrated with a Centricon 30 and applied to a size exclusion column (Superose 12 [Pharmacia, Piscataway, N.J.]; bed volume, 25 ml) equilibrated with 25 mM Tris hydrochloride (pH 7.5)-100 mM NaCl, and the CMCase was eluted at 0.46 bed volume. The 82K CMCase was purified from the hydroxyapatite peak by chromatography on a Q-Sepharose (Pharmacia) column (1.5 by 15.0 cm). Proteins were eluted with a 300-ml linear gradient (0.01 to 1 M NaCl in 50 mM Tris hydrochloride, pH 8.0), and the CMCases were eluted at approximately 400 mM NaCl. The active fractions were concentrated with a Centricon 30, and the buffer was replaced with 50 mM Tris hydrochloride (pH 8.0)-10 mM NaCl. The concentrated, desalted sample was applied to a Pharmacia fast protein liquid chromatographic MonoQ column and eluted with a 15-ml linear gradient of 0.01 to 0.6 M NaCl in 50 mM Tris hydrochloride (pH 8.0). Fractions containing CMCase activity were concentrated, and the buffer was replaced with 50 mM KPi (pH 6.5) on a Centricon 30. This sample was applied to the MonoQ column and eluted with a 15-ml linear gradient of 0.01 to 0.6 M NaCl in 50 mM pyridine hydrochloride (pH 5.0). The 82K CMCase was eluted at 210 mM NaCl and was barely separated from the 88K CMCase. The purity of the CMCases was tested by SDS-gel electrophoresis (Fig. 1). From this gel, the molecular weights of the CMCases were estimated to be 82,000 and 88,000. The N-terminal amino acid sequences of the CMCases were determined by automated Edman degradation (model 470A; Applied Biosystems) and were Ser-Asn-Pro-Val-Asp-Val-(Ser or Val)-Gly-Val-Phe for the 82K enzyme and Ala-Asp-Lys-Ser-Leu-Ile-Ile-Thr-Phe-Asn-Asp-Asn-Thr-Thr-Gln-Ile-Phe-Ala-Leu-(Ser or His) for the 88K CMCase.

FIG. 1. B. ruminicola CMCases. The molecular weights of the purified CMCases were estimated by SDS-polyacrylamide gel electrophoresis. The same samples were also used for the determination of the N-terminal amino acid sequences. (a) 88K CMCase; (b) 82K CMCase. Lanes: 1, molecular weight (10³) standards (numbers on the left indicate molecular weight); 2, CMCase; 3, prestained standards (numbers on the right indicate molecular weight).

Sequence analysis of the pC3 upstream region. A set of nested deletion plasmids (pC3101, pC3104, pC3105, and pC3106) and their parental plasmid, pC3 (13), were used as sequencing templates to determine the DNA sequence of the fragment upstream of the CMCase gene (Fig. 2). The sequences obtained were aligned by their overlaps to obtain a single continuous sequence. The open reading frame that encoded the 40.5K CMCase, expressed in E. coli (13), extended to the EcoRI site at the 3' end of the insert DNA in pC3. The N-terminal amino acid sequence of the 82K CMCase was found in this sequence (positions 1198 to 1227 in Fig. 3). There were no differences between the N-terminal sequence of the protein (10 residues) and that predicted by the DNA sequence. However, the 88K CMCase N-terminal sequence was not found in the pC3 insert. On the basis of these results, it appeared that the N terminus of the 88K CMCase was encoded in DNA upstream of the pC3 insert.

Cloning of an upstream fragment. With a DNA probe made from a 0.70-kb EcoRI-HindIII fragment of pC3, Southern hybridization was used to determine the size of the HindIII fragment containing the upstream region (Fig. 2). B. ruminicola was grown on sucrose or cellobiose, and the chromosomal DNA was isolated, digested with HindIII, and separated on a 0.8% agarose gel. Both DNA preparations gave the same 2.6-kb band which hybridized with the probe, and this fragment was then cloned to produce pC4. The pC4 plasmid contains a 2.6-kb HindIII insert, and the rightmost 0.70-kb EcoRI-HindIII region of this insert has the same restriction sites as the leftmost 0.70-kb region of pC3 (Fig. 2). To ensure that no mutations were introduced during these cloning procedures, another cloning experiment was carried out with B. ruminicola chromosomal DNA prepared from a culture grown with cellobiose as the carbon source. Two colonies hybridized to the 0.7-kb probe, and these plasmids were examined for their restriction sites, which were identical to those in pC4. The orientation of the insert was the same in all three plasmids, and the two new plasmids were designated pC4a and pC4b.

FIG. 2. Cloning and sequencing of pC4. A closed box indicates the EcoRI-HindIII DNA probe used for identifying plasmid pC4. Plasmid pC4 and its subclones, pC42, pC45, and pC47 (shown below), were used as sequencing templates to obtain the DNA sequences of the CMCase gene. Arrows indicate the sites to which the sequencing primers annealed and the direction of the elongation reaction. [Restriction map. Labeled features: coding regions for the 88K, 82K, and 40.5K CMCases; HindIII, BglII, KpnI, ScaI, ClaI, PstI, and EcoRI sites; inserts of pC3 (5.96 kb), pC4 (2.61 kb), pC43, and pC42; the probe; 1-kb scale bar.]

DNA sequencing of pC4. Three subclones were constructed from pC4 to obtain the DNA sequence which was upstream from pC3 (Fig. 2), and in each case universal primers were used for dideoxy sequencing.
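The agreement between the Edman result for the 82K CMCase and the DNA sequence (positions 1198 to 1227 in Fig. 3) can be reproduced by translating that stretch with the standard genetic code. The sketch below is an illustration, not the authors' procedure; the helper names are hypothetical, and the 30-nucleotide string is copied from Fig. 3.

```python
from itertools import product

# Standard genetic code, built in TCAG order (a common compact encoding).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 1198 to 1227 of Fig. 3 (copied from the figure).
FIG3_1198_1227 = "TCTAATCCTGTAGATGTTTCAGGTGTTTTT"

# Edman degradation result for the 82K CMCase. Residue 7 was ambiguous
# (Ser or Val), so each entry lists the residues allowed at that position.
EDMAN_82K = ["S", "N", "P", "V", "D", "V", "SV", "G", "V", "F"]


def matches(predicted, observed):
    """True if every predicted residue is allowed at its position."""
    return len(predicted) == len(observed) and all(
        aa in allowed for aa, allowed in zip(predicted, observed)
    )


if __name__ == "__main__":
    predicted = translate(FIG3_1198_1227)
    print(predicted, matches(predicted, EDMAN_82K))
```

In this stretch the DNA resolves the ambiguous seventh residue as Ser (codon TCA).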
Synthetic primer 1 (5'-CCTAATTGGGTACTATCC-3'; positions 858 to 841) was used to fill a sequence gap, and the sequence of the insert (pC4) is shown in Fig. 3. For the overlapping region between pC3 and pC4, sequences of 350 bp from the right HindIII end and 60 bp from the EcoRI site were determined. These sequences were identical to their counterparts in pC3 (13). The HindIII site at position 2071 in Fig. 3 corresponds to the HindIII site at -80 in Fig. 5 of reference 13.

FIG. 3. DNA sequence of the pC4 insert. Amino acid sequences printed in boldface and underlined show the N-terminal amino acid sequences determined for the 82K and 88K CMCases. [Only the nucleotide sequence is reproduced here; the predicted amino acid translations printed beneath it are omitted.]

   1 GCTTCTATAGCTCCTGCTACTTATGATTTTATCTTTATTACTCCAGAAGAAACTAAATTTACTGTAGGTAAGATCATTGTAAATTCTGCA 90
  91 GAGAAGACTATTTGGACAGGTAGCTTTGATTGTGCAAATTGGGCAGGCAATCAAGATCTAGCTTGGGGTGGATATGATTGGTCTACTGTT 180
 181 AAAGCTGGTCAAGTACTAACTTTCTATTTTACAGCAAACGATACAAGTGCTAGTTGGGTATGTATTAGTGTTCGTCAGGGAGATAAATGG 270
 271 GACAATTTACCTAATGCACAGATTGATTTGACTGGATCTTCAACGGTTGCTACTTATACATTGACAGAAACAGCTCTTAATCAGTTAGTC 360
 361 AATAATGGTGGTCTCGTTGTTACAGGAAAGGATGTTACAATCACTAAGATTACATTGAAGTAATTCTCTATGTTTTTATAAAATTGGTGG 450
 451 CGTGATATATTAATATTACGCCACATTTTACTTGATTAAAACTTTTATTATGAGGAAGTTTATATTAGCTTTTATTTTTGCAATCTGTGT 540
 541 GTTTCGAGCAAAAGCTGCAGACAAGTCATTGATTATTACGTTTAATGATAATACCACACAGATTTTTGCCTTATCAGATCTTCCTAATAT 630
 631 TAAAATGCAAAATGATAAGATGACTATTGTTGCAGGTTTAACAACTGCAGAGTATGATTTGTATAAAGTCAGAACATTTACATTTGGAGA 720
 721 AAGTACGGGAATTGAAGATGTTACAAATGAAAAATTGACTTTTGAGGGCAATTCGCTGATAATTCCAGGTTCTTCAAATCAGATTCGTAT 810
 811 TTTTTCTATCAATGGAATGTGTGTTTCTTTGGATAGTACCCAATTAGGTTCGTTTACAGTTCTTAATTTGGAGTCGTTACCTCAGGGAGT 900
 901 ATATGTTATTAATACAAATGGTAAATCTGTTAAAATAACCAAGAAATGAAAAAGTTTATCTTTTCTCTTTTAGCAACTCTGATTTCTTTT 990
 991 TCAGTTTTTGCACAATCTGCAGATGATACTTATGTAGTCAATAAGGCAGATGGGACTTCGCAATCCTATAAGGTGATGGAATTCCCAAAT 1080
1081 ATTAAGTTTAATGGTGATGGTACCTTTGGCAATTATATGACTGGCTTTGATGATTTCGGTCAGATCAATGTTTGGAACATCAGTGATGTG 1170
1171 AAGAGTGTAACCTTTAATATCGCTCATTCTAATCCTGTAGATGTTTCAGGTGTTTTTTTGGCTGATGCTTCTGCTAACGATGCAGCAAAG 1260
1261 AAACTCTATAAGTATCTGCGGCTTGTTTATGGTAATAAGATCTTATCGGGTATGATGGCTCATGTAGCTTGGAATCATGATGAGGCCGAT 1350
1351 AAAATACATGTGCTTACAGGTAAATATCCTGCTATCAATTGCTATGATTTTATTCATATAGCTGTACCTAATCAAGGCTCTAATGGATGG 1440
1441 ATTAATTATAATGATATTACTCCGGTTACTGAATGGGCTGATGCTGGTGGTATTGTTAGTCTGATGTGGCATTTCAATGTGCCTCAAAAT 1530
1531 GAAAATACAACTATTGGGGCAGATGGTTCTGGACAGGGTATTAACTCTAGTCAAACTACTTTCAAGGCTAGTCATGCTCTTGTTTCTGGT 1620
1621 ACTTGGGAGAATAAATTTTTCATGGAACAAATGGAAAATGTGGCTAACGTTATACTTAAACTTCAAGATGCTGGTATTGTTGCTCTTTGG 1710
1711 CGTCCGTTCCATGAAGCTGCCGGTAATGCAACTTTAAAGTCTGGTGCAAACTGGGGAAAAGCTTGGTTCTGGTGGGGCGAAGATGGTCCT 1800
1801 GATGTATATAAGCAACTATGGCACACGATGTTCAATTACTTTTCCAATAAAGGTATTCATAATTTGATTTGGGAGTGGACTTCTCAGAAT 1890
1891 TATAATGGTGATTCTGATATATATAATAATGATGATGACTGGTATCCAGGTGATGCTTATGTAGATATTATCGGTCGTGATCTCTATGGT 1980
1981 ACTACGGCTGTTCAGCAGTATTCTGAGTATAGTCAACTTAAGGGTCGTTATCCATCTAAGATGATTGCTCTCGCAGAGTGTGGTGTTAAT 2070
2071 AATTCTACTATTACAGCTGATGTTGAACAAGCTT

The N-terminal amino acid sequence of the 88K CMCase was encoded by the DNA from nucleotides 557 to 616 in pC4, and there were no differences between the N-terminal sequence determined from the protein and that predicted by the DNA sequence. The open reading frame containing the 88K N-terminal sequence encodes a peptide with a molecular weight of about 18,000. To ensure that the DNA sequence of the region including the overlap between the two open reading frames was correct, the complementary strand was also sequenced by using a combination of another 18mer synthetic primer, primer 1, and three different nucleotide mixtures (Fig. 4). This 130-base-long sequence was identical to the complementary sequence predicted by the original sequence. As a further check, the insert DNAs in both plasmids pC4a and pC4b were completely sequenced, using forward and reverse universal primers, the two synthetic 18mer primers, and the 24mer primer used for PCR; both sequences were

identical to that of pC4.

Direct sequencing of B. ruminicola chromosomal DNA by PCR. To ensure that cloning artifacts did not alter the DNA sequence, a region of B. ruminicola chromosomal DNA (from 785 to 1,210 bases) was amplified in vitro. The single-strand product was then sequenced by a chain termination method, using Sequenase. The latter sequence was identical to the sequence determined from the plasmid DNAs (Fig. 5).

Reconstruction of the entire CMCase gene and its expression in E. coli. pC3 DNA which had been completely digested with KpnI was separated on a 0.8% agarose gel, and a 3.1-kb KpnI fragment containing the 40.5K CMCase gene was isolated by electroelution. This fragment was then ligated with pC42, which was linearized with KpnI. The resultant plasmid, pC43, encoded the entire CMCase gene. Cell extracts from E. coli clones and from B. ruminicola were prepared as described above. The culture supernatant of B. ruminicola was concentrated by ultrafiltration. Western blotting and immunological detection showed that E. coli containing pC43 produced two larger CMCases, but in lesser amounts than the 40.5K CMCase (Fig. 6). The largest E. coli CMCase had the same molecular weight as the 88K B. ruminicola CMCase, while the next largest appeared to be slightly larger (84K) than the comparable B. ruminicola CMCase.

FIG. 4. Sequence ladders for the overlapping region between the two open reading frames. Plasmid pC4 and primer 2 (a synthetic 18mer) were used to sequence this region with three nucleotide mixtures (dGTP, 7-deaza-dGTP, and dITP). Numbers on the left indicate the positions of the nucleotides in Fig. 3.

FIG. 5. Direct sequencing of B. ruminicola chromosomal DNA. The DNA was amplified by a combination of symmetric and asymmetric PCR. The product was sequenced by using the synthetic 24mer primer. Numbers on the left indicate the positions of the nucleotides in Fig. 3.

FIG. 6. Western blotting of E. coli and B. ruminicola CMCases. Lanes: 1, extract of E. coli JM109(pUC18); 2, extract of E. coli JM109(pC3); 3, extract of E. coli JM109(pC43); 4, extract of B. ruminicola B14; 5, concentrated culture supernatant of B. ruminicola B14; 6, prestained molecular weight standards (numbers on the right indicate molecular weights [10³]: 211, 107, 69, 46, and 29).

DISCUSSION

The simplest explanation for the presence of two different reading frames in the gene encoding the 88K CMCase is a sequence error. Since there are multiple stop codons at the 3' end of the initial reading frame and at the 5' end of the second reading frame, the only place one or two errors in the sequence could produce a single reading frame is in the 18-bp overlap region. We are confident that there is no error in this region: (i) it was sequenced in three independent clones; (ii) there were no differences between the sequences determined for both strands; and (iii) the sequence of a PCR product from B. ruminicola chromosomal DNA is identical to the sequence of cloned DNA. Another possibility would be the presence of multiple copies of the CMCase gene. However, our previous work (13) showed that only a single fragment from either an EcoRI or HindIII digest of B. ruminicola DNA hybridized to the 1.2-kb HindIII fragment from pC3.

Another reason to propose that an unusual event is occurring is the difference between the molecular weight of the 88K CMCase and the molecular weight predicted from the DNA sequence of the entire gene, 106,000. SDS-gel electrophoresis sometimes gives aberrant molecular weights, but the CMCase is a soluble protein with an acid isoelectric point. Because SDS-gel electrophoresis almost always gives accurate estimates for such proteins, it is unlikely that the 20% difference in molecular weight is an artifact.

Cavicchioli et al. (7) reported recently that the CMCase gene of another ruminal bacterium, Fibrobacter succinogenes, also had two reading frames that differed by a -1 frame shift. At this time we have no explanation for an unusual coding mechanism in two CMCase genes from ruminal bacteria. The only sequence similarity between the overlap regions of the two CMCase genes is the sequence AUGAA, which codes for Met-Lys in each gene.

The fact that E. coli produces the 88K CMCase protein when the entire gene is present shows that the unusual event also occurs in E. coli. However, the major protein produced from the CMCase gene in E. coli is a 40.5K protein which B. ruminicola does not produce. In a previous paper, we showed that the 40.5K CMCase gene contains three closely linked E. coli promoters and a ribosome binding site that are all present in the second reading frame (13). On the basis of these results, it seems that the 82K CMCase gene resulted from the fusion of two genes. The 88K protein may have resulted from a second gene fusion that created a frameshifting sequence allowing ribosomes initiating at the AUG in the first reading frame to translate correctly the second reading frame.

There are three reasons why we believe that the C-terminal ends of the 40.5K, 82K, and 88K CMCases are identical and are encoded by the 3' end of the second reading frame. First, plasmid pC43 produces three enzymes in E. coli, showing that the 4.2 kb of B. ruminicola DNA present in this plasmid encodes all three enzymes. Second, all three enzymes show similar activities on CMC and xylan. Third, the 82K and 88K proteins react strongly with an antibody which was prepared against homogeneous 40.5K CMCase.

The DNA encoding the N terminus of the 88K CMCase is preceded by a region that encodes a potential signal sequence, and the AUG initiating the signal sequence is preceded by a weak potential ribosome binding site (ACTTGAT). This sequence is 14 bases upstream from the initiating codon, and 5 of 7 bases are complementary to the 3' end of B. ruminicola 16S rRNA. Since the sequence preceding the N terminus of the 88K CMCase fits the consensus sequence for signal peptidase cleavage in E. coli, it seems likely that the 88K CMCase results from cleavage of an initial translation product by signal peptidase.

The N-terminal sequence of the 82K CMCase is located 84 amino acids downstream from the probable initiation codon in the second reading frame (nucleotide 946). This initiation codon is followed by a potential signal sequence, but the specificity of the known signal peptidase would not allow cleavage at the N terminus of the 82K CMCase. This means that either the 82K protein is produced by proteolytic cleavage of the 88K CMCase or, more likely, an additional cleavage besides the removal of the signal sequence is required to produce the 82K CMCase. The 84K CMCase produced by E. coli could result from the absence of the protease needed for this cleavage.

A possible mechanism for producing the 88K protein is programmed frameshifting (2, 15), which would require the deletion of about 180 amino acids from the combined reading frames as well as a -1 frame shift to bring the second reading frame in phase. Frameshifting must occur between the end of the N-terminal sequence of the 88K CMCase and the stop codon at position 947. There is a potential "slippery run" AAAAT at position 932 just upstream of the stop codon, but at this time we do not know whether it is required for the frame shift. There are several examples of frameshifting in procaryotes (11, 18). In the case of gene 60 in phage T4, 50 nucleotides are untranslated, and it is proposed that they are present in a stem-loop structure so that the ribosome can move directly in translating the protein (11). In our case an even larger loop of about 500 nucleotides would be needed.
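The frame relationships and the size discrepancy discussed above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; the helper names are hypothetical, the 90-nucleotide fragment is copied from Fig. 3 (positions 901 to 990), and the average residue mass of 110 Da is an assumed rule-of-thumb value that does not appear in the text.

```python
# Illustrative sketch of the overlap region and the size discrepancy.
# Helper names and the 110-Da average residue mass are assumptions.

# Top strand of Fig. 3, positions 901-990.
FIG3_901_990 = (
    "ATATGTTATTAATACAAATGGTAAATCTGTTAAAATAACCAAGAAATGAA"
    "AAAGTTTATCTTTTCTCTTTTAGCAACTCTGATTTCTTTT"
)
FRAGMENT_START = 901

# Positions quoted in the text.
N_TERM_88K = 557           # first codon of the 88K N-terminal sequence
N_TERM_82K = 1198          # first codon of the 82K N-terminal sequence
STOP_FIRST_FRAME = 947     # stop codon ending the first reading frame
START_SECOND_FRAME = 946   # probable initiation codon, second reading frame
SLIPPERY_RUN = 932         # potential "slippery run"


def bases_at(position, length):
    """Return `length` bases starting at a 1-based Fig. 3 position."""
    offset = position - FRAGMENT_START
    return FIG3_901_990[offset:offset + length]


def relative_frame(first_codon_pos, second_codon_pos):
    """Frame of the second codon relative to the first, as -1, 0, or +1."""
    return (second_codon_pos - first_codon_pos + 1) % 3 - 1


def residues_missing(predicted_mw, observed_mw, residue_mass=110.0):
    """Rough number of residues accounting for a molecular weight difference.

    residue_mass is an assumed average mass per amino acid residue (Da).
    """
    return (predicted_mw - observed_mw) / residue_mass


if __name__ == "__main__":
    print("codon at 947:", bases_at(STOP_FIRST_FRAME, 3))
    print("codon at 946:", bases_at(START_SECOND_FRAME, 3))
    print("run at 932:", bases_at(SLIPPERY_RUN, 5))
    print("second frame vs first:", relative_frame(N_TERM_88K, START_SECOND_FRAME))
    missing = residues_missing(106_000, 88_000)
    print("residues missing: about", round(missing), "or", round(missing) * 3, "nt")
```

Under these assumptions, the stop codon of the first frame (TGA at 947) and the initiation codon of the second frame (ATG at 946) overlap, and the second frame is shifted by -1 relative to the first. At 110 Da per residue, the 18,000 difference corresponds to roughly 164 residues (about 490 nucleotides); the figure of about 180 residues quoted in the text would correspond to an average of about 100 Da per residue.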
Another possible explanation is RNA splicing; we have not examined the CMCase mRNA to determine whether splicing has occurred.

The role of the N-terminal domains present in the 82K and 88K CMCases is not clear since the 40.5K protein has the same activity in hydrolyzing CMC, xylan, and swollen cellulose as the other two enzymes. Furthermore, none of the enzymes bind to cellulose, so that the N-terminal segment does not contain a cellulose binding domain (1, 14). A computer search of the protein data bank did not turn up any proteins showing significant homology to the amino acid sequence of the N-terminal domain of the 82K or 88K CMCase.

ACKNOWLEDGMENTS

We thank Chieko Matsushita, Diana Irwin, Elyse Jung, Gui Fang Lao, Jon Stewart, and Kathy Dusinberre for invaluable discussions and assistance in preparing the manuscript. We also thank Theodore Thannhauser for protein sequencing and preparing synthetic oligonucleotides and John Telford for photographic services.

This work was partly supported by grant 85-CRCR-1-1880 from the U.S. Department of Agriculture and by the United Dairy Forage Research Center, Madison, Wis. O. Matsushita was supported by U.S. Department of Agriculture postdoctoral fellowship RA-87-23.

REFERENCES

1. Abuja, P. M., M. Schmuck, I. Pilz, P. Tomme, M. Claeyssens, and H. Esterbauer. 1988. Structural and functional domains of cellobiohydrolase 1 from Trichoderma reesei. Eur. Biophys. J. 15:339-342.
2. Atkins, J. F., R. B. Weiss, and R. F. Gesteland. 1990. Ribosome gymnastics-degree of difficulty 9.5, style 10.0. Cell 62:413-423.
3. Bethesda Research Laboratories. 1986. BRL pUC host: E. coli DH5α competent cells. Bethesda Res. Lab. Focus 8:9.
4. Birnboim, H. C., and J. Doly. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7:1513-1523.
5. Bryant, M. P., N. Small, C. Bouma, and H. Chu. 1958. Bacteroides ruminicola n. sp. and Succinimonas amylolytica the new genus and species: of succinic acid-producing anaerobic bacteria of the bovine rumen. J. Bacteriol. 76:15-23.
6. Caldwell, D. R., and M. P. Bryant. 1966. Medium without rumen fluid for nonselective enumeration and isolation of rumen bacteria. Appl. Microbiol. 14:794-801.
7. Cavicchioli, R., P. D. East, and K. Watson. Submitted for publication.
8. Coen, D. M., and S. J. Scharf. 1990. Enzymatic amplification of DNA by the polymerase chain reaction: standard procedures and optimization, p. 15.1.1-15.1.7. In F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl (ed.), Current protocols in molecular biology, vol. 2. John Wiley & Sons, Inc., New York.
9. Dorit, R. L., and O. Ohara. 1990. Direct DNA sequencing of polymerase chain reaction products, p. 15.2.1-15.2.11. In F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl (ed.), Current protocols in molecular biology, vol. 2. John Wiley & Sons, Inc., New York.
10. Hanahan, D. 1983. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166:557-580.
11. Huang, W. M., S.-Z. Ao, S. Casjens, R. Orlandi, R. Zeikus, R. Weiss, D. Winge, and M. Fang. 1988. A persistent untranslated sequence within bacteriophage T4 DNA topoisomerase gene 60. Science 239:1005-1012.
12. Laemmli, U. K. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685.
13. Matsushita, O., J. B. Russell, and D. B. Wilson. 1990. Cloning and sequencing of a Bacteroides ruminicola B14 endoglucanase gene. J. Bacteriol. 172:3620-3630.
14. Ong, E., J. M. Greenwood, N. R. Gilkes, G. Kilburn, R. C. Miller, and R. A. J. Warren. 1989. The cellulose-binding domains of cellulases: tool for biotechnology. Trends Biotechnol. 7:239-243.
15. Parker, J. 1989. Errors and alternatives in reading the universal genetic code. Microbiol. Rev. 53:273-298.
16. Towbin, H., T. Staehelin, and J. Gordon. 1979. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc. Natl. Acad. Sci. USA 76:4350-4354.
17. Wilson, K. 1990. Preparation of genomic DNA from bacteria, p. 2.4.1-2.4.5. In F. M. Ausubel, R. Brent, R. E. Kingston, D. D. Moore, J. G. Seidman, J. A. Smith, and K. Struhl (ed.), Current protocols in molecular biology, vol. 1. John Wiley & Sons, Inc., New York.
18. Wong, S., and A. T. Abdelal. 1990. Unorthodox expression of an enzyme: evidence for an untranslated region within carA from Pseudomonas aeruginosa. J. Bacteriol. 172:630-642.
