
Biochimica et Biophysica Acta, 1131 (1992) 199-202 © 1992 Elsevier Science Publishers B.V. All rights reserved 0167-4781/92/$05.00

BBAEXP 90356

Short Sequence-Paper

Identification of six open reading frames from a region of the Azotobacter vinelandii genome likely involved in dihydrogen metabolism

Jack Chien Chen and Leonard E. Mortenson

Department of Biochemistry, University of Georgia, Athens, GA (USA)

(Received 10 March 1992)

Key words: DNA sequencing; Hydrogenase; Ni, FeS cluster; Protein assembly; Electron carrier; (A. vinelandii)

We reported earlier the identification of two Azotobacter vinelandii open reading frames (ORFs), ORF1 and ORF2, downstream from the hydrogenase structural genes (Chen, J.C. and Mortenson, L.E. (1992) Biochim. Biophys. Acta 1131, 122-124). Sequencing of 6008 base pairs of DNA immediately downstream from ORF2 revealed six additional ORFs (ORF3 through ORF8). All six ORFs are transcribed from the same DNA strand as ORF1 and ORF2. The deduced amino acid sequences of ORF3 through ORF5 show strong homology with the products of genes required for dihydrogen (H2) metabolism in Rhodobacter capsulatus, and those of ORF4, ORF5, ORF7 and ORF8 with the products of such genes in Escherichia coli. ORF4, ORF5, ORF6 and ORF8 would encode polypeptides containing one or more 'Cys-X-X-Cys' motifs. The predicted products of ORF5 and ORF6 each contain a histidine-rich region, and the product of ORF5 also includes a 'Cys-Thr-Val-Cys-Gly-Cys' region near its amino-terminus. Implications of these findings with respect to metal binding, transport and incorporation, to hydrogenase assembly and to H2 metabolism are discussed.

Hydrogenases catalyze the reversible oxidation of H2 and occur in diverse prokaryotes and eukaryotes [1]. Hydrogenase from the gram-negative, obligately aerobic, dinitrogen (N2)-fixing bacterium Azotobacter vinelandii is a heterodimeric, membrane-bound, Ni- and Fe-containing enzyme [2] that functions as an H2-oxidizing enzyme in vivo. The structural genes for A. vinelandii hydrogenase have been cloned and sequenced [3]. Studies of several organisms with established hydrogenase structural genes revealed that additional genes and/or genetic loci are required for production of active enzymes and/or for enzymatic activities coupled to respiration [4-8]. Many of these supporting genes/genetic loci are located downstream from the structural genes.

Correspondence to: L.E. Mortenson, Department of Biochemistry, University of Georgia, Athens, GA 30602, USA. The sequence data reported in this paper have been submitted to the EMBL/Genbank Data Libraries under the accession number X63650.

We reported earlier the identification of two closely spaced A. vinelandii ORFs (ORF1 and ORF2). These two ORFs are located downstream from the structural genes for hydrogenase [9]. The remaining 5757 base pairs (bp) of cloned A. vinelandii DNA on pJCC1 and pJCC8 [9], immediately downstream from the end of ORF2, were subsequently sequenced using techniques reported earlier [9] with slight modifications when needed [10]. Five complete ORFs (ORF3 through ORF7) and one incomplete ORF (ORF8') were identified from this sequence using techniques reported earlier [3]. To complete the sequencing of ORF8, the 2.7 kb BamHI fragment was first isolated from cosmid pALM21 [3] and ligated to plasmid pGEM-7Zf(+) (Promega). The ligated products were then transformed into E. coli strain JM109 [11]. The resulting plasmids (pJCC143 and pJCC146) carried the 2.7 kb DNA insert in opposite orientations with respect to the vector. Several oligonucleotides were used to sequence the remaining portion of ORF8 ('ORF8, 251 bp) present in pJCC143 and pJCC146. An oligonucleotide identical to a sequence within ORF8' was used to sequence pALM21 directly to ensure the continuity between ORF8' and 'ORF8.
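How the two sequenced segments fit together can be illustrated with a short sketch. It assumes, as the numbers in the text and the Fig. 1 legend suggest but do not state outright, that the first 5757 bp end with the 6-bp BamHI site (GGATCC) beginning at nt 5752, and that the 251 bp of 'ORF8 follow directly. The helper names and the toy sequences are hypothetical and are not taken from the deposited sequence.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def site_positions(seq, site=BAMHI_SITE):
    """Return the 1-based start position of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def read_spans_junction(read, upstream, downstream):
    """True if `read` occurs in upstream+downstream and covers bases on both sides of the join."""
    joined = upstream + downstream
    start = joined.find(read)
    if start == -1:
        return False
    end = start + len(read)  # exclusive end index
    return start < len(upstream) < end


# Length bookkeeping taken from the text: 5757 bp + 251 bp.
FIRST_SEGMENT_BP = 5757
REMAINING_BP = 251
TOTAL_BP = FIRST_SEGMENT_BP + REMAINING_BP

# Assumption: a 6-bp site that ends at nt 5757 starts at this position.
SITE_START = FIRST_SEGMENT_BP - len(BAMHI_SITE) + 1

# Hypothetical toy sequences (not real A. vinelandii data).
toy_upstream = "ATGCCGTTAGGATCC"   # ends with the BamHI site
toy_downstream = "CGATTGACCTGA"    # continues directly after the site
toy_read = "TAGGATCCCGATT"         # a read crossing the join

print(TOTAL_BP, SITE_START)
print(site_positions(toy_upstream))
print(read_spans_junction(toy_read, toy_upstream, toy_downstream))
```

A read that lies entirely on one side of the join, such as `"ATGCC"` in the toy data, does not confirm continuity, which is why a primer inside ORF8' reading across the boundary is the informative one.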


[Fig. 1: nucleotide sequence (nt 1 to 6008, 5' to 3') with the deduced amino acid sequences of ORF3 through ORF8; sequence listing omitted.]

TABLE I

Predicted properties of the six ORF-encoded polypeptides a

ORF   Number of amino acids   Molar mass (Da)   Theoretical isoelectric point
3     342                     36359             7.76
4     113                     12610             4.75
5     303                     33185             6.73
6     755                     80528             7.62
7     84                      8983              4.28
8     379                     41227             8.33

a For the nt positions of each ORF, see legend of Fig. 1.
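The amino acid counts in Table I can be cross-checked against the nt coordinates given in the legend of Fig. 1. The sketch below assumes that each end coordinate includes the stop codon; under that assumption the counts agree with Table I, and the spacing between ORF5 and ORF6 comes out as the 59-base gap noted in the text. The function and variable names are illustrative only.

```python
# (start, end) nt positions, 1-based and inclusive, from the Fig. 1 legend.
ORF_COORDS = {
    "ORF3": (15, 1043),
    "ORF4": (1036, 1377),
    "ORF5": (1380, 2291),
    "ORF6": (2351, 4618),
    "ORF7": (4618, 4872),
    "ORF8": (4869, 6008),
}

# Number of amino acids per ORF as listed in Table I.
TABLE_I_RESIDUES = {
    "ORF3": 342, "ORF4": 113, "ORF5": 303,
    "ORF6": 755, "ORF7": 84, "ORF8": 379,
}


def residue_count(start, end):
    """Amino acids encoded by nt start..end, assuming the last codon is the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1  # subtract the stop codon


def spacing(upstream_end, downstream_start):
    """Bases between two ORFs; a negative value means they overlap by that many bases."""
    return downstream_start - upstream_end - 1


names = list(ORF_COORDS)
for name in names:
    print(name, residue_count(*ORF_COORDS[name]))
for left, right in zip(names, names[1:]):
    print(left, right, spacing(ORF_COORDS[left][1], ORF_COORDS[right][0]))
```

Negative spacings in the output correspond to the short overlaps between adjacent ORFs described in the text.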

Complete nucleotide (nt) and deduced amino acid sequences are shown in Fig. 1. A CTG codon (leucine) is tentatively designated as the first codon of ORF3, while each of the other five ORFs begins with a typical ATG codon (methionine). Except for the 59-base gap between the end of ORF5 and the beginning of ORF6, adjacent ORFs are either closely spaced or overlap slightly. At least one putative ribosomal binding site [12] was found in front of each ORF. All six ORFs are transcribed from the same DNA strand as ORF1 and ORF2 [9]. No DNA sequences similar to the E. coli consensus promoter sequences [13], -35 (TTGACA) and -10 (TATAAT), were found. Likewise, no sequences similar to the catabolite repressor protein-binding site (AANTGTGAN2TN4CA; Ref. 14) or to the conserved nif promoter upstream sequence (TGTN10ACA; Ref. 15) were found. A sequence (the reverse complement of nt 1811 to 1826 in Fig. 1) similar to the consensus sequence of RNA polymerase-σ54-dependent promoters (TGGCACN5TTGCA; Ref. 16) was found. However, no meaningful ORFs were found on the reverse complementary strand of Fig. 1. Several transcription-terminating sequences were identified using the TERMINATOR program of the UWGCG software [17]. Most of these sequences are GC-rich (data not shown).

A summary of the predicted properties of the six putative polypeptides is given in Table I. The encoded products of ORF4, ORF5, ORF6 and ORF8 have one or more 'Cys-X-X-Cys' motifs. This stretch of amino acids can play a role in the binding of Fe-S clusters [18]. The predicted product of ORF5 has a cysteine-rich motif near its amino-terminus followed by a histidine-rich stretch of 13 amino acids. A histidine-rich region was also found in the encoded polypeptide of ORF6. Two 'Cys-X-X-Cys-X18-Cys-X-X-Cys' motifs, separated by a stretch of 24 amino acids, were identified near the amino-terminus of the deduced amino acid sequence of ORF6. These motifs resemble the zinc-finger structures of the 'C4' subgroup [19] and of several eukaryotic protein kinases [20]. Several amino acids are conserved at identical positions in the two 'X18' structures. An amino acid sequence, GLGADGR, was identified at ORF6 amino acids 511 to 517 that is similar to a consensus sequence found among ATP-requiring enzymes [21]. This sequence is flanked by additional glycine-rich sequences, suggesting the need for further structural flexibility in that region. The polypeptide encoded by ORF7 has the highest percentage of hydrophobic residues among the six polypeptides (L + I + M + V is 32.1% of the total number of amino acids of ORF7). No signal peptide sequence similar to the one observed in one of the hydrogenase structural genes of A. vinelandii [3] was identified in the six polypeptide sequences.

A survey of genes involved in H2 metabolism in other organisms revealed that several of the deduced amino acid sequences of the six A. vinelandii ORFs share strong homology with the products of some of the genes in E. coli [5] and R. capsulatus [6], yet none of the six ORFs is similar to any hydrogenase structural gene at the amino acid level. Furthermore, a survey of the Swiss-Prot data bank (release 20.0, 12/91) using the FASTA program of the UWGCG software [17] revealed that the deduced amino acid sequences of ORF4 and ORF7 share sequence homology with those of OrfB and OrfC from Proteus vulgaris, respectively. The latter Orfs are adjacent to the fumarate reductase operon, but have no established functions [22]. A summary of the comparisons is shown in Table II.
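The pattern searches described above (consensus promoter sequences on either strand, 'Cys-X-X-Cys' motifs, and the L + I + M + V fraction) can be sketched with regular expressions. This is a minimal illustration, not the UWGCG procedure used for the paper; the function names and the toy sequences are hypothetical, and the consensus notation is read as N = any base and Nk = k consecutive bases of any kind.

```python
import re

SIGMA54_CONSENSUS = "TGGCACN5TTGCA"  # consensus quoted in the text (Ref. 16)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Expand 'N' (any base) and 'Nk' (k bases of any kind) into a regular expression."""
    return re.sub(r"N(\d*)", lambda m: "[ACGT]{%s}" % (m.group(1) or "1"), consensus)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_consensus(seq, consensus):
    """Return (strand, start, end) hits, 1-based inclusive, in top-strand coordinates.

    Hits on each strand are non-overlapping, as returned by re.finditer.
    """
    pattern = re.compile(consensus_to_regex(consensus))
    n = len(seq)
    hits = [("+", m.start() + 1, m.end()) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append(("-", n - m.end() + 1, n - m.start()))
    return hits


def find_cxxc(protein):
    """1-based start positions of every Cys-X-X-Cys, overlapping hits included."""
    return [m.start() + 1 for m in re.finditer(r"(?=C..C)", protein)]


def residue_fraction(protein, residues="LIMV"):
    """Fraction of the sequence made up of the given residues (default L, I, M, V)."""
    return sum(protein.count(r) for r in residues) / len(protein)


# Hypothetical toy sequences, not taken from Fig. 1.
toy_dna = "AAAATGGCACGATTCTTGCACCC"
toy_protein = "MKCTVCGCAHHL"

print(find_consensus(toy_dna, SIGMA54_CONSENSUS))
print(find_consensus(reverse_complement(toy_dna), SIGMA54_CONSENSUS))
print(find_cxxc(toy_protein))
print(residue_fraction(toy_protein))
```

Searching the reverse complement and mapping the hit back to top-strand coordinates mirrors how the σ54-like sequence is reported in the text, as the reverse complement of a stated nt range.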
Based on the high amino acid sequence homology between several of the six putative gene products and proteins involved in H2 metabolism in other bacteria, functions for these products in H2 metabolism are proposed. It is likely that the products affect production of hydrogenase by altering the transcription (or translation) of hydrogenase structural genes or by affecting hydrogenase holoprotein assembly. Alternatively, some of them may affect H2 oxidation indirectly by serving as electron acceptors and donors in a specific electron transport chain from H2 to dioxygen, similar to the possible functions of the polypeptide encoded by ORF1 [9]. Specifically, the cysteine motifs are probably involved in transport of metals (Fe, Ni), in metal cluster synthesis, and in assembly and insertion of metal clusters into the hydrogenase holoprotein. These cysteine motifs may also serve as ligands to metals in the redox centers of the proteins. A Ni-related function is proposed for the encoded product of ORF5, since mutants of the E. coli hypB gene (the homolog of ORF5, see Table II) were complemented by a high Ni ion concentration in the medium [5]. The histidine-rich regions in the products encoded by ORF5 and ORF6 may also be involved in Ni assimilation, since a similar histidine-rich region was found in the encoded product of one of the accessory genes required for urease (a Ni-containing enzyme) activity in Proteus mirabilis [23] and Klebsiella aerogenes [24]. Although the polypeptide encoded by ORF6 includes amino acids similar to DNA- and ATP-binding domains of other proteins, additional evidence is required before such roles can be assigned. Sequencing of DNA downstream from ORF8, and genetic and biochemical characterization of these six potential genes and gene products, are underway.

Fig. 1. Nucleotide and deduced amino acid sequences of the six A. vinelandii ORFs. Putative ribosomal binding sites (RBS) are underlined. ORF3: nt 15 (based on the position of the first possible RBS) to 1043; ORF4: nt 1036 to 1377; ORF5: nt 1380 to 2291; ORF6: nt 2351 to 4618; ORF7: nt 4618 to 4872; ORF8: nt 4869 to 6008. The 'Cys-X-X-Cys' motifs, the cysteine- and histidine-rich regions and the potential ATP-binding site are also underlined. Stop codons are shown as '*', and the BamHI site at nt 5752 defines the boundary between ORF8' and 'ORF8 in the text. Because nt 1 to nt 1035 include regions of 75% to 80% GC, cautious interpretation of the data is advised. Nucleotide 1 is the nt immediately 3' to the end of a 744 nt sequence reported earlier [9]; the distance between the end of ORF2 and the tentative beginning of ORF3 is therefore 14 nt.

TABLE II

Summary of deduced amino acid sequence homology comparisons a

A. vinelandii ORF   Species         Gene/ORF    Identity (%)   Similarity (%) b   Reference
ORF4                E. coli         hypA c      37.2           62.8               5
ORF5                E. coli         hypB d,e    63.6           76.7               5
ORF7                E. coli         hypC        31.3           62.7               5
ORF8                E. coli         hypD d      47.0           68.5               5
ORF3                R. capsulatus   ORF2        33.2           54.8               6
ORF4                R. capsulatus   ORF3 c      53.1           70.8               6
ORF5                R. capsulatus   ORF4 f,g    59.3           74.4               6
ORF4                P. vulgaris     OrfB c      35.4           66.4               22
ORF7                P. vulgaris     OrfC        34.9           63.9               22

a The GAP program of the UWGCG software [17] was used for comparisons. Parameters were: gap weight, 3.00; gap length weight, 0.10.
b Percent of similarity includes percent of identity.
c The 'CXXC' motifs mentioned in the text are among the conserved residues.
d The 'CTTCGC' motif is the corresponding motif of its A. vinelandii counterpart mentioned in the text.
e No regions of highly histidine-rich residues were observed.
f The 'CTVCGC' motif mentioned in the text is completely conserved.
g Three histidine-rich regions, each with alternating -HX- residues, were seen near the amino-terminal part of the ORF (amino acids 44-54, 60-70 and 82-94).

We thank Drs. R.L. Robson, A.L. Menon and R. Jones for research consultations, and Dr. A.L. Menon for generously providing pALM21 prior to publication. This research was supported by a grant from the Franklin College of Arts and Sciences, University of Georgia, and by a grant from the U.S. Department of Energy (grant No. DE-FG09-86ER13614).

References

1 Adams, M.W.W., Mortenson, L.E. and Chen, J.-S. (1981) Biochim. Biophys. Acta 594, 105-176.
2 Seefeldt, L.C. and Arp, D.J. (1986) Biochimie 68, 25-34.
3 Menon, A.L., Stults, L.W., Robson, R.L. and Mortenson, L.E. (1990) Gene 96, 67-74.
4 Friedrich, B. (1990) FEMS Microbiol. Rev. 87, 425-430.
5 Lutz, S., Jacobi, A., Schlensog, V., Bohm, R., Sawers, G. and Bock, A. (1991) Mol. Microbiol. 5, 123-135.
6 Xu, H.-W. and Wall, J.D. (1991) J. Bacteriol. 173, 2401-2405.
7 Tibelius, K.H. and Yates, M.G. (1989) FEMS Microbiol. Lett. 65, 53-58.
8 Ford, C.M., Garg, H., Garg, R.P., Tibelius, K.H., Yates, M.G., Arp, D.J. and Seefeldt, L.C. (1990) Mol. Microbiol. 4, 999-1008.
9 Chen, J.C. and Mortenson, L.E. (1992) Biochim. Biophys. Acta 1131, 122-124.
10 Fawcett, T.W. and Bartlett, S.G. (1990) BioTechniques 9, 46-48.
11 Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Gene 33, 103-119.
12 Stormo, G.D. (1986) in Maximizing Gene Expression (Reznikoff, W. and Gold, L., eds.), pp. 195-223, Butterworth Publishers, Boston.
13 Rosenberg, M. and Court, D. (1979) Annu. Rev. Genet. 13, 319-353.
14 Ebright, R.H. (1982) in Molecular Structure and Biological Activity (Griffin, J.P. and Duax, W.L., eds.), pp. 91-99, Elsevier, New York.
15 Gussin, G.N., Ronson, C.W. and Ausubel, F.M. (1986) Annu. Rev. Genet. 20, 567-591.
16 Morett, E. and Buck, M. (1989) J. Mol. Biol. 210, 65-77.
17 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
18 Beinert, H. (1990) FASEB J. 4, 2483-2491.
19 Evans, R.M. and Hollenberg, S.M. (1988) Cell 52, 1-3.
20 Schaeffer, E., Smith, D., Mardon, G., Quinn, W. and Zuker, C. (1989) Cell 57, 403-412.
21 Walker, J.E., Saraste, M., Runswick, M.J. and Gay, N.J. (1982) EMBO J. 1, 945-951.
22 Cole, S.T. (1987) Eur. J. Biochem. 167, 481-488.
23 Jones, B.D. and Mobley, H.L.T. (1989) J. Bacteriol. 171, 6414-6422.
24 Mulrooney, S.B. and Hausinger, R.P. (1990) J. Bacteriol. 172, 5837-5843.
