Biochimica et Biophysica Acta, 1049 (1990) 227-230
Elsevier

BBAEXP 90179

BBA Report - Short Sequence-Paper

Nucleotide sequences of the meta-cleavage pathway enzymes 2-hydroxymuconic semialdehyde dehydrogenase and 2-hydroxymuconic semialdehyde hydrolase from Pseudomonas CF600

Ingrid Nordlund and Victoria Shingler

The Unit for Applied Cell and Molecular Biology, The University of Umeå, Umeå (Sweden)

(Received 11 April 1990)

Key words: Aromatic catabolism; Nucleotide sequence; Biodegradation; (Pseudomonas)

The nucleotide sequence of a 2493 base pair (bp) region, spanning the coding regions for the meta-cleavage pathway enzymes 2-hydroxymuconic semialdehyde dehydrogenase (HMSD) and 2-hydroxymuconic semialdehyde hydrolase (HMSH), was determined. The deduced protein sequence for HMSD is 486 amino acid residues long with an Mr of 51 682. HMSD has homology with a number of aldehyde dehydrogenases from various eukaryotic sources. The deduced protein sequence for HMSH is 283 amino acids long with an Mr of 30 965. The amino acid composition of this enzyme is similar to that of isofunctional enzymes from toluene and m-cresol catabolic pathways.

Abbreviations: bp, base pair(s); dmpB, gene encoding catechol 2,3-dioxygenase; dmpC, gene encoding HMSD; dmpD, gene encoding HMSH; HMSD, 2-hydroxymuconic semialdehyde dehydrogenase; HMSH, 2-hydroxymuconic semialdehyde hydrolase; Mr, relative molecular weight; ORF, open reading frame.

The sequence data in this paper have been submitted to the EMBL/Genbank Data Libraries under accession number X52805.

Correspondence: V. Shingler, The Unit for Applied Cell and Molecular Biology, The University of Umeå, S-901 87 Umeå, Sweden.

0167-4781/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)

Soil bacteria exhibit extensive metabolic diversity enabling them to degrade a large variety of aromatic compounds. Although these bacteria employ a range of enzymes for the initial attack on the different aromatic substrates, the catabolic pathways tend to converge on just a few central intermediates such as catechol or substituted catechols. These key intermediates serve as substrates for ring-fission dioxygenases, products of which can be further metabolised by two distinct sets of enzymes: those of the ortho-cleavage pathway and those of the meta-cleavage pathway (reviewed in Ref. 1). The meta-cleavage pathway is versatile, being used in the dissimilation of a range of substituted catechols. Part of this metabolic versatility is due to divergence of the meta-pathway after ring cleavage of catechol to form 2-hydroxymuconic semialdehyde. 2-Hydroxymuconic semialdehyde (HMS) can be converted to 2-oxopent-4-enoate by the action of HMS hydrolase (HMSH) or by the action of three sequential enzymes, the first of which is HMS dehydrogenase (HMSD) [2].

We have undertaken a detailed study of the meta-cleavage pathway of phenol-utilizing Pseudomonas CF600 [3]. The phenol catabolic pathway has been cloned, the genes for the ring-cleavage enzymes catechol 2,3-dioxygenase (dmpB), HMSD (dmpC) and HMSH (dmpD) have been mapped, and their protein products have been identified [4,5]. The nucleotide sequence of the catechol 2,3-dioxygenase dmpB gene of this pathway has previously been reported [4]. Here we report the nucleotide sequences of dmpC and dmpD encoding HMSD and HMSH, respectively.

DNA spanning the coding regions of HMSD and HMSH was cloned into Stratagene's sequencing vector pBluescript SK(+) and ordered deletions were generated as described in Stratagene's Exo/Mung DNA sequencing manual. The nucleotide sequence of both strands was determined directly from plasmids by the method of Sanger et al. [6]. The sense strand of 2493 bp is shown in Fig. 1 along with translation of two open reading frames (ORF) that lie in an operon structure together with the catechol 2,3-dioxygenase gene. The first ORF is located in the defined [5] region for HMSD, and the predicted amino acid sequence indicates that the protein contains 486 amino acids with Mr 51 682. This molecular weight corresponds to the estimate of Mr 50 000 for the HMSD gene product [5]. The second ORF is in the defined region for HMSH, and the predicted amino acid sequence is 283 residues long with an Mr of 30 965. This molecular weight is very similar to the Mr estimate of 30 000 for the HMSH gene product [5].

Fig. 1. Nucleotide sequence of the HMSD and HMSH coding regions. Amino acid sequences are given in their one letter code with asterisks indicating stop codons. The end of the preceding gene of the pathway, encoding catechol 2,3-dioxygenase [4], is also shown. Underlined sequences indicate putative ribosome binding sites in regions complementary to the 3' end of the 16 S rRNA of E. coli and P. aeruginosa [7].
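The step from nucleotide sequence to deduced protein sequence can be illustrated with a minimal sketch. It is not the software used in this work: the function names `translate` and `find_orfs`, the `min_codons` parameter and the toy input sequence are all hypothetical, and the only biological input is the standard genetic code. The sketch scans one strand for ATG start codons and translates each until the first in-frame stop codon.

```python
# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from position 0, stopping after the first stop codon ('*').

    Raises KeyError on characters other than A, C, G, T.
    """
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def find_orfs(dna, min_codons=2):
    """Return (start, end, protein) for each ATG...stop ORF on the given strand.

    Every ATG is treated as a possible start, so ORFs nested in the same
    frame are reported separately. `end` is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        protein = translate(dna[start:])
        # Keep only ORFs that terminate in a stop codon and are long enough.
        if protein.endswith("*") and len(protein) - 1 >= min_codons:
            orfs.append((start, start + 3 * len(protein), protein))
    return orfs


def orf_length_bp(n_residues):
    """Length in bp of an ORF encoding n_residues, including its stop codon."""
    return 3 * (n_residues + 1)


# Hypothetical toy sequence: two flanking bases, a three-codon ORF, a stop codon.
TOY = "CCATGAAAGAGTAAGG"
print(find_orfs(TOY))
print(orf_length_bp(486), orf_length_bp(283))
```

With the residue counts reported here, the two ORFs would occupy 3 × (486 + 1) = 1461 bp and 3 × (283 + 1) = 852 bp including their stop codons, which fits within the 2493 bp region sequenced.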

Fig. 2. Amino acid sequence comparison of HMSD with aldehyde dehydrogenases from various sources. (1) HMSD from Pseudomonas CF600. (2) A consensus sequence of horse and human, cytosolic and mitochondrial aldehyde dehydrogenases [9]; lower case letters indicate where the residue is present in two or three of the four sequences, and o indicates a position where no common residue was found. (3) Aldehyde dehydrogenase from Aspergillus nidulans [10]. Identical residues are boxed, with dashes indicating the position and size of spaces introduced into the sequence. Asterisks mark three glycine residues with the same spacing as an NAD fingerprint [12] and a star indicates the location of a conserved cysteine residue, at position 302, that is believed to be important for the catalytic activity of aldehyde dehydrogenases [11].

TABLE I

Amino acid composition of 2-hydroxymuconic semialdehyde hydrolases

                     Pseudomonas       P. putida         P. putida         P. putida
                     CF600 (c)         NCIB10015 (d)     NCIB9865 (d)      PaW1
Amino acid           % (a)   No. (b)   %       No.       %       No.       %       No.
Alanine              14.4    41        12.1    34        11.0    28         8.3    36
Arginine              7.7    22         7.1    20         4.7    12        11.0    22
Aspartic acid         8.7    25         8.8    24         8.2    22         7.1    19
Cystine               0.3     1         0.4     1         0.3     1         0.4     1
Glutamic acid         9.5    27         9.5    26         9.9    25        13.9    33
Glycine               8.4    24         7.9    22         8.6    22         5.9    32
Histidine             3.5    10         2.9     8         1.9     5         4.3    10
Isoleucine            6.7    19         4.7    13         4.4    11         5.9    16
Leucine               9.5    27         8.5    24         7.9    20        11.0    30
Lysine                0.3     1         1.7     5         1.3     3         2.0     5
Methionine            2.8     8         1.9     5         1.9     5         1.8     4
Phenylalanine         4.5    13         3.7    10         3.7     9         4.7    10
Proline               5.3    15         4.4    12         4.2    11         5.9    19
Serine                5.3    15         3.9    11         4.9    12         5.2    18
Threonine             2.4     7         2.5     7         2.9     7         5.4    17
Tryptophan            2.4     7         N.D.   N.D.       N.D.   N.D.       3.5     6
Tyrosine              1.4     4         1.7     5         1.7     4         2.9     5
Valine                6.0    17         5.7    16         5.6    14         4.6    14

Total No. of
amino acid residues          283               243               221               297
Estimated Mr                 30 000            27 000            25 000            32 000

N.D., not determined; (a) molar fraction percent; (b) number of amino acid residues; (c) determined from nucleotide sequence; (d) previously determined from the purified protein.
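The relation between a residue composition and a molecular weight can be checked with a short calculation. The sketch below is illustrative only: the names `RESIDUE_MASS`, `CF600_HMSH` and `estimate_mr` are hypothetical, the masses are standard average residue masses rounded to two decimals, and the "Aspartic acid" and "Glutamic acid" rows of Table I are assumed to be pure Asp and Glu (asparagine and glutamine differ by about 1 Da per residue, so the effect is small).

```python
# Average residue masses in Da (free amino acid minus one water), rounded.
RESIDUE_MASS = {
    "Ala": 71.08, "Arg": 156.19, "Asp": 115.09, "Cys": 103.14,
    "Glu": 129.12, "Gly": 57.05, "His": 137.14, "Ile": 113.16,
    "Leu": 113.16, "Lys": 128.17, "Met": 131.19, "Phe": 147.18,
    "Pro": 97.12, "Ser": 87.08, "Thr": 101.10, "Trp": 186.21,
    "Tyr": 163.18, "Val": 99.13,
}
WATER_MASS = 18.015  # one water for the free N- and C-termini

# Residue counts for Pseudomonas CF600 HMSH, copied from the No. column of Table I.
CF600_HMSH = {
    "Ala": 41, "Arg": 22, "Asp": 25, "Cys": 1, "Glu": 27, "Gly": 24,
    "His": 10, "Ile": 19, "Leu": 27, "Lys": 1, "Met": 8, "Phe": 13,
    "Pro": 15, "Ser": 15, "Thr": 7, "Trp": 7, "Tyr": 4, "Val": 17,
}


def total_residues(counts):
    """Total number of residues in a composition."""
    return sum(counts.values())


def estimate_mr(counts, add_water=True):
    """Sum the average residue masses, optionally adding one terminal water."""
    mass = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items())
    return mass + WATER_MASS if add_water else mass


def molar_percent(counts):
    """Molar fraction percent of each residue, as in the % (a) column."""
    n = total_residues(counts)
    return {aa: 100.0 * c / n for aa, c in counts.items()}


print(total_residues(CF600_HMSH))
print(round(estimate_mr(CF600_HMSH)))
```

Under these assumptions the CF600 counts add up to the 283 residues in the table, and the estimated mass should fall close to the Mr of 30 965 deduced from the nucleotide sequence.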

Computer-assisted sequence comparisons [8] were only revealing for HMSD. This enzyme was found to have more than 39% identity to a number of aldehyde dehydrogenases from a variety of sources. Fig. 2 illustrates the degree of homology and demonstrates that 30% of the residues, including a cysteine residue believed to be important for the catalytic activity of these enzymes (Ref. 11 and Refs. therein), are identical in all six sequences compared. HMSD and the aldehyde dehydrogenases all require NAD as a cofactor. A GXGXXG sequence, which is invariant in the NAD-binding βαβ-fold fingerprint region identified in Ref. 12, occurs near the C-terminus of HMSD. However, other amino acid residues of the fingerprint region are not found near this sequence, making it unlikely that it is the cofactor binding site. Interestingly, other glycine-rich regions of these proteins are conserved in all six sequences.

HMSH catalyses the cleavage of a carbon-carbon bond. As pointed out by Duggleby and Williams [13], this is one of the rarest of all enzyme reaction types. Hence, it is not too surprising that computer searches were unproductive. Although sequence data are not available, HMSH from a toluene catabolic pathway [13] and two m-cresol catabolic pathways have been purified [14]. The amino acid composition of these enzymes is similar to that of the predicted HMSH reported here (see Table I). However, the degree of relatedness of these enzymes will have to await comparison of the primary amino acid sequences.
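Two of the sequence-level checks described for HMSD, percent identity between aligned sequences and a search for the GXGXXG motif, can be sketched as follows. This is an illustration under stated assumptions, not the comparison software cited as [8]: the function names and the example strings are hypothetical, and percent identity is defined here as identical positions divided by aligned positions in which neither sequence has a gap, which is only one of several conventions in use.

```python
import re


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# G-x-G-x-x-G, wrapped in a lookahead so that overlapping matches are reported.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(protein):
    """Return (0-based position, matched segment) for every GXGXXG occurrence."""
    return [(m.start(1), m.group(1)) for m in GXGXXG.finditer(protein)]


# Hypothetical aligned fragments and a hypothetical protein segment.
print(percent_identity("MKEIKH-F", "MKDIRHAF"))
print(find_gxgxxg("AAGAGKSGTT"))
```

A motif hit alone says little about function; as noted above for HMSD, the neighbouring fingerprint residues also have to be present before a GXGXXG sequence is a plausible cofactor binding site.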

This research was supported by grants from the Center for Environmental Research (CMF B: IS5) and The National Swedish Board for Technical Development (STUF90-470).

References

1 Dagley, S. (1986) in The Bacteria, Vol. X (Sokatch, J.R., ed.), pp. 527-556, Academic Press, London.
2 Sala-Trepat, J.M. and Evans, W.C. (1971) Eur. J. Biochem. 20, 400-413.
3 Shingler, V., Bagdasarian, M., Holroyd, D. and Franklin, F.C.H. (1989) J. Gen. Microbiol. 135, 1083-1092.
4 Bartilson, M. and Shingler, V. (1990) Gene 85, 235-240.
5 Bartilson, M., Nordlund, I. and Shingler, V. (1990) Mol. Gen. Genet. 220, 294-300.
6 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
7 Shine, J. and Dalgarno, L. (1975) Nature 254, 34-38.
8 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
9 Johansson, J., Von Bahr-Lindström, H., Jeck, R., Woenckhaus, C. and Jörnvall, H. (1988) Eur. J. Biochem. 172, 527-533.
10 Pickett, M., Gwynne, D.I., Buxton, F.P., Elliott, R., Davies, R.W., Lockington, R.A., Scazzocchio, C. and Sealy-Lewis, H.M. (1987) Gene 51, 217-226.
11 Von Bahr-Lindström, H., Jeck, R., Woenckhaus, C., Sohn, S., Hempel, J. and Jörnvall, H. (1985) Biochemistry 24, 6834-6838.
12 Wierenga, R.K., Terpstra, P. and Hol, W.G.J. (1986) J. Mol. Biol. 187, 101-107.
13 Duggleby, C.J. and Williams, P.A. (1986) J. Gen. Microbiol. 132, 717-726.
14 Bayly, R.C. and Di Berardino, D. (1978) J. Bacteriol. 134, 30-37.
