Biochimica et Biophysica Acta, 1049 (1990) 227-230 Elsevier
BBAEXP 90179
BBA Report - Short Sequence-Paper
Nucleotide sequences of the meta-cleavage pathway enzymes 2-hydroxymuconic semialdehyde dehydrogenase and 2-hydroxymuconic semialdehyde hydrolase from Pseudomonas CF600
Ingrid Nordlund and Victoria Shingler
The Unit for Applied Cell and Molecular Biology, The University of Umeå, Umeå (Sweden)
(Received 11 April 1990)
Key words: Aromatic catabolism; Nucleotide sequence; Biodegradation; (Pseudomonas)
The nucleotide sequence of a 2493 base pair (bp) region, spanning the coding regions for the meta-cleavage pathway enzymes 2-hydroxymuconic semialdehyde dehydrogenase (HMSD) and 2-hydroxymuconic semialdehyde hydrolase (HMSH), was determined. The deduced protein sequence for HMSD is 486 amino acid residues long with an Mr of 51 682. HMSD has homology with a number of aldehyde dehydrogenases from various eukaryotic sources. The deduced protein sequence for HMSH is 283 amino acids long with an Mr of 30 965. The amino acid composition of this enzyme is similar to that of isofunctional enzymes from toluene and m-cresol catabolic pathways.
Abbreviations: bp, base pair(s); dmpB, gene encoding catechol 2,3-dioxygenase; dmpC, gene encoding HMSD; dmpD, gene encoding HMSH; HMSD, 2-hydroxymuconic semialdehyde dehydrogenase; HMSH, 2-hydroxymuconic semialdehyde hydrolase; Mr, relative molecular weight; ORF, open reading frame.
The sequence data in this paper have been submitted to the EMBL/Genbank Data Libraries under accession number X52805.
Correspondence: V. Shingler, The Unit for Applied Cell and Molecular Biology, The University of Umeå, S-901 87 Umeå, Sweden.

Soil bacteria exhibit extensive metabolic diversity enabling them to degrade a large variety of aromatic compounds. Although these bacteria employ a range of enzymes for the initial attack on the different aromatic substrates, the catabolic pathways tend to converge on just a few central intermediates such as catechol or substituted catechols. These key intermediates serve as substrates for ring-fission dioxygenases, products of which can be further metabolised by two distinct sets of enzymes: those of the ortho-cleavage pathway and those of the meta-cleavage pathway (reviewed in Ref. 1). The meta-cleavage pathway is versatile, being used in the dissimilation of a range of substituted catechols. Part of this metabolic versatility is due to divergence of the meta-pathway after ring cleavage of catechol to form 2-hydroxymuconic semialdehyde. 2-Hydroxymuconic semialdehyde (HMS) can be converted to 2-oxopent-4-enoate by the action of HMS hydrolase (HMSH) or by the action of three sequential enzymes, the first of which is HMS dehydrogenase (HMSD) [2].

We have undertaken a detailed study of the meta-cleavage pathway of phenol-utilizing Pseudomonas CF600 [3]. The phenol catabolic pathway has been cloned, the genes for the ring-cleavage enzymes catechol 2,3-dioxygenase (dmpB), HMSD (dmpC) and HMSH (dmpD) have been mapped, and their protein products have been identified [4,5]. The nucleotide sequence of the catechol 2,3-dioxygenase dmpB gene of this pathway has previously been reported [4]. Here we report the nucleotide sequences of dmpC and dmpD encoding HMSD and HMSH, respectively.

DNA spanning the coding regions of HMSD and HMSH was cloned into Stratagene's sequencing vector pBluescript SK(+) and ordered deletions were generated as described in Stratagene's Exo/Mung DNA sequencing manual. The nucleotide sequence of both strands was determined directly from plasmids by the method of Sanger et al. [6]. The sense strand of 2493 bp is shown in Fig. 1 along with translation of two open reading frames (ORF) that lie in an operon structure together with the catechol 2,3-dioxygenase gene. The first ORF is located in the defined [5] region for HMSD, and the predicted amino acid sequence indicates that the protein contains 486 amino acids with Mr 51 682. This molecular weight corresponds to the estimate of
0167-4781/90/$03.50 © 1990 Elsevier Science Publishers B.V. (Biomedical Division)
[Fig. 1: sense-strand nucleotide sequence (2493 bp) with the one-letter translations of the dmpC (HMSD, 486 residues) and dmpD (HMSH, 283 residues) open reading frames. The sequence listing is not legible in this copy; the sequence is deposited in the EMBL/Genbank Data Libraries under accession number X52805.]
Fig. 1. Nucleotide sequence of the HMSD and HMSH coding regions. Amino acid sequences are given in their one letter code with asterisks indicating stop codons. The end of the preceding gene of the pathway, encoding catechol 2,3-dioxygenase [4], is also shown. Underlined sequences indicate putative ribosome binding sites in regions complementary to the 3' end of the 16 S rRNA of E. coli and P. aeruginosa [7].
Mr 50 000 for the HMSD gene product [5]. The second ORF is in the defined region for HMSH, and the predicted amino acid sequence is 283 residues long with an Mr of 30 965. This molecular weight is very similar to the Mr estimate of 30 000 for the HMSH gene product [5].
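The deduction of a protein sequence from an open reading frame, as done for Fig. 1, can be illustrated with a short sketch. The Python below is an illustration only and is not the software used in this work. It assumes the standard genetic code, and the names CODON_TABLE, translate and FRAGMENT are hypothetical. The 40-nucleotide fragment is copied from the dmpC coding region of Fig. 1 (codons 11 onwards), where it is shown translated as A F V G S A S G R T F E D.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
# (Assumption: the standard code is appropriate for this illustration.)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a sense-strand DNA string in frame 1 into one-letter code.

    Spaces are ignored; a trailing partial codon is dropped; '*' marks a stop.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragment of the dmpC (HMSD) coding region as printed in Fig. 1.
FRAGMENT = "GCCTTCGTCG GCTCGGCTAG CGGCCGCACC TTCGAGGACG"

print(translate(FRAGMENT))  # AFVGSASGRTFED
```

The last nucleotide of the fragment belongs to an incomplete codon and is ignored, so 13 residues are returned.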
[Fig. 2: aligned amino acid sequences of (1) HMSD, (2) the mammalian aldehyde dehydrogenase consensus and (3) the Aspergillus nidulans aldehyde dehydrogenase. The alignment is not legible in this copy.]
Fig. 2. Amino acid sequence comparison of HMSD with aldehyde dehydrogenases from various sources. (1) HMSD from Pseudomonas CF600. (2) A consensus sequence of horse and human, cytosolic and mitochondrial aldehyde dehydrogenases [9]; lower case letters indicate where the residue is present in two or three of the four sequences, and o indicates the position where no common residue was found. (3) Aldehyde dehydrogenase from Aspergillus nidulans [10]. Identical residues are boxed, with dashes indicating the position and size of spaces introduced into the sequence. Asterisks mark three glycine residues with the same spacing as an NAD fingerprint [12] and a star indicates the location of a conserved cysteine residue, at position 302, that is believed to be important for the catalytic activity of aldehyde dehydrogenases [11].
TABLE I
Amino acid composition of 2-hydroxymuconic semialdehyde hydrolases

                                    Pseudomonas     P. putida       P. putida       P. putida
                                    CF600 (c)       NCIB10015 (d)   NCIB9865 (d)    PaW1
Amino acid                          % (a)  No. (b)      %      No.      %      No.      %      No.
Alanine                              14.4       41   12.1       34   11.0       28    8.3       36
Arginine                              7.7       22    7.1       20    4.7       12   11.0       22
Aspartic acid                         8.7       25    8.8       24    8.2       22    7.1       19
Cystine                               0.3        1    0.4        1    0.3        1    0.4        1
Glutamic acid                         9.5       27    9.5       26    9.9       25   13.9       33
Glycine                               8.4       24    7.9       22    8.6       22    5.9       32
Histidine                             3.5       10    2.9        8    1.9        5    4.3       10
Isoleucine                            6.7       19    4.7       13    4.4       11    5.9       16
Leucine                               9.5       27    8.5       24    7.9       20   11.0       30
Lysine                                0.3        1    1.7        5    1.3        3    2.0        5
Methionine                            2.8        8    1.9        5    1.9        5    1.8        4
Phenylalanine                         4.5       13    3.7       10    3.7        9    4.7       10
Proline                               5.3       15    4.4       12    4.2       11    5.9       19
Serine                                5.3       15    3.9       11    4.9       12    5.2       18
Threonine                             2.4        7    2.5        7    2.9        7    5.4       17
Tryptophan                            2.4        7   N.D.            N.D.             3.5        6
Tyrosine                              1.4        4    1.7        5    1.7        4    2.9        5
Valine                                6.0       17    5.7       16    5.6       14    4.6       14
Total No. of amino acid residues               283             243             221             297
Estimated Mr                                 30000           27000           25000           32000

N.D., not determined; (a) molar fraction percent; (b) number of amino acid residues; (c) determined from nucleotide sequence; (d) previously determined from the purified protein.
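The molar fraction percent values for the sequence-derived CF600 enzyme in Table I can be checked against the residue counts in the same table. The sketch below is illustrative only; the names COUNTS, TABLE_PERCENT and molar_percent are hypothetical, and the three-letter keys follow the row order of the table (Cys stands for the row labelled Cystine).

```python
# Residue counts for the Pseudomonas CF600 HMSH, copied from Table I.
COUNTS = {
    "Ala": 41, "Arg": 22, "Asp": 25, "Cys": 1, "Glu": 27, "Gly": 24,
    "His": 10, "Ile": 19, "Leu": 27, "Lys": 1, "Met": 8, "Phe": 13,
    "Pro": 15, "Ser": 15, "Thr": 7, "Trp": 7, "Tyr": 4, "Val": 17,
}

# Molar fraction percent for the same enzyme, as printed in Table I.
TABLE_PERCENT = {
    "Ala": 14.4, "Arg": 7.7, "Asp": 8.7, "Cys": 0.3, "Glu": 9.5, "Gly": 8.4,
    "His": 3.5, "Ile": 6.7, "Leu": 9.5, "Lys": 0.3, "Met": 2.8, "Phe": 4.5,
    "Pro": 5.3, "Ser": 5.3, "Thr": 2.4, "Trp": 2.4, "Tyr": 1.4, "Val": 6.0,
}


def molar_percent(counts: dict[str, int]) -> dict[str, float]:
    """Return each residue count as a percentage of the total residue count."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


computed = molar_percent(COUNTS)
print("total residues:", sum(COUNTS.values()))  # total residues: 283
for name, value in computed.items():
    # Show the recomputed value next to the value printed in the table.
    print(f"{name}  computed {value:5.2f}  table {TABLE_PERCENT[name]:4.1f}")
```

The counts sum to the 283 residues of the deduced HMSH sequence, and every recomputed percentage lies within 0.15 of the tabulated value.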
Computer-assisted sequence comparisons [8] were only revealing for HMSD. This enzyme was found to have more than 39% identity to a number of aldehyde dehydrogenases from a variety of sources. Fig. 2 illustrates the degree of homology and demonstrates that 30% of the residues, including a cysteine residue believed to be important for the catalytic activity of these enzymes (Ref. 11 and Refs. therein), are identical in all six sequences compared. HMSD and the aldehyde dehydrogenases all require NAD as a cofactor. An NAD-binding βαβ-fold fingerprint region has been identified [12] that includes an invariant GXGXXG sequence, and a sequence of this form occurs near the C-terminus of HMSD. However, other amino acid residues of the fingerprint region are not found near this sequence, making it unlikely that it is the cofactor binding site. Interestingly, other glycine-rich regions of these proteins are conserved in all six sequences.

HMSH catalyses the cleavage of a carbon-carbon bond. As pointed out by Duggleby and Williams [13], this is one of the rarest of all enzyme reaction types. Hence, it is not too surprising that computer searches were unproductive. Although sequence data are not available, HMSH from a toluene catabolic pathway [13] and two m-cresol catabolic pathways have been purified [14]. The amino acid composition of these enzymes is similar to that of the predicted HMSH reported here (see Table I). However, assessing the degree of relatedness of these enzymes will have to await comparison of the primary amino acid sequences.
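The two kinds of comparison discussed above for HMSD, percent identity between aligned sequences and a search for the GXGXXG glycine spacing, can be sketched as follows. This is a minimal illustration and not the program package of Ref. 8. The function names are hypothetical, identity is assumed here to be counted over alignment columns in which neither sequence has a gap (other conventions exist), toy_b is an invented partner sequence, and the motif test string is also invented.

```python
import re


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Lookahead so that overlapping occurrences of G-X-G-X-X-G are all reported.
FINGERPRINT = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, matched hexapeptide) for each GXGXXG match."""
    return [(m.start() + 1, m.group(1))
            for m in FINGERPRINT.finditer(protein.upper())]


toy_a = "MKEIKHFING"   # first ten residues of HMSD as printed in Fig. 1
toy_b = "MKE-RHFLNG"   # invented partner sequence with one gap
print(round(percent_identity(toy_a, toy_b), 1))  # 77.8

print(find_gxgxxg("AAGSGKQGTT"))  # [(3, 'GSGKQG')]
```

A match to the glycine spacing alone is weak evidence, which is the point made in the text: the remaining residues of the fingerprint region also need to be present before a cofactor binding site is inferred.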
This research was supported by grants from the Center for Environmental Research (CMF B: IS5) and The National Swedish Board for Technical Development (STUF90-470).
References
1 Dagley, S. (1986) in The Bacteria, Vol. X (Sokatch, J.R., ed.), pp. 527-556, Academic Press, London.
2 Sala-Trepat and Evans, W.C. (1971) J. Biol. Chem. 20, 400-413.
3 Shingler, V., Bagdasarian, M., Holroyd, D. and Franklin, F.C.H. (1989) J. Gen. Microbiol. 135, 1083-1092.
4 Bartilson, M. and Shingler, V. (1990) Gene 85, 235-240.
5 Bartilson, M., Nordlund, I. and Shingler, V. (1990) Mol. Gen. Genet. 220, 294-300.
6 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
7 Shine, J. and Dalgarno, L. (1975) Nature 254, 34-38.
8 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.
9 Johansson, J., Von Bahr-Lindström, H., Jeck, R., Woenckhaus, C. and Jörnvall, H. (1988) Eur. J. Biochem. 172, 527-533.
10 Pickett, M., Gwynne, D.I., Buxton, F.P., Elliott, R., Davies, R.W., Lockington, R.A., Scazzocchio, C. and Sealy-Lewis, H.M. (1987) Gene 51, 217-226.
11 Von Bahr-Lindström, H., Jeck, R., Woenckhaus, C., Sohn, S., Hempel, J. and Jörnvall, H. (1985) Biochemistry 24, 6834-6838.
12 Wierenga, R.K., Terpstra, P. and Hol, W.G.J. (1986) J. Mol. Biol. 187, 101-107.
13 Duggleby, C.J. and Williams, P.A. (1986) J. Gen. Microbiol. 132, 717-726.
14 Bayly, R.C. and Berardino, D.D. (1978) J. Bacteriol. 134, 30-37.