Nucleic Acids Research, Vol. 18, No. 20 6133
Complete nucleotide sequence of the aroA gene from Salmonella typhi encoding 5-enolpyruvylshikimate 3-phosphate synthase

S.Chatfield, G.Dougan and I.Charles*

Department of Molecular Biology, Wellcome Biotech, Langley Court, Beckenham, Kent BR3 3BS, UK

Submitted September 17, 1990
EMBL accession no. X54545
As part of a programme of research to generate fully characterized aro mutants of Salmonella typhi for use as live attenuated vaccines, we have cloned and sequenced the aroC (1) and aroD (submitted to J. Gen. Microbiol.) genes of S. typhi. We report here the nucleotide sequence of the S. typhi aroA gene encoding 5-enolpyruvylshikimate 3-phosphate synthase (EPSP synthase). An aroA-containing cosmid was isolated by complementation of E. coli BRD048 aroA (2). The full-length sequence encodes a protein of 427 amino acids that shows 97.7% similarity to the S. typhimurium EPSP synthase (3).
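The report states that the full-length aroA reading frame encodes a 427-residue protein. As a minimal sketch of how such a length is read off a DNA sequence, the Python below translates a coding sequence with the standard genetic code and stops at the first in-frame stop codon. The helper names `translate_orf` and `orf_length_nt` and the toy input are illustrative assumptions, not part of the original work. Under this reading, a 427-residue product needs 427 sense codons plus one stop codon, i.e. 1284 nt from the initiation codon through the stop codon.

```python
# Standard genetic code, codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate from the first base until the first in-frame stop codon.

    Hypothetical helper: assumes `dna` starts at the initiation codon and
    contains only A, C, G and T. The stop codon itself is not translated,
    and an incomplete trailing codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def orf_length_nt(n_residues: int) -> int:
    """Nucleotides from start codon through stop codon for n_residues."""
    return 3 * n_residues + 3


# Toy example (not the aroA sequence): ATG AAA TGG TAA -> "MKW"
print(translate_orf("ATGAAATGGTAA"))  # MKW
print(orf_length_nt(427))             # 1284
```

Applied to the aroA reading frame deposited as EMBL X54545, one would expect `translate_orf` to return a 427-character string if the reported protein length holds.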
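[Figure: nucleotide sequence of the S. typhi aroA gene (numbered to position 1330) and the deduced amino acid sequence of EPSP synthase; EMBL accession no. X54545.]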
1. Charles,I.G., Lamb,H.K., Pickard,D., Dougan,G. and Hawkins,A.R. (1990) J. Gen. Microbiol. 136, 353-358.
2. Dougan,G., Chatfield,S., Pickard,D., Bester,J., O'Callaghan,D. and Maskell,D. (1988) J. Infect. Dis. 158, 1329-1335.
3. Stalker,D.M., Hiatt,W.F. and Comai,L. (1985) J. Biol. Chem. 260, 4724-4728.
* To whom correspondence should be addressed