Proc. Natl. Acad. Sci. USA Vol. 87, pp. 633-637, January 1990 Biochemistry

Chemical synthesis of the thymidylate synthase gene (gene synthesis/cassette mutagenesis/protein engineering)

SHANE CLIMIE AND DANIEL V. SANTI Departments of Biochemistry and Biophysics, and Pharmaceutical Chemistry, University of California, San Francisco, CA 94143

Communicated by Stephen J. Benkovic, September 22, 1989

ABSTRACT A 978-base-pair gene that encodes thymidylate synthase (TS; 5,10-methylenetetrahydrofolate:dUMP C-methyltransferase, EC 2.1.1.45) from Lactobacillus casei has been synthesized and inserted into Escherichia coli expression vectors. The DNA sequence contains 35 unique restriction sites that are located an average of 28 base pairs apart throughout the entire length of the gene. A ribosome binding site was included 9 base pairs upstream from the translation start site and codon usage was adjusted to ensure efficient translation in E. coli. The TS gene is flanked by unique EcoRI and HindIII restriction sites that render the gene portable to any of several E. coli expression vectors. Catalytically active TS encoded by the synthetic gene is expressed in large amounts (10-20% of the soluble protein) and is indistinguishable from that isolated from L. casei. The utility of the synthetic gene for mutagenesis is demonstrated by a single experiment in which His-199 was replaced with 14 different amino acids. Analysis of the mutants by genetic complementation indicates that TS can tolerate a number of amino acid substitutions at that position and shows that His-199 is not strictly required for catalytic activity.

Thymidylate synthase (TS; 5,10-methylenetetrahydrofolate:dUMP C-methyltransferase, EC 2.1.1.45) catalyzes the reductive methylation of deoxyuridine monophosphate (dUMP) to thymidine monophosphate (dTMP) with the concomitant conversion of 5,10-methylenetetrahydrofolate (CH2H4folate) to 7,8-dihydrofolic acid (H2folate) (1). This pathway represents the sole de novo source of dTMP in the cell and therefore TS is essential for DNA synthesis. TS has been widely studied, and much is known about its catalytic mechanism and inhibition. Chemical modification studies have resulted in the identification of several residues that are important for substrate and cofactor binding and for catalysis (2-4). Furthermore, the recent determination of the three-dimensional structure of TS from Lactobacillus casei (5) has provided the impetus to use site-directed mutagenesis to examine structure-function relationships. In this report, we describe the design, construction, and use of a synthetic gene based on the amino acid sequence of TS from L. casei.* The synthetic TS gene facilitates the rapid generation of large numbers of site-directed mutations by the simple replacement of small restriction fragments with synthetic DNA duplexes that carry the desired mutations. This approach is made possible by the inclusion of >30 unique restriction sites within the TS coding sequence. The synthetic gene can be used to generate a large variety of mutations ranging from single amino acid substitutions to the replacement of entire structural domains. The use of the synthetic gene is further enhanced by its high level of expression and by the ability to screen TS mutants by genetic complementation in Escherichia coli. The advantages of a synthetic TS gene as a mutagenesis vehicle are demonstrated by the construction of a series of 15 site-directed mutations at His-199 in a single experiment.

Abbreviations: TS, thymidylate synthase; IPTG, isopropyl β-D-thiogalactopyranoside.

*The sequence reported in this paper has been deposited in the GenBank data base (accession no. M29019).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

Bacterial Strains. Strain TB-1 [φ80lacZΔM15; ara, Δ(lac-proAB), rpsL, hsdR; provided by T. O. Baldwin, Texas A&M] was used as the host strain for plasmid-mediated transformations during the assembly of the synthetic TS gene and for the initial isolation of mutants. The Thy⁻ E. coli strain χ2913 (ΔthyA572; a gift from Russell Thompson, University of Glasgow) was used to test plasmids for TS activity by genetic complementation and for the production of recombinant TS. Strain JM101 [supE, thi, Δ(lac-proAB) (F′, traD, proA+B+, lacIqZΔM15)] (6) was used to propagate M13 clones used for DNA sequencing.

General Methods. Methods for plasmid purification, subcloning, and bacterial transformation were as described (7).

Oligonucleotide Synthesis. Oligonucleotides were synthesized at the University of California, San Francisco, Biomolecular Resource Center using an Applied Biosystems 380B DNA synthesizer. Oligonucleotides were purified by denaturing PAGE using 12% acrylamide/8 M urea gels (8). DNA was eluted from the gels using 0.5 M ammonium acetate and was desalted by passage over a Waters Sep-Pak cartridge using methanol/water (60:40) as an eluant. Oligonucleotides >60 nucleotides long were also purified by gel electrophoresis (8% acrylamide/8 M urea) but DNA eluted from the gels was desalted by passage over a Pharmacia NAP-25 column using water as an eluant.

Assembly and Characterization of the Gene. The TS gene was assembled from four gene segments that were initially cloned into pUC18 to create plasmids pSCTS1 to -4. Each of the gene segments was constructed with four to six oligonucleotides as follows. Oligonucleotides were phosphorylated in 20-µl reaction mixtures that contained 50 pmol of DNA, 50 mM Tris·HCl (pH 8.0), 10 mM MgCl2, 5 mM dithiothreitol, 5 µM ATP, and 10 units of polynucleotide kinase. The reaction mixtures were incubated at 37°C for 30 min and then terminated by heating at 65°C for 15 min. Phosphorylated oligonucleotides were annealed in 10-µl reaction mixtures that included 2.5 pmol of each oligonucleotide and ligation buffer (66 mM Tris·HCl, pH 7.6/6.6 mM MgCl2/10 mM dithiothreitol/0.4 mM ATP). The oligonucleotide mixtures were heated at 100°C for 3 min and then allowed to cool slowly to room temperature over a period of 30 min. The annealed fragments were ligated directly into pUC18 by adding 0.5 µg of suitably digested pUC18, 5× ligation buffer, and 1 µl of 10 mM ATP in a final vol of 20 µl. Three units of T4 DNA ligase were added and the reaction mixtures were incubated at room temperature for 2 hr. One-half of the ligation mixture was used to transform E. coli strain TB-1 and the cells were plated on LB agar containing 20 µg of 5-bromo-4-chloro-3-indolyl β-D-galactoside per ml, 0.1 mM isopropyl β-D-thiogalactopyranoside (IPTG), and 100 µg of ampicillin per ml. Plasmid DNA was purified (9) from 20 colonies and characterized by restriction analysis. Four to eight isolates of each segment were examined further by DNA sequencing (10, 11) of M13mp18 and M13mp19 subclones. Final assembly of the synthetic TS gene was carried out by the stepwise subcloning of individual gene segments into pUC18 to give pSCTS9. The DNA sequence of both strands of the final construct was confirmed by using M13 subclones of pSCTS9. For regulated expression, the 978-base-pair (bp) EcoRI/HindIII insert from pSCTS9 was also cloned into the tac promoter vector pJF119EH (12) to create plasmid pSCTS11.

Protein Purification and Characterization. Fresh overnight cultures of χ2913/pSCTS11 or χ2913/pSCTS9 were used to inoculate 200 ml of L broth (13) containing 100 µg of ampicillin per ml. χ2913/pSCTS9 was grown at 37°C for 19 hr and the cells were harvested by centrifugation. χ2913/pSCTS11 was grown at 37°C until the A600 reached 1.0, and then IPTG was added to a final concentration of 1 mM. After IPTG induction, χ2913/pSCTS11 was grown for an additional 18 hr and then harvested by centrifugation. Cell pellets were suspended in 25 ml of 100 mM Tris·HCl, pH 7.5/1 mM EDTA and then lysed with a French press. Cellular debris was removed by centrifugation and the crude extract was loaded onto a hydroxylapatite column (2.5 × 8 cm) (Bio-Gel HTP, Bio-Rad) that was equilibrated with buffer A (25 mM KH2PO4, pH 6.9/0.5 mM EDTA). Protein was eluted at 1 ml/min with 150 ml of buffer A followed by a linear gradient (200 ml) to 350 mM KH2PO4, pH 7.0/0.5 mM EDTA. Column fractions of 5 ml were collected and those containing TS (80-125 ml into the gradient) were pooled, dialyzed against 10 mM phosphate, pH 7.0/0.5 mM EDTA, and concentrated with Aquacide II (Calbiochem). Protein concentration was determined by the method of Read and Northcote (14) using bovine serum albumin as a standard for crude preparations or by using ε278 = 1.07 × 10⁵ M⁻¹·cm⁻¹ for purified preparations (15). TS activity was assayed spectrophotometrically at 25°C using the conditions of Pogolotti et al. (16). One unit of activity is the amount of TS that catalyzes the formation of 1 µmol of product per min. Proteins were also analyzed by SDS/PAGE using 12% gels (17) that were stained with Coomassie blue R250. Reversed-phase (RP) HPLC analysis was performed with a C8 Aquapore RP-300 (2.1 × 30 mm) column equilibrated in 0.1% trifluoroacetic acid (TFA). Purified protein was applied to the column and eluted at 0.2 ml/min with a linear gradient of 0.1% TFA/water to 0.1% TFA/70% acetonitrile over 45 min. RP HPLC-purified TS was subjected to automated Edman degradation using an Applied Biosystems model 470A protein sequencer at the University of California, San Francisco, Biomolecular Resource Center.

Construction and Characterization of His-199 Mutants. A series of single amino acid substitutions of His-199 was constructed by cassette mutagenesis of plasmid pSCTS9. Plasmid DNA (5 µg) was digested with SnaBI and Nco I to remove a 40-bp fragment that encodes amino acids 193-205 and vector DNA was purified by electrophoresis using 1% low melting point agarose. The 40-bp SnaBI/Nco I fragment was replaced with a 40-bp synthetic DNA duplex containing multiple substitutions on both DNA strands at positions encoding His-199 (see below). The substitutions included equal mixtures of all four bases at the first and second positions of the codon and an equal mixture of G and C at the third position. Mutagenic oligonucleotides were annealed in 10-µl reaction mixtures that contained 1, 5, 10, 50, or 100 pmol of each unphosphorylated oligonucleotide under conditions identical to those described for assembly of the gene. The annealed DNA fragments were ligated with 0.5 µg of gel-purified vector DNA as described above and one-half of each ligation reaction was used to transform strain TB-1 to ampicillin resistance. The resulting colonies were pooled by flooding the plates with 3 ml of L broth and collecting the cell suspension with a sterile pipette. Plasmid DNA was prepared from the pooled cells and used to transform E. coli strain χ2913 (Thy⁻), which was then plated on LB agar containing 100 µg of ampicillin per ml and 50 µg of thymidine per ml. Plasmid DNA was prepared from individual χ2913 transformants and His-199 mutants were identified by dideoxynucleotide sequencing using plasmid DNA as a template. His-199 mutants were further characterized by their ability to grow on minimal agar in the absence of thymidine. TS synthesis was examined by SDS/PAGE (17) of crude cell extracts that were prepared by sonication of 2-ml overnight cultures grown in L broth containing 100 µg of ampicillin per ml and 50 µg of thymidine per ml.
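The degenerate codon used in the mutagenic cassette (all four bases at positions 1 and 2, G or C at position 3) can be enumerated directly. The following is a minimal sketch, not part of the original work; the function names `nns_codons` and `summarize` are illustrative, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nns_codons():
    """Return every codon with any base at positions 1 and 2 and G or C at position 3."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GC")]


def summarize(codons):
    """Group codons by the residue they encode ('*' collects stop codons)."""
    by_residue = {}
    for codon in codons:
        by_residue.setdefault(CODON_TABLE[codon], []).append(codon)
    return by_residue


if __name__ == "__main__":
    groups = summarize(nns_codons())
    amino_acids = sorted(r for r in groups if r != "*")
    print("codons in mixture:", len(nns_codons()))
    print("amino acids encoded:", len(amino_acids))
    print("stop codons:", groups.get("*", []))
```

Restricting the third position to G or C excludes the TAA and TGA stop codons while still leaving at least one codon for every amino acid, which is why such a mixture is convenient for saturating a single position.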

RESULTS

Design of the Synthetic TS Gene. The design of the synthetic TS gene was based on the amino acid sequence of TS from L. casei and used the following strategy. First, a computer program, PROTORES (18), was used to reverse translate the amino acid sequence (19-21) into a degenerate DNA sequence. Second, the locations of at least 450 potential restriction endonuclease sites having recognition sequences >4 bases were noted. A DNA sequence that includes 33 unique sites located approximately every 29 bp throughout the entire length of the coding region was then selected from the large number of possibilities. Only those sites that are recognized by commercially available restriction enzymes and that are not located in pUC18 (except in the polylinker region) were included in the final sequence of the gene. Third, the codon usage of the resulting TS gene was modified to include those triplets that are utilized in highly expressed E. coli genes (22-25) while retaining the largest possible number of unique restriction sites. In some cases, suboptimal codons were used either to allow the inclusion of unique restriction sites or to preclude redundant sites. Finally, a ribosome binding site, AGGAGG (26, 27), was added 9 bases upstream of the coding region to direct the initiation of translation in E. coli. The sequence adjacent to the ribosome binding site included an A at position -3 relative to the ATG, and the spacer region (-1 to -8) was made A+T rich to reduce potential mRNA secondary structure in the vicinity of the translation start site (27). Addition of a ribosome binding site made the synthetic TS gene portable to any of a number of commonly available plasmid vectors that carry inducible E. coli promoters.

Gene Synthesis. The gene was initially constructed as four segments (pSCTS1, pSCTS2, pSCTS3, pSCTS4) using the plasmid pUC18 as a cloning vector (see Fig. 1 and below). A total of 20 synthetic oligonucleotides that varied from 40 to 120 nucleotides in length were used to assemble the four segments (Fig. 2). The ends of the segments were chosen from restriction sites present in the polylinker of pUC18 to allow stepwise assembly in that vector. Segments 1-4 were, respectively, a 350-bp EcoRI/BamHI fragment composed of six oligonucleotides (pSCTS1), a 203-bp BamHI/Sal I fragment composed of four oligonucleotides (pSCTS2), a 315-bp Sal I/Pst I fragment composed of six oligonucleotides (pSCTS3), and a 104-bp Pst I/HindIII fragment composed of four oligonucleotides (pSCTS4).

FIG. 1. Assembly of the synthetic TS gene. A total of 20 oligonucleotides were used to assemble four gene segments (pSCTS1 to -4) that varied from 104 to 350 bp in length. DNA fragments corresponding to the gene segments were isolated and used to create the functional TS gene (pSCTS9) via stepwise ligation into pUC18 as shown. R, EcoRI; B, BamHI; S, Sal I; P, Pst I; H, HindIII.

FIG. 2. DNA sequence of the synthetic TS gene carried in pSCTS9. Arrows indicate the ends of oligonucleotides used to assemble the TS gene. Large arrowheads indicate the ends of gene segments (see Fig. 1). Jagged lines between the DNA strands show the cohesive ends of overlapping oligonucleotides. The amino acid sequence is shown above that of the DNA. Numbers on the left refer to amino acids (upper) and DNA (lower). Position 1 of the DNA sequence corresponds to the first base of the ATG initiation codon.

Several isolates of each gene segment were characterized by DNA sequencing, and an overall mutation frequency of ≈3 per 1000 bp synthesized was observed. A single-base deletion at position 558 was present in all of the segment 3 isolates. Assembly of the TS gene (lacking base pair 558) was carried out by stepwise ligation of the four gene segments into pUC18 to create the plasmid pSCTS9Δ558. This plasmid carries the entire synthetic gene and ribosome binding site on a 978-bp fragment that is flanked by EcoRI and HindIII restriction sites. To obtain a functional TS gene, the mutant plasmid (pSCTS9Δ558) was digested with Sal I and Mst II and the resulting 27-bp fragment was replaced by a DNA cassette that repaired the single-base deletion at position 558 to give pSCTS9. The functional gene was further modified by removing a redundant Pst I site at position 844 of pSCTS9 using a 19-bp Sac II/Pst I fragment that carried a silent T to C change at position 844 to create pSCTS9ΔPst. Two additional unique restriction sites were added by replacing the 91-bp Nru I/Afl II fragment with a synthetic DNA fragment that encoded Xba I and Spl I sites at positions 36 and 67, respectively, to create pSCTS13. This modification also removed an Aha III site at position 55. A restriction map indicating the positions of 35 unique restriction sites in the TS coding region of pSCTS13 is shown in Fig. 3.

FIG. 3. Restriction map of the synthetic TS gene carried in pSCTS13. (Upper) Positions of 35 unique restriction sites. Numbers refer to the DNA sequence as described in the legend to Fig. 2. (Lower) Secondary structure features of TS aligned with the DNA sequence. Capital letters refer to α-helices; lowercase letters refer to β-strands; numbers refer to the amino acid sequence (5).

TS Gene Expression. Transformation of E. coli strain χ2913 (which carries a deletion of the thyA gene) with pSCTS9 allowed for growth on minimal agar lacking thymidine, showing that the plasmid was directing the synthesis of catalytically active TS. SDS/PAGE and activity measurements of crude extracts (0.3 unit/mg) showed that TS was expressed to a level of ≈10% of the total soluble protein. Regulated high-level expression of the synthetic TS gene was achieved by cloning the 978-bp EcoRI/HindIII fragment from pSCTS9 into the trp-lac (tac) promoter plasmid pJF119EH (12) to create plasmid pSCTS11. Although pSCTS11 was able to complement the Thy⁻ phenotype of χ2913, expression was low in the absence of IPTG (0.05 unit/mg). Upon induction with IPTG for 18 hr, expression of TS in soluble extracts increased to 0.53 unit/mg. SDS/PAGE analysis of crude cell extracts (Fig. 4) and measurement of specific activities indicated that pSCTS11 directed TS expression to a level of ≈18% of total cellular protein upon induction with IPTG.

FIG. 4. SDS/PAGE analysis of TS synthesis from χ2913. Arrows indicate plasmid-encoded TS. (A) Lane C, crude extract from χ2913/pSCTS11 after induction with IPTG; lane P, purified TS isolated from χ2913/pSCTS11. (B) Crude extracts from various His-199 mutants. Lanes: X, χ2913 transformed with pUC18; R, Arg; L, Leu; K, Lys; C, Cys; M, Met; A, Ala; N, Asn; Am, stop; W, Trp.

Purification and Characterization of the Gene Product. TS encoded by pSCTS9 and pSCTS11 in χ2913 was purified by hydroxylapatite chromatography. The yields from 200-ml cultures of pSCTS9 and pSCTS11 were 11 mg (78%) and 5 mg (70%), respectively, of TS that was estimated to be >95% pure by SDS/PAGE. The catalytic properties of purified recombinant TS were indistinguishable from those of the authentic L. casei enzyme (15, 28); the kinetic parameters were 3.3 units/mg, kcat = 2.7 sec⁻¹, Km for dUMP = 4.3 µM, and Km for (6R,S)-CH2H4folate = 21 µM. N-terminal sequence analysis of the first five amino acids was in agreement with the published sequence of TS from L. casei (19-21).

Construction of His-199 Mutants. To test the utility of the synthetic TS gene, we constructed a series of mutants at His-199 by cassette mutagenesis (29). The 40-bp SnaBI/Nco I fragment in pSCTS9 was replaced with a synthetic DNA cassette that carried a mixture of 32 possible codons (represented by NNG/C) that encode all 20 amino acids and one stop codon at position 199. DNA sequencing of 40 isolates resulted in the identification of 14 different amino acid substitutions and an amber stop codon (TAG) at position 199. The 15 mutants arose at a frequency close to that expected on the basis of the codon distribution in the mutagenic DNA cassette (data not shown). The wild-type histidine codon, CAC, was not found in the 40 plasmids sequenced, indicating a mutagenesis efficiency of >97% in this experiment. Nine of the His-199 mutants [Leu, Lys, Arg, Met, Asn, Trp, Tyr, Glu, Am (stop codon)] were unable to grow on minimal agar lacking thymidine. The six remaining mutants (Pro, Ser, Gly, Ala, Thr, Val) were able to complement the thyA deletion in χ2913, indicating the presence of catalytically active TS. Enzyme assays of crude extracts confirmed TS activity in Thy⁺ mutants and lack of activity in those that were Thy⁻ (data not shown). Analysis of crude cell extracts by SDS/PAGE showed that all 14 mutant plasmids encoding an amino acid substitution at position 199 directed the synthesis of a 37-kDa protein that comigrated with TS. The 37-kDa protein was absent in extracts from untransformed χ2913 and extracts from χ2913/pSCTS-H199Ter, a mutant that has a TAG termination codon at position 199.
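Two of the figures quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the original work. It assumes that TS in a crude extract has the same specific activity as the purified enzyme (3.3 units/mg), so that the ratio of the two approximates the fraction of soluble protein that is TS. It also reads the >97% mutagenesis efficiency as the bound implied by finding no wild-type clone among 40 sequenced isolates. The function names are hypothetical.

```python
PURE_TS_UNITS_PER_MG = 3.3  # specific activity of purified recombinant TS, from the text


def ts_fraction(crude_units_per_mg, pure_units_per_mg=PURE_TS_UNITS_PER_MG):
    """Estimate the fraction of soluble protein that is TS.

    Assumption: TS in the crude extract is as active as the purified enzyme.
    """
    return crude_units_per_mg / pure_units_per_mg


def efficiency_lower_bound(n_sequenced):
    """Bound implied when none of n_sequenced isolates is wild type.

    Fewer than 1 in n_sequenced clones is wild type, so the efficiency
    exceeds 1 - 1/n_sequenced.
    """
    return 1.0 - 1.0 / n_sequenced


if __name__ == "__main__":
    extracts = [
        ("pSCTS9", 0.3),
        ("pSCTS11, uninduced", 0.05),
        ("pSCTS11, induced", 0.53),
    ]
    for label, crude in extracts:
        print(f"{label}: {ts_fraction(crude):.1%} of soluble protein")
    print(f"mutagenesis efficiency > {efficiency_lower_bound(40):.1%}")
```

Under these assumptions the activity ratios come out near 9% for pSCTS9 and 16% for induced pSCTS11, in the same range as the ≈10% and ≈18% reported from SDS/PAGE and specific activity measurements.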

DISCUSSION

Our current understanding of the structure and catalytic activity of TS has been the result of many years of biochemical study. In an effort to extend these studies, we are using a genetic approach that involves the generation of a large number of site-directed mutations within the coding region of the TS gene. As a step toward that goal, we have designed and constructed a synthetic gene based on the amino acid sequence of TS from L. casei. The resulting gene product has been characterized and the synthetic gene has been used to construct a series of mutations at His-199 using cassette mutagenesis.

The construction of a synthetic gene allows the design of a DNA sequence with several features that are not present in the natural isolate. The synthetic TS gene described in this paper includes 35 unique restriction sites, whereas the natural L. casei TS gene carries 11 unique sites, several of which are clustered within a short region (21). The restriction sites in the synthetic TS sequence were chosen to maximize the number of evenly spaced, unique sites throughout the entire length of the coding region. Also, we included only those restriction sites that are cleaved by enzymes that are commercially available and whose recognition sequences are >4 bases long. To ensure a high level of gene expression in E. coli, translation initiation signals and codon usage were optimized. Finally, the design of the gene included unique flanking restriction sites that made the entire construct portable to any of several commonly available E. coli expression vectors that carry inducible promoters. The large number of silent changes that were necessary to create restriction sites and to modify the codon usage had no detectable effect on the properties of the encoded protein (see Results).

The strategy we used in the assembly of the synthetic TS gene reduced the number of steps that are normally associated with the construction of a gene of this size. The number of steps was minimized by using oligonucleotides that were an average of 98 bases long. Also, the cohesive ends of the overlapping oligonucleotides were 20 bases long (see Fig. 2), a feature that allowed us to assemble DNA fragments up to 330 bp in a single reaction. Furthermore, this strategy enabled us to ligate those fragments directly into plasmid vectors without prior ligation or purification steps. In this way, a gene of close to 1000 bp was assembled in four steps. Mutations that arose due to synthetic errors were repaired by cassette replacement after final assembly of the gene (see Results).

The utility of the synthetic gene for mutagenesis was demonstrated by the construction of a series of amino acid substitutions of His-199 by cassette mutagenesis. His-199 is located in the vicinity of the active site and is a candidate for an "essential" general base catalyst in the TS reaction (5). To test this hypothesis we prepared a mutagenic oligonucleotide cassette in which the codon for His-199 was replaced by a mixture of 32 codons that encode 20 different amino acids and 1 stop codon. Plasmid DNA from 40 isolates was sequenced and 14 different amino acid substitutions and a stop codon were identified at that position. The ability of the mutant proteins to direct the synthesis of catalytically active TS was examined by genetic complementation in E. coli. Six of the mutants (Gly, Ala, Ser, Thr, Val, Pro) were able to grow on minimal agar in the absence of thymidine, an indication that catalytically active TS was being synthesized. However, the other nine mutants [Leu, Lys, Arg, Met, Asn, Trp, Tyr, Glu, Am (stop codon)] did not grow under the same conditions, suggesting that the TS synthesized in these cells was incapable of providing sufficient thymidylate to sustain growth. In general, it appears that TS can tolerate substitutions of His-199 with residues having small side chains. When hydrophobic residues are ranked by size in the order Gly, Ala, Ser, Cys, Thr, Val, Ile, Leu, Met, Phe, Tyr, Trp (30), five of the six active mutants (Gly, Ala, Ser, Thr, Val) can be further classified as having small, hydrophobic side chains. The observation that at least six different residues can be substituted at position 199 without complete loss of activity allows us to conclude that His-199 is not strictly essential for TS function.

The construction of a synthetic gene encoding L. casei TS will enable us to generate a large number of site-directed mutations at several positions in the coding region. The high level of TS expression from plasmids that carry the gene will facilitate the purification of large quantities of mutant proteins for biochemical and structural studies. This combination of genetic and biochemical techniques could be used to address a broad range of questions relating to the structure and function of the enzyme.

We thank Brenda Andrews and Jeff Edman for comments on the manuscript. This work was supported by U.S. Public Health Service Grant CA14394 from the National Cancer Institute.
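The idea of creating restriction sites through silent changes, central to the design discussed above, can be illustrated with a short sketch. This is not the PROTORES program used by the authors. It is a hypothetical, simplified search that assumes the standard genetic code and unambiguous recognition sequences, and the peptide in the example is invented for illustration. For each nucleotide offset, it asks whether every codon overlapped by the site has a synonymous codon compatible with the site's bases.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: residue -> list of synonymous codons.
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def silent_site_positions(protein, site):
    """Return 0-based nucleotide offsets where `site` fits without changing `protein`.

    `protein` is a one-letter amino acid string and `site` an unambiguous
    recognition sequence (for example GAATTC for EcoRI).
    """
    total_nt = 3 * len(protein)
    hits = []
    for start in range(total_nt - len(site) + 1):
        end = start + len(site)
        compatible = True
        for codon_index in range(start // 3, (end - 1) // 3 + 1):
            codon_start = 3 * codon_index
            # Bases the site imposes on this codon: position in codon -> base.
            required = {
                pos - codon_start: site[pos - start]
                for pos in range(max(start, codon_start), min(end, codon_start + 3))
            }
            choices = SYNONYMS[protein[codon_index]]
            if not any(all(c[i] == b for i, b in required.items()) for c in choices):
                compatible = False
                break
        if compatible:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical peptide, chosen only to illustrate the search.
    print(silent_site_positions("MEFK", "GAATTC"))
```

Because each codon can be chosen independently of its neighbors, compatibility can be tested one codon at a time. A fuller design tool would also have to check that each chosen site is unique in the gene and absent from the vector, as the authors required.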

1. Santi, D. V. & Danenberg, P. V. (1984) in Folates and Pterins, eds. Blakley, R. L. & Benkovic, S. J. (Wiley, New York), Vol. 1, pp. 345-398.
2. Lewis, C. A., Jr., Munroe, W. A. & Dunlap, R. B. (1978) Biochemistry 17, 5382-5387.
3. Cipollo, K. L. & Dunlap, R. B. (1979) Biochemistry 18, 5537-5541.
4. Rosson, D., Otwell, H. B. & Dunlap, R. B. (1980) Biochem. Biophys. Res. Commun. 97, 500-505.
5. Hardy, L. W., Finer-Moore, J. S., Montfort, W. R., Jones, M. O., Santi, D. V. & Stroud, R. M. (1987) Science 235, 448-455.
6. Messing, J. (1983) Methods Enzymol. 101, 20-79.
7. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY).
8. Maxam, A. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
9. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
10. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
11. Tabor, S. & Richardson, C. C. (1987) Proc. Natl. Acad. Sci. USA 84, 4767-4771.
12. Furste, J. P., Pansegrau, W., Frank, R., Blocker, H., Scholz, P., Bagdasarian, M. & Lanka, E. (1986) Gene 48, 119-131.
13. Lennox, E. S. (1955) Virology 1, 190-206.
14. Read, S. M. & Northcote, D. H. (1981) Anal. Biochem. 116, 53-64.
15. Santi, D. V., McHenry, C. S. & Sommer, H. (1974) Biochemistry 13, 471-480.
16. Pogolotti, A. L., Weill, C. & Santi, D. V. (1979) Biochemistry 13, 2794-2798.
17. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
18. Martinez, H. M. (1985) Sequence Analysis Programs Manual (Biomath. Comput. Lab., Univ. of California, San Francisco).
19. Maley, G. F., Bellisario, R. L., Guarino, D. U. & Maley, F. (1979) J. Biol. Chem. 254, 1288-1295.
20. Maley, G. F., Bellisario, R. L., Guarino, D. U. & Maley, F. (1979) J. Biol. Chem. 254, 1301-1304.
21. Pinter, K., Davisson, V. J. & Santi, D. V. (1988) DNA 7, 235-241.
22. An, G. & Friesen, J. D. (1980) Gene 12, 33-39.
23. An, G., Bendiak, D. S., Mamelak, L. A. & Friesen, J. D. (1981) Nucleic Acids Res. 9, 4163-4172.
24. Gouy, M. & Gautier, C. (1982) Nucleic Acids Res. 10, 7055-7074.
25. de Boer, H. A. & Kastelein, R. A. (1986) in Maximizing Gene Expression, eds. Reznikoff, W. & Gold, L. (Butterworth, Stoneham, MA), pp. 225-285.
26. Shine, J. & Dalgarno, L. (1974) Proc. Natl. Acad. Sci. USA 71, 1342-1346.
27. Stormo, G. D. (1986) in Maximizing Gene Expression, eds. Reznikoff, W. & Gold, L. (Butterworth, Stoneham, MA), pp. 195-224.
28. Leary, R. P. & Kisliuk, R. L. (1971) Prep. Biochem. 1, 47-54.
29. Wells, J. A., Vasser, M. & Powers, D. B. (1985) Gene 34, 315-323.
30. Bashford, D., Chothia, C. & Lesk, A. M. (1987) J. Mol. Biol. 196, 199-216.
