Proc. Natl. Acad. Sci. USA Vol. 75, No. 12, pp. 5765-5769, December 1978

Chemistry

Chemical synthesis of genes for human insulin (synthetic genes/oligonucleotide synthesis/phosphotriester method/high-performance liquid chromatography)

ROBERTO CREA*, ADAM KRASZEWSKI, TADAAKI HIROSE, AND KEIICHI ITAKURA

Division of Biology, City of Hope National Medical Center, Duarte, California 91010

Communicated by Ernest Beutler, October 2, 1978

ABSTRACT A rapid chemical procedure has been developed and used for the synthesis of 29 oligodeoxyribonucleotides to build synthetic genes for human insulin. The gene for insulin B chain, 104 base pairs, and the one for A chain, 77 base pairs, were designed from the amino acid sequences of the human polypeptides. They bear single-stranded cohesive termini for the EcoRI and BamHI restriction endonucleases and are designed to be inserted separately into a pBR322 plasmid. The synthetic fragments, deca- to pentadecanucleotides, were synthesized by a block phosphotriester method with trinucleotides as building blocks. Final purification was by high-performance liquid chromatography. All 29 oligonucleotides were pure and had the correct sequences.

In a previous paper we reported production of a functional peptide hormone, somatostatin, in Escherichia coli from a gene of chemically synthesized origin (1). This result demonstrated that a gene, designed from an amino acid sequence, can be synthesized and expressed to produce the peptide in E. coli. However, extension of this technology to production of biomedically valuable polypeptides such as human insulin and growth hormone seemed to be limited by the difficulty of synthesizing longer genes in a reasonable time. Recent improvements in the synthesis of oligodeoxyribonucleotides by the phosphotriester method, such as the rapid synthesis of trimers (2) and the block synthesis of oligonucleotides (3), together with the extensive use of high-performance liquid chromatography for analysis and purification of the DNA fragments, have dramatically reduced the time necessary for construction of DNA fragments.

In this paper we report the synthesis of 29 oligodeoxyribonucleotides of defined sequence that can be assembled to form genes for human insulin B and A chains. The overall design of the genes for incorporation into the plasmid pBR322 was similar to that used for the somatostatin gene (1). Fig. 1 shows the amino acid and nucleotide sequences for human insulin A and B chains. These genes will be fused to the E. coli β-galactosidase gene on plasmid pBR322. Transformation of E. coli with the chimeric plasmid DNA will lead to the synthesis of hybrid polypeptides, including the sequence of amino acids corresponding to human insulin A and B chains. Because human insulin has no methionine residues in the amino acid sequence, the A and B chains can be obtained by cyanogen bromide cleavage of the precursors. After the separate production and purification of the A and B chains, active human insulin will be generated by in vitro formation of the correct disulfide bonds between A and B chains (4, 5).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.
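The cyanogen bromide argument in the introduction can be checked with a short sketch. It assumes only that cyanogen bromide cleaves after methionine; the chain sequences are transcribed from Fig. 1 into one-letter code, while `CARRIER` and `cnbr_fragments` are hypothetical names introduced for illustration (the carrier sequence is not given in the text, and the chemical modification of the C-terminal Met is ignored).

```python
# One-letter amino acid sequences of the two chains, transcribed from Fig. 1
# (the Met placed in front of each chain by the gene design is not included).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"

# Hypothetical placeholder for the carrier part of the hybrid polypeptide;
# the real carrier sequence is not given in the text.
CARRIER = "XXXXXXXX"


def cnbr_fragments(protein):
    """Split a sequence after every Met (M), where cyanogen bromide cleaves."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


for name, chain in (("A", A_CHAIN), ("B", B_CHAIN)):
    hybrid = CARRIER + "M" + chain          # carrier, junction Met, insulin chain
    released = cnbr_fragments(hybrid)[-1]   # fragment after the last Met
    print(name, "chain released intact:", released == chain)
```

Because neither chain contains Met, the fragment after the junction Met is always the complete chain; any Met residues inside a real carrier would only produce additional carrier fragments.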

MATERIALS AND METHODS

Chemical synthesis of oligodeoxyribonucleotides

Most of the materials and methods used for the synthesis of oligodeoxyribonucleotides were described earlier (6, 7) except for the following modifications. (i) The fully protected mononucleotides, 5'-O-dimethoxytrityl-3'-p-chlorophenyl-β-cyanoethyl phosphates, were synthesized from the nucleoside derivatives with a new mono-

Table 1. Synthesis of trimer building blocks

No.  Compound*  Yield,† %  RF (a)  RF (b)  Purity,‡ %  Present in
1    AAG        47         0.15    0.40    93          B5, B6
2    AAT        49         0.25    0.52    95          H1, A1, A6
3    AAC        52         0.28    0.55    93          H5, B6, A2, A8
4    ACT        43         0.27    0.53    91          B4, B5, A6
5    ACC        56         0.33    0.30    96          B7
6    ACG        39         0.18    0.45    90          H5, B7
7    AGG        45         0.10    0.26    89          H6, H7, B9
8    AGT        33         0.14    0.40    96          B9, A2, A11
9    AGC        50         0.19    0.48    92          H8, B1, A5, A10
10   AGA        48         0.24    0.50    91          A9
11   TTC        44         0.26    0.52    95          B4, B7, A3
12   TTG        49         0.11    0.31    94          H3, H5, A2, A3, A5
13   TCT        58         0.24    0.49    96          A4
14   TCA        45         0.28    0.53    92          H1, H2, H4, A1
15   TCG        39         0.12    0.34    91          A2
16   TGG        32         0.10    0.28    87          H3, A1, A10
17   TGC        51         0.18    0.47    93          H6, B2, A4, A7, A8
18   TGA        46         0.12    0.37    94          H7
19   TAC        61         0.22    0.50    90          B4, A11
20   TAA        55         0.17    0.44    95          B5, A10
21   CCT        53         0.30    0.55    97          H3, H4, B10
22   CAC        47         0.25    0.51    92          A3
23   CAA        58         0.25    0.51    93          H2, H6, H8, A7
24   CTT        41         0.28    0.54    92          B2, B9, A4
25   CGA        40         0.27    0.52    93          A7
26   CGT        75         0.25    0.50    89          H2, H4, B3, Bi
27   GGT        35         0.09    0.26    90          B3
28   GTT        46         0.18    0.45    93          B2
29   GTA        38         0.25    0.50    95          B6, B8, A6
30   GAA        39         0.15    0.39    88          H7, B3, B8, A5
31   GAT        52         0.22    0.49    89          B10, A9
32   GCA        42         0.14    0.39    93          A9

* Fully protected trideoxynucleotides: 5'-O-dimethoxytrityl-3'-p-chlorophenyl-β-cyanoethyl phosphate.
† Overall yield calculated from the 5'-hydroxymonomers.
‡ Based on high-performance liquid chromatographic analysis.

* Present address: Division of Organic Chemistry, Genentech, Inc., 460 Point San Bruno Boulevard, South San Francisco, CA 94080.

[Figure 1: amino acid and nucleotide sequences of the two synthetic genes. The arrows marking the individual synthetic fragments are not reproduced.]

B-chain gene
Amino acids: Met Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr stop stop
5' AATTCATGTTCGTCAATCAGCACCTTTGTGGTTCTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACTTAATAG 3'
3'     GTACAAGCAGTTAGTCGTGGAAACACCAAGAGTGGAGCAACTTCGAAACATGGAACAAACGCCACTTGCACCAAAGAAGATGTGAGGATTCTGAATTATCCTAG 5'
Restriction sites marked: EcoRI (left terminus), HindIII (middle), BamHI (right terminus)

A-chain gene
Amino acids: Met Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn stop stop
5' AATTCATGGGCATCGTTGAACAGTGTTGCACTTCTATCTGCTCTCTTTACCAGCTTGAGAACTACTGTAACTAATAG 3'
3'     GTACCCGTAGCAACTTGTCACAACGTGAAGATAGACGAGAGAAATGGTCGAACTCTTGATGACATTGATTATCCTAG 5'
Restriction sites marked: EcoRI (left terminus), BamHI (right terminus)

FIG. 1. Design and synthesis of human insulin genes. The genes for human insulin, B chain and A chain, were designed from the amino acid sequences of the human polypeptides. The choice of the codon for each amino acid has been discussed (1). The 5' ends of each gene have single-stranded cohesive termini for the EcoRI and BamHI restriction endonucleases for correct insertion of each gene into plasmid pBR322. A HindIII endonuclease recognition site was incorporated into the middle of the B-chain gene for the amino acid sequence Glu-Ala to allow amplification and verification of each half of the gene separately before construction of the whole B-chain gene. The B-chain and the A-chain genes were designed to be built from 29 different oligodeoxyribonucleotides, varying from decamers to pentadecamers. Each arrow indicates the fragment synthesized by the improved phosphotriester method, H1-H8 and B1-B12 for the B-chain gene, and A1-A11 for the A-chain gene. The genes are designed to be expressed in E. coli separately; then human insulin is generated by disulfide-bond formation between B-chain and A-chain peptides in vitro.

functional phosphorylating agent, p-chlorophenyl-β-cyanoethyl phosphorochloridate (1.5 M equivalent) (unpublished data) in acetonitrile in the presence of 1-methylimidazole (8). The products were isolated in large scale (100-30 g) by preparative liquid chromatography (Prep 500 LC, Waters Assoc., Milford, MA). (ii) By using the solvent extraction method (2), we synthesized 32 bifunctional trimers (see Table 1) at 5-10 mmol and 13 trimers, 3 tetramers, and 4 dimers as the 3'-terminus blocks at about 1 mmol. The homogeneity of the fully protected trimers was checked by thin-layer chromatography on silica gel in two methanol/chloroform solvent systems: solvent a, 5% (vol/vol), and solvent b, 10% (vol/vol) (see Table 1). Starting from these compounds, we synthesized 29 oligodeoxyribonucleotides of defined sequence (Table 2), 18 for the B-chain gene and 11 for the A-chain gene. The scheme and the reaction conditions for the condensation of trinucleotides to build dodecanucleotides are shown in Fig. 2.

Purification and identification of synthetic oligodeoxyribonucleotides

High-performance liquid chromatography was used extensively during synthesis of the oligonucleotides for analysis of each trimer and tetramer block, analysis of the intermediate fragments (hexamers, nonamers, and decamers), analysis of the last coupling reaction, and purification of the final products (see Fig. 3). The chromatography was performed with a Spectra-Physics 3500B liquid chromatograph. After removal of all protecting groups by concentrated NH4OH at 50°C (6 hr) and 80% AcOH at room temperature (15 min), the compounds were analyzed on a Permaphase AAX (Du Pont) (9) column (1 m × 2 mm) with a linear gradient of buffer B (0.05 M KH2PO4/1.0 M KCl, pH 4.5) in buffer A (0.01 M KH2PO4, pH 4.5). The gradient was formed by starting with buffer A and applying 3% of buffer B per min. The elution was at 60°C, with a flow rate of 2 ml/min. The yield of each oligonucleotide, calculated by high-performance liquid chromatographic analysis, and the retention time of the pure fragments are reported in Table 2. The 29 final oligonucleotides were also purified on Permaphase AAX under the same conditions reported above. The material in the desired peak was pooled, desalted by dialysis, and lyophilized. After the 5' termini were labeled with [γ-32P]ATP by using T4 polynucleotide kinase (10), the homogeneity of each oligonucleotide was checked by electrophoresis on a 20% polyacrylamide gel. Most of the sequences were confirmed by two-dimensional sequence analysis (10) of their partial venom phosphodiesterase digestion products (Fig. 4).
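As a rough illustration of the elution conditions described above, the following sketch estimates the mobile-phase composition at a given retention time. It assumes an ideal linear gradient with no dwell or dead volume, and it reads the KCl concentration of buffer B as 1.0 M; the function names are illustrative, not part of the published method.

```python
# Buffer compositions in mol/L, as described in the text (pH 4.5 for both).
BUFFER_A = {"KH2PO4": 0.01, "KCl": 0.0}
BUFFER_B = {"KH2PO4": 0.05, "KCl": 1.0}   # 1.0 M KCl is an assumed reading

GRADIENT_RATE = 3.0   # percent of buffer B added per minute
FLOW_RATE = 2.0       # ml/min


def percent_b(t_min):
    """Percent buffer B at time t, assuming an ideal linear gradient from 0%."""
    return min(100.0, GRADIENT_RATE * t_min)


def composition(t_min):
    """Salt concentrations (mol/L) of the mixed mobile phase at time t."""
    frac_b = percent_b(t_min) / 100.0
    return {salt: (1.0 - frac_b) * BUFFER_A[salt] + frac_b * BUFFER_B[salt]
            for salt in BUFFER_A}


def eluted_volume(t_min):
    """Volume (ml) delivered by the pump up to time t."""
    return FLOW_RATE * t_min


# Example: conditions at the retention time reported for fragment A11 (15.7 min).
mix = composition(15.7)
print(f"%B = {percent_b(15.7):.1f}, KCl = {mix['KCl']:.3f} M, "
      f"KH2PO4 = {mix['KH2PO4']:.4f} M, volume = {eluted_volume(15.7):.1f} ml")
```

Under these assumptions the retention times in Table 2 (12.2-16.1 min) correspond to roughly 0.37-0.48 M KCl; a real instrument's delay volume would shift these values somewhat.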

RESULTS AND DISCUSSION

Our approach to the construction of polynucleotides is the block-coupling method rather than the step-by-step approach (Fig. 2). Trinucleotides were used mainly as the starting building blocks (trimers). For synthesis of the trimers, we have reported the solvent extraction method, which simplifies the purification steps and increases the speed and yields of the trimer synthesis (3). The 45 different trimers necessary for construction of the insulin genes were synthesized in 3 months with yields in the range of 30-60% (see Table 1). In the construction of the fully protected polynucleotides from the blocks, separation of the products from the starting material becomes more difficult as the length of the chain increases and depends on the base composition of the polynucleotides. This problem was overcome by using a 1.5 M excess of 3'-phosphodiester component 3 (Fig. 2) to drive the coupling reaction almost to completion with the aid of a coupling reagent, 2,4,6-triisopropylbenzenesulfonyl tetrazolide (11), and by passing the reaction mixture through a short silica gel column, which traps only the unreacted charged component (Figs. 2 and 3). Without further purification steps, block couplings were

[Figure 2: reaction scheme, not reproduced. Legible labels: numbered compounds; coupling conditions "1) TPSTe, 2) silica gel, 3) Et3N", "1) TPSTe, 2) H+, 3) silica gel", and "1) TPSTe, 2) silica gel, 3) NH4OH, 4) AcOH"; B = protected and nonprotected bases; DMT = dimethoxytrityl; An = anisoyl; CE = β-cyanoethyl; TPSTe = 2,4,6-triisopropylbenzenesulfonyl tetrazolide.]

FIG. 2. Improved method for chemical synthesis of oligonucleotides. The basic units used to construct polynucleotides are two types of trimer block, the bifunctional trimer 1 and the 3'-terminus trimer 2. The bifunctional trimer 1 was hydrolyzed to 3'-phosphodiester component 3 with pyridine/triethylamine/water (3:1:1 vol/vol) (unpublished data) and also to the 5'-hydroxyl component 4 with 2% benzenesulfonic acid. The 3'-terminus block 2, protected by an anisoyl group at 3'-hydroxy, was treated with 2% benzenesulfonic acid to give the 5'-hydroxyl block 5. The coupling reaction of an excess of the 3'-phosphodiester trimer 3 (1.5 M equivalent) with the 5'-hydroxyl component 4 or 5 (1 M equivalent) in the presence of 2,4,6-triisopropylbenzenesulfonyl tetrazolide (TPSTe, 3-4 equivalents) went almost to completion in 3 hr. To remove the excess of the 3'-phosphodiester block 3, we passed the reaction mixture through a short silica gel column set up on a sintered glass filter. The column was washed, first with CHCl3 to elute some side products and the coupling reagent, and then with CHCl3/MeOH (95:5 vol/vol), in which almost all of the fully protected oligomer was eluted. Under these conditions, the charged compound 3 remained in the column. Similarly, block couplings were repeated until the desired length was constructed.

repeated in the same way until the desired length was constructed. Starting from the trimers, in 3 months we synthesized 29 different oligodeoxyribonucleotides, ranging from decamer to pentadecamer, to build genes for human insulin. In some cases, for construction of the desired length of oligonucleotides, trimers were coupled with monomers and dimers to prepare tetramer and pentamer blocks, respectively. The fully protected oligonucleotides were deblocked completely and the unmasked oligomer was purified by high-performance liquid chromatography. The purity of each fragment appeared to be more than 95%, judged by electrophoresis on acrylamide gel and by two-dimensional sequence analysis (Fig. 4 A and B). The chemically synthesized oligonucleotides have been ligated and the separate insulin genes cloned and expressed in plasmid pBR322. The details of this work will be published in a subsequent paper (12). Sequence-specific polynucleotides have proven to be useful biological tools (13, 14). The major drawbacks in the synthesis of polynucleotides were the slowness of each step, the low yields of coupling reactions, and the purity of the final products. Significant improvements at several stages of the synthesis and

Table 2. Yield and retention time of synthetic oligonucleotides

Compound    Sequence          Yield,* %  Rt,† min
H1          AATTCATGTT        51         12.2
H2          CGTCAATCAGCA      53         13.8
H3          CCTTTGTGGTTC      49         13.0
H4          TCACCTCGTTGA      58         13.8
H5          TTGACGAACATG      37         13.0
H6          CAAAGGTGCTGA      33         12.8
H7          AGGTGAGAACCA      45         15.0
H8          AGCTTCAACG        54         12.5
B1          AGCTTTGTAC        42         12.8
B2          CTTGTTTGCGGT      48         14.0
B3          GAACGTGGTTTC      54         14.0
B4          TTCTACACTCCT      22         13.6
B5          AAGACTTAATAG      35         15.1
B6          AACAAGGTACAA      37         15.8
B7          ACGTTCACCGCA      48         14.8
B8          GTAGAAGAAACC      55         14.0
B9          AGTCTTAGGAGT      52         14.5
B10 (A12)   GATCCTATTA        37         12.5
A1          AATTCATGGGC       26         13.5
A2          ATCGTTGAACAGTG    27         16.1
A3          TTGCACTTCTAT      52         13.9
A4          CTGCTCTCTTTACC    41         14.2
A5          AGCTTGAGAACT      53         13.9
A6          ACTGTAACTAATAG    50         15.8
A7          CAACGATGCCCATG    34         15.2
A8          TGCAACACTGTT      45         13.2
A9          AGAGCAGATAGAAG    35         15.0
A10         AAGCTGGTAAAG      52         14.4
A11         GTTACAGTAGTTCTC   48         15.7

* Based on high-performance liquid chromatographic analysis of the final solid products and estimated on the 3'-terminus blocks.
† Retention time of the pure fragment.

the extensive use of high-performance liquid chromatography for analysis and purification of the products have overcome these problems. The results described above show that the chemical synthesis of polynucleotides may no longer be the rate-limiting step in an overall project to produce biologically interesting genes. With these improvements the chemical synthesis of polynucleotides, together with recombinant DNA methods, will provide a powerful tool for production of medically important polypeptide hormones and even enzymes.
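The design described in Fig. 1 and Table 2 can be cross-checked computationally. The sketch below joins the Table 2 fragments into the two strands of each gene, confirms that the strands are complementary outside the four-base cohesive ends, and translates the upper strand with the standard genetic code. The grouping of fragments into strands and their order along each strand are inferred here from the sequences themselves, and all function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}

# Fragments from Table 2, each written 5'->3' (strand assignment is inferred).
B_UPPER = ["AATTCATGTT", "CGTCAATCAGCA", "CCTTTGTGGTTC", "TCACCTCGTTGA",   # H1-H4
           "AGCTTTGTAC", "CTTGTTTGCGGT", "GAACGTGGTTTC", "TTCTACACTCCT",
           "AAGACTTAATAG"]                                                 # B1-B5
B_LOWER = ["GATCCTATTA", "AGTCTTAGGAGT", "GTAGAAGAAACC", "ACGTTCACCGCA",
           "AACAAGGTACAA",                                                 # B10-B6
           "AGCTTCAACG", "AGGTGAGAACCA", "CAAAGGTGCTGA", "TTGACGAACATG"]   # H8-H5
A_UPPER = ["AATTCATGGGC", "ATCGTTGAACAGTG", "TTGCACTTCTAT",
           "CTGCTCTCTTTACC", "AGCTTGAGAACT", "ACTGTAACTAATAG"]             # A1-A6
A_LOWER = ["GATCCTATTA", "GTTACAGTAGTTCTC", "AAGCTGGTAAAG",                # B10 (A12), A11, A10
           "AGAGCAGATAGAAG", "TGCAACACTGTT", "CAACGATGCCCATG"]             # A9-A7


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble(upper_frags, lower_frags):
    """Join fragments into two strands and test their complementarity."""
    upper, lower = "".join(upper_frags), "".join(lower_frags)
    # The first four bases of each strand form the single-stranded
    # EcoRI (AATT) and BamHI (GATC) cohesive ends.
    duplex_ok = upper[4:] == reverse_complement(lower[4:])
    return upper, lower, duplex_ok


def translate(dna):
    """Translate from the first ATG to the end of the strand."""
    start = dna.find("ATG")
    return "".join(CODONS[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


for name, up, low in (("B", B_UPPER, B_LOWER), ("A", A_UPPER, A_LOWER)):
    upper, lower, ok = assemble(up, low)
    print(name, len(upper), "nt, strands complementary:", ok)
    print("  ", translate(upper))
```

With the Table 2 sequences, both upper strands have the stated lengths (104 and 77 nucleotides) and translate to Met followed by the chain and two stop codons; the HindIII site AAGCTT appears once in the B-chain upper strand.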

[Figure 3: chromatograms, panels A-F; horizontal axis: time, min.]

FIG. 3. High-performance liquid chromatographic analysis of the synthetic route for the deoxypentadecamer GTTACAGTAGTTCTC (A11). The reaction between the 3'-phosphodiester trimer (AGT) (0.16 mmol, Rt = 7.5 min) and the 5'-OH, 3'-anisoyl heptamer (AGTTCTC) (0.10 mmol, Rt = 9.5 min) (A, before addition of the coupling reagent) led to the synthesis of the fully protected decamer (AGTAGTTCTC) (Rt = 12.5 min) in almost quantitative yield after 2 hr (B). After purification of the products through a short silica gel filter, the decamer was recovered as a solid in 92% yield with a purity of 75% (C). This latter compound was treated with 2% benzenesulfonic acid (to remove the 5'-protecting group) and coupled with the 3'-phosphodiester pentamer (GTTAC) (0.11 mmol, Rt = 10.5 min) (D). After 4 hr (E), the reaction was stopped and the desired fully protected pentadecamer (GTTACAGTAGTTCTC) (Rt = 15.7 min) was purified on the silica gel column and isolated as a solid (yield, 48% as pure product based on chromatographic analysis). Finally, this material (5-10 mg) was deblocked and passed through the high-performance liquid chromatograph. The material in the major peak was pooled, evaporated, desalted by dialysis, and lyophilized. The purity of the unblocked pentadecamer (GTTACAGTAGTTCTC) (A11) was reconfirmed by chromatographic analysis (F).
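The block assembly of fragment A11 described in this legend can be followed with a small worked example. The sketch below joins the blocks in the order given and computes the molar excess of the incoming 3'-phosphodiester block at each coupling. Treating "92% yield with a purity of 75%" as 0.92 × 0.75 of the starting 0.10 mmol is an assumption made here for illustration, not a figure stated in the text.

```python
# Blocks named in the Fig. 3 legend, written 5'->3'.
TRIMER = "AGT"        # 3'-phosphodiester trimer, 0.16 mmol
HEPTAMER = "AGTTCTC"  # 5'-OH, 3'-anisoyl heptamer, 0.10 mmol
PENTAMER = "GTTAC"    # 3'-phosphodiester pentamer, 0.11 mmol

# Each coupling joins the 3'-phosphate of the incoming block to the free
# 5'-hydroxyl of the growing chain, so new blocks are added at the 5' end.
decamer = TRIMER + HEPTAMER
pentadecamer = PENTAMER + decamer
print(decamer, pentadecamer)

# First coupling: molar excess of trimer over heptamer.
excess_first = 0.16 / 0.10

# Second coupling: estimated amount of decamer carried forward
# (assumption: recovered solid = 92% of theory, 75% of which is decamer).
decamer_mmol = 0.10 * 0.92 * 0.75
excess_second = 0.11 / decamer_mmol

print(f"first coupling:  {excess_first:.2f} equivalents of trimer")
print(f"second coupling: {excess_second:.2f} equivalents of pentamer "
      f"(on an estimated {decamer_mmol:.3f} mmol of decamer)")
```

Under that assumption both couplings use about 1.6 equivalents of the phosphodiester block, close to the 1.5 M equivalent quoted for the general procedure in Fig. 2.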

We thank Leonor Directo, Lillian Shih, Eddie Huang, and Parkash Jhurani for their assistance in various aspects of the project. This research was supported by contracts from Genentech, Inc., to the City of Hope National Medical Center.

1. Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F. & Boyer, H. W. (1977) Science 198, 1056-1063.
2. Hirose, T., Crea, R. & Itakura, K. (1978) Tetrahedron Lett. 28, 2449-2452.
3. Crea, R., Hirose, T. & Itakura, K. (1978) Tetrahedron Lett., in press.
4. Lubke, K. & Klostermeyer, H. (1970) Adv. Enzymol. 33, 445-525.
5. Du, Y.-C., Zhang, Y.-C., Lu, Z.-X. & Tsou, C.-E. (1961) Scientia Sinica 10, 84-95.
6. Itakura, K., Katagiri, N., Narang, S. A., Bahl, C. P., Marians, K. J. & Wu, R. (1975) J. Biol. Chem. 250, 4592-4600.
7. Itakura, K., Katagiri, N., Bahl, C. P., Wightman, R. H. & Narang, S. A. (1975) J. Am. Chem. Soc. 97, 7327-7332.
8. Van Boom, J. H., Burgess, P. M. T., Crea, R., Luyten, W. C. M. M. & Reese, C. B. (1975) Tetrahedron 31, 2953-2959.

[Figure 4: (A) autoradiogram of a polyacrylamide gel, lanes A1-A11, origin marked; (B) two-dimensional sequence map.]

FIG. 4. Polyacrylamide gel electrophoresis (A) and sequence analysis (B) of synthetic polynucleotide fragments. Each oligonucleotide purified by high-performance liquid chromatography was phosphorylated at the 5' terminus with [γ-32P]ATP by T4 polynucleotide kinase. The radioactive oligonucleotides were analyzed by electrophoresis on 20% polyacrylamide gels under denaturing conditions (7 M urea) and autoradiographed (A). The autoradiogram of the synthetic fragments of the A series (A1-A11) is shown. Similar results were obtained with the H and B series (not shown). The two-dimensional sequence analysis of most of the synthetic oligonucleotides was performed by the published procedure (10), and the expected sequence pattern was obtained. (B) Sequence of the dodecamer, AGCTTGAGAACT (A5).

9. Van Boom, J. H. & DeRooy, J. F. M. (1977) J. Chromatog. 131, 169-177.
10. Jay, E., Bambara, R., Padmanabhan, P. & Wu, R. (1974) Nucleic Acids Res. 1, 331-353.
11. Stawinski, J., Hozumi, T., Narang, S. A., Bahl, C. P. & Wu, R. (1977) Nucleic Acids Res. 4, 353-372.
12. Goeddel, D. V., Kleid, D. G., Bolivar, F., Heyneker, H. L., Yansura, D. G., Crea, R., Hirose, T., Kraszewski, A., Itakura, K. & Riggs, A. D. (1979) Proc. Natl. Acad. Sci. USA 76, in press.
13. Heyneker, H. L., Shine, J., Goodman, H. M., Boyer, H. W., Rosenberg, J., Dickerson, R. E., Narang, S. A., Itakura, K., Lin, S. & Riggs, A. D. (1976) Nature (London) 263, 748-752.
14. Scheller, R. H., Dickerson, R. E., Boyer, H. W., Riggs, A. D. & Itakura, K. (1977) Science 196, 177-180.
