J. Mol. Biol. (1991) 217, 737-750

Structure of Cyclodextrin Glycosyltransferase Refined at 2.0 Å Resolution

Claudio Klein and Georg E. Schulz†

Institut für Organische Chemie und Biochemie der Universität, Albertstrasse 21, D-7800 Freiburg i. Br., Germany

(Received 27 August 1990; accepted 5 November 1990)

The previously reported structural model of cyclodextrin glycosyltransferase (EC 2.4.1.19) from Bacillus circulans has been improved. For this purpose the known sequence was built into an electron density map established by multiple isomorphous replacement and subsequent solvent-flattening at 2.5 Å resolution. The resulting model was refined at 2.0 Å resolution using a simulated annealing refinement method. Based on 70,171 independent reflections in the range 7.0 to 2.0 Å resolution, a final R-factor of 17.6% was obtained with a model obeying standard geometry within 0.013 Å in bond lengths and 2.7° in bond angles. The final model consists of all 684 amino acid residues, two calcium ions and 588 solvent molecules.

1. Introduction

Cyclodextrin glycosyltransferases (CGTases‡, EC 2.4.1.19) are monomeric enzymes that catalyze the reversible 1,4-α-D-glucopyranosyl transfer reactions:

$G_n \rightleftharpoons G_{n-x} + cG_x$ (cyclization, coupling),
$G_m + G_n \rightleftharpoons G_{m-x} + G_{n+x}$ (disproportionation),

where $G_n$ and $G_m$ are α(1→4)-glucopyranosyl chains of length n and m; x is the transferred chain part. $cG_x$ denotes a cyclodextrin with ring size x = 6, 7, 8 ... (Bender, 1986). Cyclodextrins are used in pharmaceutical, food and agricultural products. With their hydrophilic outer side and hydrophobic cavity they can serve as micro-encapsulators for stabilizing light- and air-sensitive as well as volatile compounds (Szejtli, 1988). The backbone model of CGTase from Bacillus circulans strain number 8 has been reported by Hofmann et al. (1989). It contained 664 amino acid residues fitted to an electron density map at 3.4 Å resolution (1 Å = 0.1 nm) without sequence knowledge. Meanwhile, the sequence of this CGTase has been established by Nitschke et al. (1990), demonstrating that the polypeptide consists of 684 residues with Mr 74,416. Here, we describe the improvement of the three-dimensional model by extending the resolution to 2.0 Å, incorporating the known sequence and refining the structure to convergence.

2. Materials and Methods

(a) Crystallization

The preparation and crystallization of CGTase from wild-type B. circulans strain no. 8 followed established lines (Hofmann et al., 1989). The design, production and crystallization of the mutant enzyme Ser428→Cys has been described by Klein et al. (1990). Crystals belong to the orthorhombic space group P2₁2₁2₁ with cell dimensions a = 94.8 Å, b = 104.7 Å, c = 114.0 Å. There is 1 molecule per asymmetric unit. The solvent content is 68%.

(b) Data collection and processing

Native data were collected to 2.0 Å resolution using the EMBL beam line X11 at the DORIS storage ring, Hamburg. The data were recorded on X-ray film (Reflex 25, CEA) with 2 films/pack using a modified Arndt-Wonacott oscillation camera. At a wavelength of 1.49 Å a total rotation angle of 80° was measured using 4 crystals. Films were digitized on an Optronics Photoscan P-1000 microdensitometer at the Biozentrum Basel, using an absorbance range of 0 to 2.8 and a raster size of 50 μm. A second native data set was collected on an image plate detector to 2.2 Å resolution at the EMBL beam line X31 at DESY. Only 1 crystal was used to record a rotation range of 90° of high resolution data (1.5°/image) and a range of 90° of low resolution data (3°/image). This

† Author to whom all correspondence should be addressed.
‡ Abbreviations used: CGTase, cyclodextrin glycosyltransferase; m.i.r., multiple isomorphous replacement; r.m.s., root-mean-square; σ, standard deviation.

0022-2836/91/040737-14 $03.00/0

© 1991 Academic Press Limited


Table 1
Statistics of data collection

Crystal    Data collection   Resolution (Å)   R_sym† (%)   R‡ (%)   Number of reflections   Completeness of data set (%)
Native     Film              2.0              12.1         -        65,193                  88
           Image plate       2.2              7.4          -        56,294                  97
           Diffractometer    3.4              -            14.7     16,913                  100
K₂PtCl₄    Image plate       2.5              9.1          -        38,644                  97
cis-Pt§    Diffractometer    5.9              -            10.5     3651                    100
Mutant‖    Area detector     3.5              10.6         -        11,120                  77

† R_sym is defined as $\sum_{i,hkl} |I(i,hkl) - \langle I(hkl)\rangle| / \sum_{i,hkl} \langle I(hkl)\rangle$, where i runs through the symmetry-related reflections.
‡ R is defined as $2\sum |F_{hkl} - F_{hk\bar{l}}| / \sum (F_{hkl} + F_{hk\bar{l}})$, where $F_{hkl}$ and $F_{hk\bar{l}}$ are the structure factor amplitudes of symmetry-related reflections on zones $hkl$ and $hk\bar{l}$.
§ cis-Pt, cis-(NH₃)₂PtCl₂.
‖ The mutant Ser428→Cys of CGTase (Klein et al., 1990) was soaked with methylmercury acetate.
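The R_sym definition in the Table 1 footnote can be illustrated with a minimal Python sketch. The function name `r_sym`, the input layout (a list of `(hkl, intensity)` pairs already mapped to a unique index) and the choice to skip singly measured reflections are assumptions made for illustration; they are not taken from the data-processing programs used in this work.

```python
from collections import defaultdict


def r_sym(observations):
    """Return R_sym = sum_i,hkl |I(i,hkl) - <I(hkl)>| / sum_i,hkl <I(hkl)>.

    observations: iterable of ((h, k, l), intensity) pairs, where
    symmetry-related measurements share the same (h, k, l) key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            # Assumption: reflections measured only once carry no
            # agreement information and are left out of both sums.
            continue
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += mean_i * len(intensities)
    return numerator / denominator


# Hypothetical intensities for two unique reflections, each measured twice.
example = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(f"R_sym = {100 * r_sym(example):.1f}%")
```

The intensities in `example` are invented; with real data the same sum would be taken over all merged film and image plate measurements.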

crystal was further used to measure a range of 20° of high and 20° of low resolution data also in the blind region. The wavelength was 1.06 Å. The total measuring time was 14 h. Also the K₂PtCl₄ derivative was measured at the X31 beam line. In order to maximize the anomalous scattering effects and to minimize radiation damage a wavelength of 1.06 Å near the absorption edge of platinum was chosen. A range of 90° of high and a range of 90° of low resolution data were collected on 1 crystal in the same way as with the native crystal. Synchrotron data were reduced using the MOSFLM/IMAGES program package. All intensity data collected on image plate as well as on film were merged together giving rise to the final native data set with 70,862 unique reflections from 387,934 recorded reflections in the resolution range 10 to 2.0 Å, with R_sym = 11.9%. The data were 96% complete; there were no cutoffs. The dependence of R_sym on resolution is plotted in Fig. 2. Missing reflections in the resolution range ∞ to 3.4 Å were filled up with data measured on our 4-circle diffractometer (modified model P2₁, Nicolet, U.S.A.). The K₂PtCl₄ derivative data set contained 38,644 unique reflections with 34,505 anomalous signals extracted from 181,105 measurements in the resolution range 10 to 2.5 Å, with R_sym = 1.4%. The intensities were converted to structure factors using the program TRUNCATE (French & Wilson, 1978). Intensity data for the cis-(NH₃)₂PtCl₂ derivative were collected on the diffractometer at 6°C using Ni-filtered CuKα radiation as described by Thieme et al. (1981). The methylmercury acetate derivative of the mutant Ser428→Cys was measured to 3.5 Å resolution on a Siemens X100A multiwire area detector mounted on a rotating anode generator (Elliott GX-18) at the EMBL, Heidelberg. These data were processed with the program package XDS (Kabsch, 1988). Statistics of data collection are listed in Table 1.

(c) m.i.r. phases and solvent flattening

For the m.i.r. analysis, crystals were soaked with 38 heavy-atom compounds differing from the 22 tried by Hofmann et al. (1989). Among these, the cis-(NH₃)₂PtCl₂ soak contributed to phase determination as specified in Table 2. Further contributions came from a mercury derivative of the mutant Ser428→Cys and from a remeasurement of the K₂PtCl₄ derivative of Hofmann et al. (1989) to a resolution of 2.5 Å including anomalous scattering signals. The normal difference-Patterson as well as the anomalous difference-Patterson of the new K₂PtCl₄ derivative data could be interpreted in terms of the 3 sites given by Hofmann et al. (1989). As expected, the difference-Patterson of the mercury derivative of the mutant enzyme showed a single site which could be assigned to the newly introduced Cys428. The 2 cysteine residues in the wild-type sequence at positions 43 and 50 are not available as they form a disulfide bridge. The binding sites of the cis-(NH₃)₂PtCl₂ derivative were found by a difference-Fourier synthesis.

Table 2
Refinement of heavy-atom parameters

                     Soaking conditions                               Fractional co-ordinates    Relative        Temperature
Heavy-atom compound  Concn (mM)  Time (days)  Resolution (Å)  R† (%)  x       y       z          occupancy (%)   factor (Å²)
K₂PtCl₄‡             1.0         2            2.5             16.7    0.851   0.898   0.153      84              35
                                                                      0.120   0.313   0.093      51              59
                                                                      0.950   0.456   0.087      37              99
cis-(NH₃)₂PtCl₂      50          Cocrystal    5.9             15.9    0.851   0.898   0.152      98              47
                                                                      0.115   0.308   0.086      36              85
MMA§                 1.0         2            3.5             17.8    0.468   0.168   0.242      100             23

Phasing power F/E in ranges (Å) ∞ to 5.9, 5.9 to 3.5 and 3.5 to 2.5 (values in source order): 3.2, 2.8, 1.3, 1.9, 1.2, 2.1

† R = $2\sum |F_1 - F_2| / \sum (F_1 + F_2)$, where $F_1$ and $F_2$ are the structure factor amplitudes of derivative and native data sets, respectively.
‡ Anomalous scattering data were collected at a wavelength of 1.06 Å.
§ MMA, methylmercury acetate. Derivative from mutant enzyme Ser428→Cys (Klein et al., 1990).
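The agreement index between derivative and native amplitudes defined in the Table 2 footnote can be written as a short Python function. This is only a sketch: the name `r_between` and the assumption that both amplitude lists are already scaled and in the same reflection order are illustrative choices, not part of the original procedure.

```python
def r_between(f1, f2):
    """Return 2 * sum|F1 - F2| / sum(F1 + F2) for paired amplitudes.

    f1, f2: sequences of structure factor amplitudes for the same
    reflections (for example derivative and native), assumed to be
    on a common scale already.
    """
    if len(f1) != len(f2):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(a - b) for a, b in zip(f1, f2))
    denominator = sum(a + b for a, b in zip(f1, f2))
    return 2.0 * numerator / denominator


# Hypothetical amplitudes for three reflections.
derivative = [110.0, 90.0, 50.0]
native = [100.0, 100.0, 50.0]
print(f"R = {100 * r_between(derivative, native):.1f}%")
```

The same expression applies to the zone-related index of Table 1 when the two lists hold the amplitudes of the symmetry-related reflections.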


Figure 1. Quality of the solvent-flattened electron density map at 2.5 Å resolution as exemplified by the electron density at residues 430 to 435 of strand β17 in domain C. The contours are drawn at 1.1σ.

However, both binding sites of this derivative were identical with sites of the K₂PtCl₄ derivative. With hindsight, the 3 binding sites of the K₂PtCl₄ derivative can be assigned to residues Met115, Met145 and His233, respectively, while cis-(NH₃)₂PtCl₂ was bound to Met115 and Met145. The heavy-atom parameters were refined and phases were determined with the method of Dickerson et al. (1968) using the program PHARE from the CCP4 package. The mean figure of merit out to 2.5 Å resolution was 0.56. The corresponding m.i.r. map was good enough to allow chain tracing. In order to improve the map, solvent-flattening and phase combination (Wang, 1985) were applied to the m.i.r. phases using the program version described by Leslie (1987). The unit cell was sampled at 108 x 120 x 136 grid points. The radius of the averaging sphere was 7 Å. The solvent level was iteratively adjusted to 50% (4 refinement cycles), 55% (3 cycles) and 60% (7 cycles). The final figure of merit was 0.79. The quality of the solvent-flattened electron density map is demonstrated in Fig. 1.

(d) Course of refinement

For refinement by simulated annealing we used the program XPLOR (Brünger et al., 1987) following the protocol given in Table 3. The refinement was started at the resolution of the m.i.r. map (2.5 Å) in accordance with the suggestion of Hendrickson (1985). The temperature factors of all atoms of the starting model were set to 25 Å². The initial R-factor between observed and calculated structure factors was 58.4%. After 3 rounds of refinement the resolution was extended to 2.2 Å, and after the 9th round to 2.0 Å. After each round the whole model was re-evaluated on a display system (model PS330, Evans & Sutherland, U.S.A.). In order to avoid early biasing toward the current model, refitting was done to the m.i.r. map until round 7. At this stage the phases calculated from the model reproduced clearly the heavy-atom positions of the derivatives, which indicated that the model was essentially correct.

One significant deviation from the starting model was recognized in a $\sigma_A$-weighted $(2mF_o - DF_c)\exp(i\alpha_c)$ map (Read, 1986) in domain E, where segment 625 through 658 had to be shifted by 2 to 3 residues. After

Table 3
Protocol of a round of XPLOR refinement in the resolution range 7.0 to 2.0 Å

Stage  Description
1      Determination of weights WA and WP†
2      Minimization: 10 steps conjugate gradient refinement using soft repulsive potential followed by 50 steps with CHARMM non-bonded potential; WA = 453,480 kcal/mol‡, WP = 65,206 kcal/(mol rad²), B = 25 Å², ΔF = 0.05 ŧ
3      Molecular dynamics 0.5 ps, T = 2000 K, timestep = 1 fs, ΔF = 0.25 Å
4      Molecular dynamics 0.25 ps, T = 300 K, timestep = 1 fs, ΔF = 0.2 Å
5      Minimization‖, 100 steps with ΔF = 0.01 Å
6      15 cycles of individual B-factor refinement with standard deviations between B-factors of bonded atoms restrained to 1.5 Å² and between B-factors of atoms connected by an angle restrained to 2.0 Å²
7      Minimization, 20 steps with ΔF = 0.01 Å

One round required 69 min of central processor unit time on a Cray Y-MP at the HLRZ Jülich, F.R.G.
† WA, weight for the effective energy term accounting for the diffraction data, E_A(XRAY). WP, weight for the effective energy term accounting for the phase information, E_P(XRAY). These weights relate the X-ray effective energy terms E_A(XRAY) and E_P(XRAY) to the empirical potential energy.
‡ The values for WA and WP are given for the last round of the refinement. 1 cal = 4.184 J.
§ ΔF is a limit. Any atom movement exceeding ΔF signals that the first derivatives of the effective energy E_A(XRAY) + E_P(XRAY) have to be recalculated.
‖ For stages 5, 6 and 7 WP was set to zero.


round 9, also the segment of residues 180 to 203 was detected to be out of register. It was shifted by 1 residue. At an R-factor of 24.6% (round 10) for all reflections between 7.0 and 2.0 Å resolution, no more displaced residues could be detected. At this stage, major peaks in the $(mF_o - DF_c)\exp(i\alpha_c)$ difference density map revealed 2 Ca ions as well as solvent molecules as highest peaks. Minor peaks in this map indicated small side-chain shifts. All solvent molecules were inserted manually. After cycle 19, refinement was halted at an R-factor of 17.6%. The r.m.s. deviation is 0.013 Å in bond lengths and 2.7° in bond angles. The final model contains 5267 protein atoms (i.e. all non-hydrogen atoms), 2 Ca ions and 588 water molecules. The final $(mF_o - DF_c)\exp(i\alpha_c)$ difference density map shows no significant peaks that could be assigned to further solvent molecules or to shifts of protein atoms. The course of the refinement is shown in Table 4. The refinement was calculated on the Cray Y-MP at the HLRZ Jülich, F.R.G.; all other calculations were done on a MicroVAX-II and on a VAX-Station 3100 (Digital Equipment Corp., U.S.A.).

Table 4
Course of structural refinement using the simulated annealing program XPLOR

Number of rounds                         1        4        10       11       19
Resolution range (Å)                     10-2.5   8-2.2    7-2.0    7-2.0    7-2.0
Number of reflections                    37,665   55,738   70,171   70,171   70,171
R-factor (%)                             44.7     37.8     24.6     19.0     17.6
Number of solvent molecules              -        -        -        535      588
r.m.s. bond length deviation (Å)         0.036    0.030    0.019    0.014    0.013
r.m.s. bond angle deviation (deg.)       7.4      5.6      4.2      3.3      2.7
r.m.s. dihedral angle deviation (deg.)   29.2     27.5     26.7     28.5     26.4
r.m.s. improper angle deviation (deg.)   4.6      2.6      1.2      1.2      1.1

3. Results and Discussion

(a) Model accuracy

The model co-ordinate error can be estimated from the R-factor. In Figure 2 the R-factor is plotted as a function of resolution and related to theoretical R-factor curves based on model inaccuracies (Luzzati, 1952). A comparison with theoretical lines indicates an upper limit for the average co-ordinate error of 0.22 Å. Above 2.5 Å resolution, the R-factor increases because of limited data accuracy. In the range 2.2 to 2.0 Å, R_sym reaches 56%. The r.m.s. co-ordinate error can also be derived from the slope of the $\sigma_A$-plot (Read, 1986) as shown in Figure 3. The observed value of 0.26 Å is in good agreement with the Luzzati plot. In the final $(2mF_o - DF_c)\exp(i\alpha_c)$ map, all atoms of the model have well-defined density, except for one highly flexible loop region at residues 656 to 659. Model accuracy can also be derived from the distribution of main and side-chain conformational angles (see below).

(b) Chain conformation

A good indicator for the stereochemical correctness of the main-chain conformation of a protein is provided by a (φ, ψ) scatter plot (Ramachandran & Sasisekharan, 1968). As shown in Figure 4, most of the residues fall into the energetically preferred region. As observed in most other proteins, the bridge region between α-helix and β-sheet around (−90°, 0°) is populated by quite a number of residues, most of which are in the (i+2) position of

Abscissa: sin θ/λ (1/Å)
Figure 2. R-factor as a function of resolution for non-centric reflections (thick line). The broken lines are drawn for a given co-ordinate error estimate according to Luzzati (1952). The R_sym value (see Table 1 for definition) for the final native data set is given for comparison (thin line).

Abscissa: (sin θ/λ)²
Figure 3. The dependence of the structure quality index $\sigma_A$ on resolution according to Read (1986). The dotted line corresponds to an r.m.s. co-ordinate error of 0.26 Å. The data for (sin θ/λ)² < 0.01 were neglected.
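How a co-ordinate error follows from the slope of such a plot can be sketched in Python. The sketch assumes the relation $\ln \sigma_A = \tfrac{1}{2}\ln(\Sigma_P/\Sigma_N) - (8\pi^2/3)\langle|\Delta r|^2\rangle(\sin\theta/\lambda)^2$, which holds for isotropic, normally distributed co-ordinate errors; the function name, the plain least-squares fit and the synthetic data are illustrative assumptions and do not reproduce the program actually used.

```python
import math


def rms_error_from_sigma_a(s2_values, sigma_a_values):
    """Estimate the r.m.s. co-ordinate error (in Å) from a sigma_A plot.

    s2_values: (sin(theta)/lambda)^2 for each resolution bin, in Å^-2.
    sigma_a_values: corresponding sigma_A estimates (all > 0).
    A straight line is fitted to ln(sigma_A) versus (sin(theta)/lambda)^2
    and the slope is converted with slope = -(8 * pi^2 / 3) * <|dr|^2>.
    """
    xs = list(s2_values)
    ys = [math.log(v) for v in sigma_a_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope > 0:
        raise ValueError("sigma_A should decrease with resolution")
    return math.sqrt(-3.0 * slope / (8.0 * math.pi ** 2))


# Synthetic, noise-free data generated for an r.m.s. error of 0.26 Å.
s2 = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
sigma_a = [0.95 * math.exp(-(8 * math.pi ** 2 / 3) * 0.26 ** 2 * x) for x in s2]
print(f"estimated r.m.s. error: {rms_error_from_sigma_a(s2, sigma_a):.2f} Å")
```

With measured data the points scatter around the line, so the low-resolution bins are usually excluded from the fit, as was done here for (sin θ/λ)² < 0.01.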

Axis label: phi
Figure 4. Ramachandran plot of main-chain torsion angles φ and ψ for all 625 non-glycine residues. Residues in the left-handed α-helix region near (60°, 40°) are 9 asparagine residues occurring at the ends of α-helices and β-sheets as well as Cys43, Ser90, Phe183, Ala230, Tyr365, Ser567 and Trp614. The 2 residues in the forbidden region near (50°, −120°) are Ala152 and Tyr195.
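The torsion angles plotted in Figure 4 are dihedral angles over four consecutive backbone atoms: φ over C(i−1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). A minimal Python sketch of the calculation follows; the function name and the test co-ordinates are hypothetical and serve only to illustrate the sign convention (cis = 0°, trans = 180°).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle p0-p1-p2-p3 in degrees (-180 to 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: the last atom is rotated by +90° about the
# central bond relative to the first atom.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applied along the chain, one (φ, ψ) pair per residue gives the scatter plot; the first and last residues lack one of the two angles.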

reverse turns. Nine of 16 non-glycine residues in the region around (60°, 40°), which corresponds to a left-handed helix, are asparagine residues, corroborating the proposed preference of asparagine for this region (Matthews et al., 1975). Only two non-glycine residues are located in forbidden regions (Fig. 4). These are Ala152 at (44°, −127°) and Tyr195 at (62°, −114°). The conformations of both residues are well defined in the electron density map. Ala152 and Tyr195 are at positions i+2 and i+1 of reverse turns, respectively. The CB atoms of both side-chains virtually collide with the hydrogen atoms at the peptide nitrogens of the following residues (CB···N distances are 2.87 Å and 2.86 Å, respectively). The scatter plot of the (φ, ψ) angles of the 59 glycine residues shows clustering in three regions (not shown here) in agreement with the statistics described by Richardson (1981). There are 22 glycine residues around the left-handed α-helical region. A further 22 glycine residues are found in regions unfavorable for residues with side-chains. Four cis-peptide bonds have been detected in CGTase, all of them occurring before proline residues. cis-Pro372 is located at the active center and accounts for a sharp kink in a loop region between strand β14 and helix α10. The cis-peptide bond is stabilized by a hydrogen bond between Pro372-O and 375-N. cis-Pro505 is located at the end of the first β-strand of domain D (β23). The change of the main-chain direction produced by this bond avoids collision with the side-chain of Tyr247 of helix α5. The next three residues form a short

Axis label: chi-1
Figure 5. Scatter plot of side-chain torsion angles χ1 versus χ2 for Leu (×) and Ile (○).

parallel sheet with the strand β31 of domain D. Also cis-Pro633 is in a sharp loop, which is stabilized by a rather distorted hydrogen bond from 634-N to 631-O. Pro633-O forms a hydrogen bond with 594-N. No obvious reason can be found for cis-Pro623. All cis-proline residues are conserved in the known CGTase sequences. The (χ1, χ2) scatter plot of leucine and isoleucine side-chains is given in Figure 5. The angles cluster well at the staggered conformations showing the quality of the present model. The improvement during the refinement can be derived from a corresponding plot at an earlier stage of the XPLOR refinement: after round 5 at an R-factor of 37.3% in the resolution range 8 to 2.2 Å (Table 4), the (χ1, χ2) plot showed a very broad distribution. The observed frequencies of leucine and isoleucine residues in the nine staggered conformations in Figure 5 are in good agreement with the statistics given by Janin et al. (1978) and the observations of Karplus & Schulz (1987).

(c) Secondary structures and hydrogen bonds

In the following discussion, hydrogen bonds were assumed when the distance between the donor and the acceptor was less than 3.5 Å and the angle between an N-H bond and an H···O hydrogen bond was larger than 120°, as suggested by Baker & Hubbard (1984). Secondary structure assignments of β-sheets and α-helices were made manually, inspecting the hydrogen bond list as calculated by the program XPLOR. This assignment turned out to be more reliable for a structure at high resolution than the automatic assignment obtained from the program DSSP of Kabsch & Sander (1983). The assignments are given in Figure 6. The arrangement

Figure 6. Amino acid sequence and secondary structure assignments in cyclodextrin glycosyltransferase from B. circulans strain no. 8. Upper line, 4 segments with strong sequence similarity to α-amylases around residues 138 (I), 229 (II), 258 (III) and 326 (IV); domain limits are marked (|). Second line, residues involved in the binding of Ca ions at Ca-I (*) and Ca-II (*). Third line, sequence (Nitschke et al., 1990). Fourth line: manual assignment of α-helices and β-sheet strands using the hydrogen bond definition of Baker & Hubbard (1984): H, α-helix; E, β-strand. Fifth line, assigned names of the secondary structure elements (see Fig. 7).
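The hydrogen bond definition used for these assignments (donor to acceptor distance below 3.5 Å and an N-H···O angle above 120°) can be expressed as a small Python test. The function name and the example co-ordinates are hypothetical; the sketch also assumes that a hydrogen position is available, which for an X-ray model means a position calculated from standard geometry.

```python
import math


def is_hydrogen_bond(n, h, o, max_distance=3.5, min_angle=120.0):
    """Apply the distance and angle criteria to one N-H...O contact.

    n, h, o: (x, y, z) co-ordinates in Å of the donor nitrogen, its
    hydrogen and the acceptor oxygen. The angle is measured at the
    hydrogen between the H->N and H->O directions.
    """
    if math.dist(n, o) >= max_distance:
        return False
    hn = [a - b for a, b in zip(n, h)]
    ho = [a - b for a, b in zip(o, h)]
    cos_angle = (sum(a * b for a, b in zip(hn, ho))
                 / (math.hypot(*hn) * math.hypot(*ho)))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) > min_angle


# Hypothetical linear contact: N...O = 2.9 Å, N-H...O = 180°.
print(is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

Both thresholds are exposed as parameters so that other published criteria could be tried on the same co-ordinates.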

of the secondary structure elements is presented schematically in Figure 7. Hofmann et al. (1989) have subdivided the cyclodextrin glycosyltransferase molecule into five domains, named A through E. The most prominent feature of the structure shown in the RIBBON-plot of Figure 8 is the (βα)₈ barrel in domain A, which is a TIM-barrel (Banner et al., 1975). All other domains consist mostly of β-sheet structures. As visualized in Figure 7, the 8-fold symmetry in domain A is disturbed by additions after β-strands β2, β5 and β12. The segments of residues 80 to 82 and 102 to 108 in the protuberance between β2 and α2 show a rather irregular hydrogen bonding pattern. This irregularity prevented us from taking them as β-strands, although they were assigned by the program DSSP (Kabsch & Sander, 1983). Domain B is an insertion in the (βα)₈ barrel of domain A. It begins at Asn139 after strand β5 and ends at His202. Only short segments of secondary structure elements are formed, namely two antiparallel β-sheets consisting of strands β6 plus β9 and strands β7 plus β8, respectively, as well as the α-helical segment α3 of eight residues (185 to 192). A third deviation from the 8-fold symmetry is the additional helix α7 after strand β12. An interesting observation is the occurrence of Pro396 within helix α10. Although it interrupts the α-helix (Pro396-N cannot form a hydrogen bond with Ser392-O), the following four residues fold again as helix (still named α10) without dramatic change of helix direction. As Pro396 is not conserved within the CGTase family, other CGTases without this proline residue may well have a continuous helix at this position. In contrast, Pro402 is responsible for a directional change of 90°. Consequently, the following helix α11 was assigned as a new helix. Pro402 is conserved within the known CGTases. Helix α11 ends with residue Ala405 at the border between domains A and C. The following domain C extends from residue 407 to 494 and consists of β-strands that fold as Greek keys (Richardson, 1981). Domain D extends from residue 495 to 580 and adopts an immunoglobulin folding pattern (Schiffer et al., 1973; Huber, 1976). A different topology is observed in domain E, which comprises residues 581 to 684. Besides the inclusion of few residues in some loop regions in domains B, C and D, no significant alterations had to be done in the chain tracing when compared with the chain fold presented by Hofmann et al. (1989). However, one reconnection

Figure 7. A 2-dimensional representation of the secondary structure elements of cyclodextrin glycosyltransferase. Helices are represented by circles and β-sheet strands by squares (upward) and concentric squares (downward). The residue numbers at the domain borders are indicated.

had to be made in a highly flexible loop around residue 659 in domain E. Thus, the last β-strand, β39, runs parallel to strand β32, while all other β-sheet strands in domain E are antiparallel. The backbone model of CGTase is given in Figure 9(a). The backbone models of domains D and E are given separately in Figure 9(b) and (c), respectively. The hydrogen bonds formed between the β-strands are schematically represented in Figure 10. The averages (standard deviations always given in parentheses) of conformational angles φ and ψ are −114° (31) and +134° (36) in β-sheets and −66° (8) and −39° (8) in α-helices, respectively. The average hydrogen bond distances (N···O and H···O) are 2.95 Å (0.1) and 2.01 Å (0.1) in β-sheets and 3.01 Å (0.1) and 2.06 Å (0.1) in α-helices, respectively. The average hydrogen bond angles N-H···O and H···O=C are 158° (8) and 151° (12) in β-sheets and 157° (11) and 148° (14) in α-helices, respectively.

Figure 8. Stereo view of a cartoon representation of the secondary structure of cyclodextrin glycosyltransferase using the program RIBBON (Priestle, 1988).

Figure 9. The chain-fold of cyclodextrin glycosyltransferase given as a Cα-backbone model: (a) the whole molecule with 4 conserved segments (see Fig. 6) drawn with thick lines, the Ca ions Ca-I and Ca-II are marked (+); (b) chain-fold of domain D; (c) chain-fold of domain E.

Figure 10. A representation of all β-structures in domains A through E of CGTase. The lengths of all hydrogen bonds are given as N···O distances (Å).

(d) Side-chain hydrogen bonds and ionic interactions

Side-chain interactions play an important role in stabilizing proteins. Residues involved in ionic interactions are listed in Table 5. His327 makes two short contacts with glutamate and glutamine residues, to Glu257 via the NE2 atom of the imidazole ring and to Gln19 via the ND1 atom. The bridge between Arg103 and Glu153 as well as the contact between Lys106 and Asp159 stabilize loop regions in domain B by a connection to domain A. Other hydrogen bonds are found in domain B which stabilize the irregular folding in this section.

Table 5
Salt bridges in cyclodextrin glycosyltransferase

Atom 1        Atom 2        N···O distance (Å)   Involved domains
Asp3-OD1      Arg519-NH2    2.83                 A-D
LysS-NZ       Asp224-OD1    2.95                 A-A
AspWOD2       Lys131-NZ     2.91                 A-A
AsplB-OD1     Lys399-NZ     2.96                 A-A
Lys60-NZ      Asp63-OD2     2.84                 A-A
AspWOD2       Lys383-NZ     2.70                 A-A
Arg103-NH2    Glu153-OE1    2.82                 A-B
Lys106-NZ     Asp159-OD1    2.85                 A-B
Arg227-NH2    Glu257-OE2    2.89                 A-A
Glu257-OE2    His327-NE2    2.95                 A-A
Asp282-OD1    Arg353-NH1    2.93                 A-A
Asp282-OD1    Arg353-NH2    3.07                 A-A
Arg284-NH1    Asp313-OD1    2.61                 A-A
Arg412-NH2    Glu422-OE1    2.65                 C-C
Lys427-NZ     Glu495-OE1    2.63                 C-C
Lys509-NZ     Asp583-OD1    2.97                 D-E
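The distribution of the salt bridges over the domains can be summarized with a short Python tally of the Table 5 entries. The distances are entered under the assumption that values printed without a decimal point in the source (for example 291) stand for 2.91 Å; the variable and function names are illustrative.

```python
from collections import Counter

# (N...O distance in Å, involved domains) for the 16 rows of Table 5,
# in table order; decimal points restored where the source lost them.
SALT_BRIDGES = [
    (2.83, "A-D"), (2.95, "A-A"), (2.91, "A-A"), (2.96, "A-A"),
    (2.84, "A-A"), (2.70, "A-A"), (2.82, "A-B"), (2.85, "A-B"),
    (2.89, "A-A"), (2.95, "A-A"), (2.93, "A-A"), (3.07, "A-A"),
    (2.61, "A-A"), (2.65, "C-C"), (2.63, "C-C"), (2.97, "D-E"),
]


def tally_domains(bridges):
    """Count salt bridges per domain pair."""
    return Counter(domains for _, domains in bridges)


def mean_distance(bridges):
    """Average N...O distance over all listed salt bridges."""
    return sum(distance for distance, _ in bridges) / len(bridges)


for pair, count in sorted(tally_domains(SALT_BRIDGES).items()):
    print(pair, count)
print(f"mean N...O distance: {mean_distance(SALT_BRIDGES):.2f} Å")
```

The tally makes explicit that only a few of the listed contacts connect different domains.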

Abscissa: Residue number
Figure 11. Mobility of the polypeptide chain. The average B-factor of each residue is plotted against the residue number. Domains are indicated at the top.

One salt bridge has been detected in the core region of the protein without access to solvent molecules. It involves one oxygen atom of the carboxylate group of Asp282 and NH1 as well as NH2 of Arg353. Asp282 occurs at the end of β12 and Arg353 at the end of helix α9. The ionic interaction between Lys427 and Glu495 stabilizes the loop region at the border between domains C and D. The flexible segment connecting domains D and E is stabilized by a salt bridge between Lys509 and Asp583.

(e) Chain flexibility

The overall average isotropic temperature factor in CGTase is 23.5 Å² as compared to the value of 18 Å² derived from the Wilson plot (data not shown). The average B-factor of the main-chain atoms is 21.4 Å². This rather low B-factor, together with the very high content of solvent of 68% in the crystal, shows that CGTase is a stiff molecule. As judged from Figure 11, there is only one segment that shows average B-factors for the main-chain atoms above 45 Å², i.e. between residues 656 and 659. In the $(2mF_o - DF_c)\exp(i\alpha_c)$ map, this part is the only one where the conformations are not entirely defined. This chain mobility has caused the wrong chain tracing given by Hofmann et al. (1989) in that region. Another rather mobile segment with B-factors above 35 Å² is located in a loop between residues 42 and 50, although the disulfide between Cys43 and Cys50 should stabilize this part of the structure. A striking observation is the high B-factors for some residues surrounding the active center (see below), namely in the loop region between residues 89 and 91 and in the reverse turn around Tyr195. This points to conformational changes on substrate binding. High B-factors are also observed in domain C at loops between β-strands and at the entrance to domain D around residue 495. In domain D the loop between residues 567 and 569 seems to be very flexible. In general, all mobile segments are located at the surface of the protein.
The most rigid parts of the molecule are deep in the interior in regions not accessible to solvent molecules. In general, there is a good correlation between temperature factors and solvent accessibilities.

Abscissa: Average electron density (σ)
Figure 12. Statistics for the 588 solvent molecules modeled as water. The electron density in the final $(2mF_o - DF_c)\exp(i\alpha_c)$ map was determined for each solvent molecule. The number of solvent molecules is given as a histogram over electron density. The average B-factor is plotted as a continuous line.
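The mobility analysis of section (e) rests on two simple operations that can be sketched in Python: converting an isotropic B-factor into an r.m.s. displacement through the standard relation B = 8π²⟨u²⟩, and locating stretches of residues whose average main-chain B-factor exceeds a threshold. The function names and the example values are hypothetical.

```python
import math


def rms_displacement(b_factor):
    """Return sqrt(<u^2>) in Å for an isotropic B-factor in Å^2."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def mobile_segments(avg_b_by_residue, threshold):
    """Return (first, last) residue numbers of consecutive residues
    whose average B-factor lies above the threshold.

    avg_b_by_residue: dict mapping residue number to average B (Å^2).
    """
    segments = []
    start = prev = None
    for res in sorted(avg_b_by_residue):
        if avg_b_by_residue[res] > threshold:
            if start is None:
                start = res
            elif res != prev + 1:  # gap in numbering closes the segment
                segments.append((start, prev))
                start = res
            prev = res
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


# Hypothetical per-residue averages, not values from the model.
example_b = {1: 20.0, 2: 50.0, 3: 48.0, 4: 10.0, 5: 46.0}
print(mobile_segments(example_b, 45.0))
print(f"{rms_displacement(21.4):.2f} Å")
```

Applied to the per-residue averages behind Figure 11, thresholds of 45 Å² and 35 Å² would pick out the segments discussed in the text.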

(f) Solvent structure

The final model contains 588 water molecules. At the end of the refinement, all solvent molecules were renumbered according to their electron density in the $(2mF_o - DF_c)\exp(i\alpha_c)$ map, the lowest numbers corresponding to the highest densities. Since XPLOR does not refine occupancies, and since the occupancies are closely related to temperature factor and electron density, cutoffs for these parameters have been applied to restrict the number of water molecules. All solvent molecules that showed electron density less than 1.0σ and B-factors above 70 Å² were deleted from the co-ordinate list. The distribution of water molecules with respect to their electron density is given in Figure 12 together with the average B-factors. The maximum density of the solvent is at 6.2σ, the center of distribution corresponds to 2.5σ with a B-factor of 43 Å². From the solvent content in the crystal of 68%, the total number of water molecules per subunit can be calculated as 6400. Hence, the included water molecules in the structure represent 8% of the total solvent in the unit cell. There are 144 water molecules that we consider as an integral part of the protein. We defined these water molecules as having B-factors less than 25 Å² and electron densities greater than 3.0σ, following suggestions of Blevins & Tulinsky (1985) and Karplus & Schulz (1987). A total of 44 of these water molecules are enclosed in the interior of the protein and do not have access to the bulk water. The average B-factor of these firmly bound solvent molecules is 23 Å², which has to be compared to an overall average of 42 Å² for all assigned water molecules. If we define an inner solvation shell for water molecules lying within a distance of 3.7 Å to protein atoms and an outer solvation shell for additional


Table 6
Interatomic distances at the Ca²⁺-binding sites of cyclodextrin glycosyltransferase and α-amylase of Aspergillus niger and Aspergillus oryzae (TAKA-amylase)

Ca-I
                               Distance (Å)‡
Ligand†                        CGTase   A. niger   A. oryzae
Asp199-OD1 (Asp/Asp175)        2.54     2.59       2.77
Asp199-OD2 (Asp/Asp175)        2.49     2.79       2.80
Asn139-OD1 (Asn/Asn121)        2.24     2.57       2.52
His233-O (Glu/His210)          2.32     2.42       2.43
Ile190-O (Glu/Glu162)          2.68     2.63       2.40
Sol-3                          2.17     2.64       2.35
Sol-42                         2.15     2.54       2.69
Sol-43                         2.36     2.55       2.60

Ca-II
Ligand        Distance (Å)
Asp27-OD1     2.55
Asp53-OD1     2.32
Asn32-OD1     2.21
Asn33-OD1     2.16
Asn29-O       2.53
Gly51-O       2.45

† The corresponding residues in α-amylase from A. niger/A. oryzae are in parentheses.
‡ The distances for α-amylases from A. niger and A. oryzae are from Boel et al. (1990).

water molecules within 4.5 a distance to the protein, 519 water molecules are located in the inner shell and 69 belong to the outer one.
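As a rough cross-check of the solvent estimate in section (f), the number of water molecules per subunit can be sketched from the unit cell volume and the solvent content. This is only an illustration under stated assumptions: the cell edge a is read as 94.8 Å (it is printed garbled in Materials), the orthorhombic cell is taken to hold four molecules (one per asymmetric unit), and each water molecule is assumed to occupy about 30 Å³, a typical bulk-water value that the text does not state.

```python
# Rough estimate of the number of water molecules per CGTase subunit.
# Assumptions (not taken from this section): a = 94.8 Å and 30 Å^3 per water.

CELL_A, CELL_B, CELL_C = 94.8, 104.7, 114.0  # Å; the value of a is an assumed reading
MOLECULES_PER_CELL = 4     # assumed: 4 asymmetric units per cell, 1 molecule each
SOLVENT_FRACTION = 0.68    # solvent content quoted in the text
WATER_VOLUME = 30.0        # Å^3 per water molecule (assumed bulk-water value)


def waters_per_subunit(a, b, c, n_molecules, solvent_fraction, water_volume):
    """Return the estimated number of water molecules per protein subunit."""
    cell_volume = a * b * c                      # orthorhombic cell: V = a * b * c
    volume_per_subunit = cell_volume / n_molecules
    solvent_volume = solvent_fraction * volume_per_subunit
    return solvent_volume / water_volume


estimate = waters_per_subunit(
    CELL_A, CELL_B, CELL_C, MOLECULES_PER_CELL, SOLVENT_FRACTION, WATER_VOLUME
)
print(f"estimated water molecules per subunit: {estimate:.0f}")
```

Under these assumptions the estimate lands close to the 6400 quoted in the text; a different assumed volume per water molecule shifts the result proportionally.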

(g) The calcium-binding sites

It has been reported that CGTases need Ca ions for stability and activity (Bender, 1977). In the homologous structure of the α-amylase from porcine pancreas one calcium-binding site has been identified (Buisson et al., 1987). Recently, this calcium-binding site has been described in detail in the refined structures of α-amylases from Aspergillus oryzae and Aspergillus niger (Boel et al., 1990). The corresponding binding site Ca-I of CGTase was detected in the electron density at a relatively early stage. In addition, a second calcium-binding site, Ca-II, was found during the course of refinement. The Ca2+-binding sites are located far apart at opposite sides of the active center region (Fig. 9(a)). Both stabilize protein conformations with a comparatively low degree of secondary structure. The residues and solvent molecules involved in chelating the ions are listed in Table 6.

Site Ca-I has a distorted octahedral arrangement of ligands (Fig. 13(a)). Four protein ligands form the equatorial plane of the octahedron: Asp199-OD2 (there is also a contact to OD1), Asn139-OD1, His233-O and Ile190-O; two water molecules (Sol-3 and Sol-42) are at the apices of the two pyramids. A third water molecule (Sol-43) is very close to Sol-42. Only one negative charge of the carboxylate group of Asp199 is available to compensate the positive charge at the Ca ion. At 11.2 Å², the B-factor of the ion is very low. The B-factors of the surrounding protein atoms have similar values, indicating a very low mobility of this region and a strongly bound Ca2+. The electron density of Ca-I constitutes the highest peak (10σ) of the (2m × Fo − D × Fc)exp(iαc) map, together with the sulfur atom of Met329. Ca-I fixes the geometry at one side of the active center; His233 is in the conserved region II (Fig. 6). In α-amylases it has been suggested to bind one of the glucose rings at which the glycosidic bond is broken (Matsuura et al., 1984). Residues 190 and 199 are located in domain B. Due to its vicinity to the active center, Ca-I is probably essential for protein activity.

In the α-amylases from A. niger and A. oryzae the environment of this calcium ion is nearly identical with that found in CGTase. Again, two main-chain oxygen atoms, one amide oxygen and both carboxylate oxygen atoms of an aspartate form the plane of an irregular octahedron and three water molecules complete the co-ordination sphere. The contact distance differences between CGTases and α-amylases (Table 6) are likely to reflect residual model inaccuracies.

The additional calcium-binding site Ca-II in CGTase is located after the first strand (β1) in the (β/α)8 barrel in a stretch with few hydrogen bonds between main-chain and side-chain atoms. The residues involved in ion binding together with the distances are listed in Table 6. They form an octahedron (Fig. 13(b)). There are four contacting oxygen atoms from aspartate and asparagine side-chains as well as two oxygen atoms from the main-chain. The segment between residues 30 and 52 is rather flexible, as can be seen from the B-factors in Figure 11, despite the Ca ion and the disulfide bridge linking Cys43 and Cys50. The observed B-factor of the Ca ion is 22.3 Å², which is in general agreement with the surrounding protein atoms, indicating full occupancy. The higher mobility of Ca-II as compared to Ca-I is also reflected by the lower electron density (8.9σ).

(h) Crystal contacts

CGTase crystals have a high content of solvent in the unit cell and the molecular contact areas are rather small. Both facts may be responsible for the extremely long time required for crystal growth: a year is not unusually long for obtaining crystals suitable for high-resolution X-ray analysis.
Once the incoming molecules have found their contacts, they seem to be very well defined, leading to strongly diffracting crystals with a resolution limit of 1.9 Å. Each subunit contacts four surrounding molecules that are numbered II through V; molecule I is the reference molecule. In space group P2₁2₁2₁, only two types of molecular interaction are necessary to build a three-dimensional array. The CGTase packing consists of only these two contacts. The same residues as in the contact area of molecules I and II are found in the crystal contact between molecules I and IV. The same applies for the contacts of molecule I to molecules III and V. Thus, only interactions I-II and I-III are described in Table 7. Two main-chain-main-chain, one main-chain-side-chain and five side-chain-side-chain hydrogen bonds are found between domains A and C of molecule I and domains B and E of molecule II (and vice versa at the contact between molecule I and molecule IV). Only two hydrogen bonds have been detected between domain E of molecule I and domain A of molecule III. This crystal contact, however, is strengthened by a salt bridge between Glu648 (I) and Arg103 (III). Molecule I together with neighboring molecules II and III is shown in Figure 14. The surface area of molecule I that is buried by all contacting molecules is 1980 Å², as calculated by the program of Kabsch & Sander (1983). This corresponds to 9% of the total accessible surface. The contacts to molecules II and IV are much stronger than the contacts to molecules III and V as judged from the contact areas (Table 7).

Figure 13. The Ca2+-binding sites of CGTase. (a) The strong calcium-binding site Ca-I connecting domains A and B; (b) the weaker calcium-binding site Ca-II in a loop region after the first β-strand in domain A. The ion:ligand interactions are marked by broken lines (see Table 6).

Figure 14. Molecular packing in the crystalline array illustrated with Cα-backbone models. The unit cell is outlined. The reference molecule is given with a thick line together with its neighboring molecules II (above) and III (below), which are related to the reference molecule by the operations [symmetry operators illegible in the source], respectively. For the sake of clarity neighboring molecules IV and V are not shown. They are related by the same rotation matrices as molecules II and III but with translations (1, −1/2, 1/2) and (−1/2, 1/2, 0), respectively.

Table 7
Residues involved in hydrogen bonds at crystal contacts

Packing        Buried
interaction    surface (Å²)    Molecule I     Molecule II(III)    Distance (Å)
I-II           660             Asn338-ND2     Thr596-O            2.94
                               Asn415-ND2     Asp182-OD1          3.07
                               Asn415-ND2     Ser184-OG           3.19
                               Asn416-ND2     Gln627-OE1          3.06
                               Asn437-ND2     Asn626-OD1          2.96
                               Leu438-O       Gly599-N            2.86
                               Ser439-OG      Asn626-ND2          3.40
                               Ser443-N       Ser184-O            2.79
I-III          330             Glu648-OE1     Arg103-NH2          2.81
                               Glu648-OE2     Arg103-NH1          2.81
                               Gly664-O       Lys107-NZ           2.92
                               Ser665-OG      Thr108-O            3.57
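The buried-surface values in Table 7 can be checked against the total quoted in section (h). The sketch below assumes that the areas listed for contacts I-II and I-III also apply to the equivalent contacts I-IV and I-V, which the text suggests (same residues involved) but does not state as numbers; the variable names are illustrative only.

```python
# Consistency check of the buried surface of molecule I (values in Å^2).

BURIED_PER_CONTACT = {"I-II": 660.0, "I-III": 330.0}  # from Table 7
# Assumption: I-IV buries the same area as I-II, and I-V the same as I-III,
# so each contact type is counted twice for the reference molecule.
CONTACTS_PER_TYPE = {"I-II": 2, "I-III": 2}
BURIED_FRACTION = 0.09  # "9% of the total accessible surface" (text)


def total_buried_surface(per_contact, counts):
    """Sum the buried area over all contacts made by the reference molecule."""
    return sum(per_contact[name] * counts[name] for name in per_contact)


total_buried = total_buried_surface(BURIED_PER_CONTACT, CONTACTS_PER_TYPE)
implied_accessible = total_buried / BURIED_FRACTION

print(f"total buried surface: {total_buried:.0f} Å^2")
print(f"implied accessible surface: {implied_accessible:.0f} Å^2")
```

With these inputs the summed area reproduces the 1980 Å² given in the text, and the 9% figure implies a total accessible surface of roughly 22,000 Å² for the monomer.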

(i) The active center geometry

The reactions catalyzed by CGTases and α-amylases are similar. In α-amylase, the α(1→4)-glycosidic bond of a polysaccharide is hydrolyzed and the products leave the active center. In CGTase, a transglycosylation reaction occurs within one polysaccharide, leading to a cyclodextrin. Sequence comparisons with other CGTases and α-amylases show some strongly conserved regions (Binder et al., 1986; Kimura et al., 1987; MacGregor & Svensson, 1989). Here, we account only for the four regions that have been found in at least two of these three papers. In CGTase, these regions comprise residues 135 to 140 (region I, see Fig. 6), 225 to 233 (region II), 257 to 260 (region III) and 323 to 328 (region IV). Region I can be assigned to the calcium-binding site Ca-I (see above). The residues of regions II, III and IV are located at the carboxy-terminal end of the (β/α)8 barrel and belong to the active center. Residues in region II are involved in the binding of Ca-I as well as in substrate binding in the amylases and may therefore take part in catalysis.

For α-amylases, two contradicting proposals have been made with respect to the catalytic residues. From binding studies with substrate analogues and from the importance of the Ca ion for enzyme activity, Buisson et al. (1987) argue that in pig pancreatic α-amylase the catalytic residues are Asp197 (Asp229 in CGTase) and Asp300 (Asp328 in CGTase). Furthermore, they consider region III as less important and point out that only Glu233 (Glu257 in CGTase) of this region is strictly conserved. In contrast, on the basis of binding experiments with maltotriose and a discussion of pK values, Matsuura et al. (1984) propose the equivalent Glu230 in TAKA-amylase from A. oryzae as a proton donor and Asp297 (Asp328 in CGTase) as a general base. In CGTase, all three equivalent residues, Asp229 as well as Glu257 and Asp328, are close together and are likely to play a role in catalysis. We will try to establish the active center of the CGTases in detail by binding studies with substrates in the crystal.

The co-ordinates will be deposited in the Protein Data Bank.
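The assignment of individual residues to the conserved regions in section (i) can be illustrated with a small lookup. The region boundaries and residue numbers are those quoted in the text for CGTase; the dictionary layout and the function name region_of are hypothetical and serve only as a sketch.

```python
# Map CGTase residue numbers onto the four conserved regions named in the text.

CONSERVED_REGIONS = {
    "I": (135, 140),
    "II": (225, 233),
    "III": (257, 260),
    "IV": (323, 328),
}

# Residues discussed in the text: Ca-I ligands and proposed catalytic residues.
RESIDUES = {
    "Asn139": 139,  # Ca-I ligand
    "Ile190": 190,  # Ca-I ligand, domain B
    "Asp199": 199,  # Ca-I ligand, domain B
    "His233": 233,  # Ca-I ligand
    "Asp229": 229,  # proposed catalytic residue
    "Glu257": 257,  # proposed catalytic residue
    "Asp328": 328,  # proposed catalytic residue
}


def region_of(residue_number, regions=CONSERVED_REGIONS):
    """Return the name of the conserved region containing the residue, or None."""
    for name, (first, last) in regions.items():
        if first <= residue_number <= last:
            return name
    return None


assignments = {label: region_of(number) for label, number in RESIDUES.items()}
for label, region in assignments.items():
    print(f"{label}: {'region ' + region if region else 'outside conserved regions'}")
```

The lookup places Asn139 in region I and His233 in region II, in line with the link between these regions and Ca-I, while the three candidate catalytic residues fall in regions II, III and IV.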

We thank Dr Bender for discussions and the EMBL Outstation at DESY, Hamburg, for their excellent service on beam-lines X11 and X31 and for their help in data collection.

References

Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Progr. Biophys. Mol. Biol. 44, 97-179.
Banner, D. W., Bloomer, A. C., Petsko, G. A., Phillips, D. C., Pogson, C. I., Wilson, I. A., Corran, P. H., Furth, A. J., Milman, J. D., Offord, R. E., Priddle, J. D. & Waley, S. G. (1975). Structure of chicken muscle triose phosphate isomerase determined crystallographically at 2.5 Å resolution using amino acid sequence data. Nature (London), 255, 609-614.
Bender, H. (1977). Cyclodextrin-glucanotransferase von Klebsiella pneumoniae. 1. Synthese, Reinigung und Eigenschaften des Enzymes von Klebsiella pneumoniae M5al. Arch. Microbiol. 111, 271-282.
Bender, H. (1986). Production, characterization, and application of cyclodextrins. In Advances in Biotechnological Processes (Mizrahi, A., ed.), vol. 6, pp. 31-71, Alan R. Liss, New York.
Binder, F., Huber, O. & Böck, A. (1986). Cyclodextrin-glycosyltransferase from Klebsiella pneumoniae M5al: cloning, nucleotide sequence and expression. Gene, 47, 269-277.
Blevins, R. A. & Tulinsky, A. (1985). Comparison of the independent solvent structures of dimeric α-chymotrypsin with themselves and with γ-chymotrypsin. J. Biol. Chem. 260, 8865-8872.
Boel, E., Brady, L., Brzozowski, A. M., Derewenda, Z., Dodson, G. G., Jensen, V. J., Petersen, S. B., Swift, H., Thim, L. & Woldike, H. F. (1990). Calcium binding in α-amylases: an X-ray diffraction study at 2.1 Å resolution of two enzymes from Aspergillus. Biochemistry, 29, 6244-6249.
Brünger, A. T., Kuriyan, J. & Karplus, M. (1987). Crystallographic R factor refinement by molecular dynamics. Science, 235, 458-460.
Buisson, G., Duee, E., Haser, R. & Payan, F. (1987). Three-dimensional structure of porcine pancreatic α-amylase at 2.9 Å resolution. Role of calcium in structure and activity. EMBO J. 6, 3909-3916.
Dickerson, R. E., Weinzierl, J. E. & Palmer, R. A. (1968). A least-squares refinement method for isomorphous replacement. Acta Crystallogr. sect. B, 24, 997-1003.
French, S. & Wilson, K. S. (1978). On the treatment of negative intensity observations. Acta Crystallogr. sect. A, 34, 517-525.
Hendrickson, W. A. (1985). Stereochemically restrained refinement of macromolecular structures. Methods Enzymol. 115, 252-270.
Hofmann, B. E., Bender, H. & Schulz, G. E. (1989). Three-dimensional structure of cyclodextrin glycosyltransferase from Bacillus circulans at 3.4 Å resolution. J. Mol. Biol. 209, 793-800.
Huber, R. (1976). Antibody structure. Trends Biochem. Sci. 1, 174-178.
Janin, J., Wodak, S., Levitt, M. & Maigret, B. (1978). Conformation of amino acid side-chains in proteins. J. Mol. Biol. 125, 357-386.
Kabsch, W. (1988). Evaluation of single crystal X-ray diffraction data from a position-sensitive detector. J. Appl. Crystallogr. 21, 916-924.
Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structures: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
Karplus, P. A. & Schulz, G. E. (1987). Refined structure of glutathione reductase at 1.54 Å resolution. J. Mol. Biol. 195, 701-729.
Kimura, K., Shinsuke, K., Yasumasa, I., Takano, T. & Yamane, K. (1987). Nucleotide sequence of the β-cyclodextrin glucanotransferase gene of alkalophilic Bacillus sp. strain 1011 and similarity of its amino acid sequence to those of α-amylases. J. Bacteriol. 169, 4399-4402.
Klein, C., Vogel, W., Bender, H. & Schulz, G. E. (1990). Engineering a heavy atom derivative for the X-ray structure analysis of cyclodextrin glycosyltransferase. Protein Eng. 4, 65-67.
Leslie, A. G. W. (1987). A reciprocal-space method for calculating a molecular envelope using the algorithm of B. C. Wang. Acta Crystallogr. sect. A, 43, 134-136.
Luzzati, V. (1952). Traitement statistique des erreurs dans la determination des structures cristallines. Acta Crystallogr. 5, 802-810.
MacGregor, E. A. & Svensson, B. (1989). A supersecondary structure predicted to be common to several α-1,4-D-glucan-cleaving enzymes. Biochem. J. 259, 145-152.
Matsuura, Y., Kusunoki, M., Harada, W. & Kakudo, M. (1984). Structure and possible catalytic residues of TAKA-amylase A. J. Biochem. 95, 697-702.
Matthews, B. W. (1975). In The Proteins (Neurath, H. & Hill, R. L., eds), 3rd edit., vol. 3, pp. 403-590, Academic Press, New York.
Nitschke, L., Heeger, K., Bender, H. & Schulz, G. E. (1990). Molecular cloning, nucleotide sequence and expression in Escherichia coli of the β-cyclodextrin glycosyltransferase gene from Bacillus circulans strain no. 8. Appl. Microbiol. Biotechnol. 33, 542-546.
Priestle, J. (1988). Ribbon: a stereo cartoon drawing program for proteins. J. Appl. Crystallogr. 21, 572-576.
Ramachandran, G. N. & Sasisekharan, V. (1968). Conformation of polypeptides and proteins. Advan. Protein Chem. 23, 283-437.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. sect. A, 42, 140-149.
Richardson, J. S. (1981). The anatomy and taxonomy of protein structures. Advan. Protein Chem. 34, 167-339.
Schiffer, M., Girling, R. L., Ely, K. R. & Edmundson, A. B. (1973). Structure of a λ-type Bence-Jones protein at 3.5 Å resolution. Biochemistry, 12, 4620-4630.
Szejtli, J. (1988). Cyclodextrin Technology, Kluwer Acad. Publ., Dordrecht.
Thieme, R., Pai, E. F., Schirmer, R. H. & Schulz, G. E. (1981). The three-dimensional structure of glutathione reductase at 2 Å resolution. J. Mol. Biol. 152, 763-782.
Wang, B. C. (1985). Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90-112.

Edited by R. Huber
