Refined structure of the human histocompatibility antigen HLA-A2 at 2.6 A resolution.

J. Mol. Biol. (1991) 219, 277-319

Refined Structure of the Human Histocompatibility Antigen HLA-A2 at 2.6 Å Resolution

M. A. Saper†, P. J. Bjorkman‡ and D. C. Wiley§

Department of Biochemistry and Molecular Biology, Howard Hughes Medical Institute, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, U.S.A.

(Received 3 September 1990; accepted 2 January 1991)

The three-dimensional structure of the human histocompatibility antigen HLA-A2 was determined at 3.5 Å resolution by a combination of isomorphous replacement and iterative real-space averaging of two crystal forms. The monoclinic crystal form has now been refined by least-squares methods to an R-factor of 0.169 for data from 6 to 2.6 Å resolution. A superposition of the structurally similar domains found in the heterodimer, α1 onto α2 and α3 onto β2m, as well as the latter pair onto the ancestrally related immunoglobulin constant domain, reveals that differences are mainly in the turn regions. Structural features of the α1 and α2 domains, such as conserved salt-bridges that contribute to stability, specific loops that form contacts with other domains, and the antigen-binding groove formed from two adjacent helical regions on top of an eight-stranded β-sheet, are analyzed. The interfaces between the domains, especially those between β2m and the HLA heavy chain presumably involved in β2m exchange and heterodimer assembly, are described in detail. A detailed examination of the binding groove confirms that the solvent-accessible amino acid side-chains that are most polymorphic in mouse and human alleles fill up the central and widest portion of the binding groove, while conserved side-chains are clustered at the narrower ends of the groove. Six pockets or sub-sites in the antigen-binding groove, of diverse shape and composition, appear suited for binding side-chains from antigenic peptides. Three pockets contain predominantly non-polar atoms; but others, especially those at the extreme ends of the groove, have clusters of polar atoms in close proximity to the "extra" electron density in the binding site. A possible role for β2m in stabilizing permissible peptide complexes during folding and assembly is presented.

Keywords: HLA; crystal structure; peptide; immune system; β2-microglobulin

† Present address: Biophysics Research Division and Dept. of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109-2099, U.S.A.
‡ Present address: Division of Biology and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, U.S.A.
§ Author to whom all correspondence should be addressed.
‖ Abbreviations used: HLA, human leukocyte or histocompatibility antigen; MHC, major histocompatibility complex; CTL, cytotoxic T lymphocyte(s); TCR, T-cell receptor; β2m, β2-microglobulin; α1, α1 domain; α2, α2 domain; α3, α3 domain; α1α2, α1 and α2 intrachain dimer; s.i.r., single isomorphous replacement; m.i.r., multiple isomorphous replacement; FFT, fast Fourier transform; c.p.u., central processor unit; r.m.s., root-mean-square.

0022-2836/91/100277-43 $03.00/0    © 1991 Academic Press Limited

1. Introduction

Human leukocyte antigen-A2 (HLA-A2‖), a class I histocompatibility antigen, is one of a polymorphic family of cell surface, membrane glycoproteins encoded in the major histocompatibility complex (MHC) that form complexes with peptides normally derived from intracellular degradation of self and foreign antigens (Dausset, 1958; Townsend et al., 1986; for reviews, see Germain, 1986; Bevan, 1987). These complexes, when recognized by the T-cell receptor molecules on cytotoxic T lymphocytes (CTL), cause the rejection of non-histocompatible tissue transplants (alloreaction) or the killing of virally infected cells (Snell et al., 1976; Zinkernagel & Doherty, 1979). The requirement that processed antigens (probably peptides) be bound to class I histocompatibility glycoproteins in order to be recognized by T cells, a phenomenon known as MHC restriction, is a critical difference between cellular immunity mediated by hypervariable T-cell receptors (TCR) and humoral immunity, where intact foreign antigens are recognized directly by hypervariable immunoglobulins. As antigen-presenting molecules, individual histocompatibility glycoproteins have been shown to interact with a broad range of peptides of diverse
sequences (Babbitt et al., 1985; Buus et al., 1987; Sette et al., 1988; Townsend et al., 1986; Maryanski et al., 1986), yet exhibit selectivity by failing to interact with all peptides. This selectivity may provide the basis for allele-specific non-responsiveness to certain immunological challenges (Babbitt et al., 1985; Buus et al., 1987; McDevitt & Tyan, 1968; Fox et al., 1988).

Structural studies of human histocompatibility antigens were made possible with the preparation of soluble molecules by the removal of the hydrophobic membrane anchor and cytoplasmic domain with papain (Nathenson & Shimada, 1968), and with the development of a milligram-scale purification for specific histocompatibility antigens from human lymphoblastoid cell lines (Turner et al., 1975). Small, very thin (20 to 50 µm) crystals of HLA-A2 and HLA-Aw68 (formerly named HLA-A28) were grown; HLA-A2 in two crystal forms, monoclinic P2₁ and orthorhombic P2₁2₁2₁, and HLA-Aw68 in an isomorphous orthorhombic form (Bjorkman et al., 1985). The crystals contain both chains of HLA: β2-microglobulin (β2m, 11,700 Mr) and three domains of the heavy chain defined by exons: α1, α2 and α3 (the first 271 amino acid residues of the 328 residue heavy chain and one oligosaccharide, 34,000 Mr).

The three-dimensional structure of HLA-A2, initially determined at 3.5 Å resolution, showed the two immunoglobulin-like domains α3 and β2m paired in a novel way at the membrane-proximal end of the molecule, and the α1 and α2 domains folded nearly identically, each into a four-strand β-sheet topped by a long helical region (Bjorkman et al., 1987a). An "intrachain dimer" of the α1 and α2 domains forms a deep groove between the long helical regions of the two domains (Bjorkman et al., 1987a). This groove, whose sides are α-helices and whose bottom is the strands of a β-sheet, has a size and shape expected for the antigen (peptide) binding site and contains a large continuous region of "extra" electron density, probably the image of a peptide or mixture of peptide antigens bound in the site (Bjorkman et al., 1987a). The fact that material remained bound in the HLA-A2 groove throughout purification and crystallization also indicated an unusually long half-life for HLA-peptide complexes and suggested that HLA molecules are always occupied by peptides (Bjorkman et al., 1987b).

Evidence that the groove was the region of the structure that bound antigen and was recognized by T cells was provided by two more observations. (1) The extensive polymorphism of histocompatibility glycoproteins that is responsible for their differential recognition by CTL was concentrated at the putative peptide-binding site (Bjorkman et al., 1987b). (2) Amino acid substitutions in spontaneous murine mutants and human serological-subtype variants that affected tissue transplantation (alloreaction) and the recognition of foreign antigens by CTL were also concentrated at the α1α2 groove (Bjorkman et al., 1987b). Some of the single amino acid substitutions that affected alloreactive recognition were at the bottom of the cleft in a location likely to be buried by a bound peptide, rather than directly recognized by a T-cell receptor, arguing that recognition of peptide complexes was part of the allo-response.

That sequences of T-cell receptor molecules show structural similarity with the Fab portion of immunoglobulins (Yanagi et al., 1984; Hedrick et al., 1984; Colman, 1988), and that the size of the Fab footprint on some antigens is about 20 Å x 30 Å (1 Å = 0.1 nm; see Mariuzza et al., 1987), both suggest that a TCR could simultaneously recognize a peptide and the α-helical edges of the binding groove (about 20 Å wide by 30 Å long; Bjorkman et al., 1987b; Brown et al., 1988). Evidence that a single T cell could recognize both ends of the two α-helical regions was first provided using single substitution in the murine [...]

[...] regions later localized to the α3 and β2m domains. Knowledge of the structure of immunoglobulin constant domains aided in building partial models for α3 and β2m using interactive graphics. The interpretability of the map was improved by combining phases calculated from the partial model with the isomorphous replacement phases. Cycles of phase combination and solvent flattening, using a hybrid envelope derived from a union of the Wang envelope and one calculated from the model, were followed by more model fitting and refinement with CORELS (Sussman, 1985). Eventually, amino acid residues were fit within the α3 and β2m domains, and a partial polyalanine trace was made of α1 and α2 including 8 β-strands and 2 α-helices, but the connectivity between the secondary structural elements was unclear. The partial model consisted of 290 residues (79%) comprising 2040 atoms. Difficulty in interpreting the maps was increased by the presence of extra electron density between the 2 long α-helices, later interpreted as being due to a peptide or mixture of peptides that co-crystallized with the molecule. Phase-combined maps with different types of coefficients (Rice, 1981; Stuart & Artymiuk, 1985) suffered from bias towards the original model. The structure was not fully interpreted until the monoclinic maps were averaged with maps of HLA-A2 in the orthorhombic space group.


Table 2
Monoclinic multiple isomorphous replacement to 3.5 Å used for initial phasing, modeling and map averaging

Heavy-atom site parameters (per derivative): occupancy (real/anomalous)†, B (Å²)‡, fractional co-ordinates.

B. Phasing statistics, by resolution shell (Å)
Derivatives: K2PtCl4, K2OsCl6, p-chloro-mercuriphenol and a mercury derivative.
Number of reflections per shell: 144, 243, 391, 546, 753, 966, 1228, 1488; total 5759.
Mean figure of merit, all shells: 0.60.
r.m.s. fH/E§, all shells: K2PtCl4, 1.76; K2OsCl6, 0.52.

† Occupancy is on an arbitrary scale.
‡ B is the isotropic temperature factor defined by the expression exp(-B (sin θ/λ)²).
§ Derivative phasing power, where fH is the heavy-atom structure factor and
E is the residual lack of closure.

Immunoglobulin-like domains were not resolved in solvent-flattened 3.5 Å orthorhombic s.i.r. maps, and these maps were judged uninterpretable.

(e) Phase refinement by iterative map averaging

The programs of Bricogne (1976) were used to average iteratively in real space the electron densities of the 2 crystal forms of HLA-A2. A summary of the procedures used and results obtained has been reported (Saper et al., 1989). This technique of phase refinement parallels that in the structure determination from 2 crystal forms of the human α1-proteinase inhibitor (Loebermann et al., 1984).

A precise transformation relating the molecules in each space group must be known prior to starting the averaging procedures. Molecules of HLA were predicted to pack similarly in both forms based on space group and unit cell similarities (Bjorkman et al., 1985). To define the exact transformation between the molecules, a partial model (described above) was used in a 6-dimensional rotation and translation real-space search of the orthorhombic s.i.r. and solvent-flattened map. One major solution was found that was later refined with a stand-alone version of the real-space refinement option of FRODO (Jones, 1982). This solution was consistent with the location of a platinum heavy-atom site common to both crystal forms. The transformation was confirmed by a real-space electron density correlation function (SEARCH in PROTEIN; Steigemann, 1974) that did not use a presumed model. A 3-dimensional translation search with the 200 highest electron density points from the monoclinic map revealed a single solution. This was followed by

Table 3
Orthorhombic single isomorphous phase refinement

A. Heavy-atom site parameters
Derivative: K2PtCl4. Columns: occupancy, B (Å²), fractional co-ordinates.

B. Phasing statistics

Resolution shell limit (Å)   10.63    8.24    6.72    5.68    4.91    4.33    3.87    3.50   Overall
Number of reflections          147     225     341     453     652     783     940    1008      4549
Mean figure of merit          0.36    0.38    0.39    0.40    0.40    0.41    0.43    0.39      0.40
K2PtCl4
  r.m.s. fH                 192.05  193.75  185.78  175.15  166.14  152.35  142.65  c. 130    156.39
  r.m.s. E                  155.98  132.07  114.07  119.00  110.55  104.26   95.49   97.75    107.99
  r.m.s. fH/E                 1.23    1.47    1.63    1.47    1.50    1.46    1.49    1.33      1.45

Calculations were done with program PHARE (G. Bricogne). Definitions as in Table 2.
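As a small worked check of the phasing power quoted in these tables (r.m.s. fH divided by r.m.s. E, the residual lack of closure), the ratio can be recomputed from the two r.m.s. values of a shell. The sketch below is illustrative only: the function name is hypothetical, and the input pair (193.75 and 132.07) is one shell entry of Table 3 as it can be read here.

```python
def phasing_power(rms_fh, rms_e):
    """Phasing power of a derivative: r.m.s. heavy-atom structure factor
    divided by r.m.s. lack-of-closure error (hypothetical helper)."""
    if rms_e <= 0:
        raise ValueError("r.m.s. lack of closure must be positive")
    return rms_fh / rms_e

# One shell of Table 3, as read from the text
print(round(phasing_power(193.75, 132.07), 2))
```

The rounded result agrees with the fH/E value listed for that shell (1.47).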

[Figure 1: flow chart. Legible steps, orthorhombic (ORT) frame: calculate the monoclinic (MON) 2Fo - Fc map on a fine grid from the coefficients of the previous cycle; reduce the ORT molecular envelope to a list of points in the asymmetric unit; GENERATE (mode 5; fine grid for the MON asymmetric unit, 1.0 Å coarse grid for the ORT asymmetric unit) fetches density from the fine-grid map by linear interpolation, adds it to the density at the corresponding ORT grid point, and divides the sum at each grid point by the number of points summed; grid points outside of the envelope are replaced by the mean solvent density; the ORT averaged map is transformed to yield structure factors, which are combined with the observed ORT amplitudes and s.i.r. phases.]

Figure 1. Flow chart of calculations to average the electron density maps of the monoclinic crystal form of HLA-A2 (MON, Map 1) with the orthorhombic (ORT, Map 2) crystal form. For each averaging cycle, a parallel and analogous series of calculations was also performed in the monoclinic frame. Programs (in boxes) are from the package described by Bricogne (1976), from PROTEIN (Steigemann, 1974), or unpublished.

an Euler angle rotation search using 6900 monoclinic density points above 1 σ.

A flowchart of the software used in 1 cycle of the averaging calculations is shown for the orthorhombic case in Fig. 1. In brief, each point within the orthorhombic molecular envelope was averaged with the corresponding electron density value interpolated from the monoclinic map. Grid points in solvent regions outside of the envelope were replaced by their mean value. The averaged map was transformed by FFT to obtain calculated structure factors. These phases were then combined with the s.i.r. phase probabilities, a new map was calculated, and the entire procedure was repeated. A parallel series of calculations was done simultaneously in the monoclinic space group.

With mode 5 of GENERATE, all averaging manipulations are done in 1 asymmetric unit. For the monoclinic case, GENERATE was modified to include a Q2 matrix that permitted transformation of the orthorhombic density points into a non-orthogonal frame (G2 in the notation of Bricogne, 1976). One cycle of mode 5 averaging in both space groups took approximately 20 min c.p.u. time on a VAX-11/780. After completing a series of mode 5 averaging cycles, a final map is averaged without a molecular envelope using mode 1 of GENERATE, to provide an unbiased map for interpretation.

To ensure that the 2 different maps being averaged were on the same relative electron density scale, each was pre-scaled to have the same standard deviation. Recently, after the calculations described had been completed, this procedure was found to introduce a scaling error of approximately 10%. This may have been due to statistical differences between all of the points in the asymmetric unit and those points within the envelope that are actually averaged. This error has since been corrected by implementing an additional pass in the program MODIFY (Fig. 1) to calculate a scale factor between the 2 groups of density values actually being averaged.

Envelopes suitable for the iterative averaging procedures were constructed by defining extents of the presumed molecular volume. An initial envelope for the averaging procedure was calculated with the program ENVATOM (based on a similar program by S. C. Harrison). It set all grid points within a specified radius (typically .5 Å) of each atom as TRUE. All grid points within a similar distance of crystallographic symmetry-related atoms were then set FALSE (or "solvent"). An envelope determined in the orthorhombic frame could be transferred to monoclinic frames with programs from the Bricogne suite: GENERATE mode 3 and RICC'NVI (local modification of RECNF3).

In later stages of averaging, a more precise envelope was needed that ensured that not more than 1 symmetrically equivalent grid point be set TRUE, and that defined the boundary between contacting molecules more precisely and without overlap. The program ENVTOM, derived from a program originally written by T. Garrett, searches around each grid point to see which atom within a specified radius is closest. If the atom is not a symmetry-related atom, then this grid point is set TRUE ("inside"). For the orthorhombic case, ENVTOM constrained the extent of the molecular envelope by considering not only the P2₁2₁2₁ symmetry-related molecules, but also the P2₁ symmetry-related molecules in the monoclinic cell transformed into the orthorhombic cell.
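The nearest-atom envelope rule and the envelope-restricted averaging described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ENVTOM, GENERATE or MODIFY code: the function names are hypothetical, maps are flat lists of density values on one shared set of grid points (interpolation between crystal frames is assumed to have been done already), and the scale factor is taken as the ratio of standard deviations of the points inside the envelope, which is one plausible reading of the correction described in the text.

```python
import math

def nearest_atom_envelope(grid_points, own_atoms, symmetry_atoms, radius):
    """Flag a grid point True ("inside") when the closest atom within
    `radius` belongs to the reference molecule, False otherwise."""
    inside = []
    for point in grid_points:
        best_dist, best_is_own = None, False
        for atoms, is_own in ((own_atoms, True), (symmetry_atoms, False)):
            for atom in atoms:
                d = math.dist(point, atom)
                if d <= radius and (best_dist is None or d < best_dist):
                    best_dist, best_is_own = d, is_own
        inside.append(best_is_own)
    return inside

def _std(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def average_in_envelope(target_map, other_map, inside):
    """Average two maps inside the envelope and flatten the solvent.

    The second map is scaled using only the points that are actually
    averaged (assumption: ratio of standard deviations inside the mask;
    both maps must vary inside the mask so the ratio is defined).
    Points outside the envelope are set to the mean solvent density.
    """
    t_in = [t for t, m in zip(target_map, inside) if m]
    o_in = [o for o, m in zip(other_map, inside) if m]
    scale = _std(t_in) / _std(o_in)
    solvent = [t for t, m in zip(target_map, inside) if not m]
    solvent_mean = sum(solvent) / len(solvent)
    return [0.5 * (t + scale * o) if m else solvent_mean
            for t, o, m in zip(target_map, other_map, inside)]

# Toy example: three grid points on a line, one atom per molecule
mask = nearest_atom_envelope(
    [(0, 0, 0), (10, 0, 0), (4, 0, 0)],   # grid points
    own_atoms=[(0, 0, 0)],
    symmetry_atoms=[(10, 0, 0)],
    radius=5.0,
)
averaged = average_in_envelope([1.0, 5.0, 3.0], [2.0, 9.0, 6.0], mask)
```

Computing the scale from the masked points only, rather than from the whole asymmetric unit, mirrors the distinction the text draws between all grid points and those that are actually averaged.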

The monoclinic envelope was calculated in an analogous manner. "Dummy" atoms were sometimes added to model co-ordinates to include, explicitly within the molecular boundary, areas of density not occupied by the model (see Results, section (a)).

(f) Restrained refinement of monoclinic HLA-A2

CORELS refinement (Sussman, 1985) during the initial stages of refinement treated amino acids as rigid groups connected by restrained peptide bonds. At first, only main-chain torsion angles were allowed to vary. At higher resolution, side-chain torsion angles were allowed to vary. The structure factors were weighted proportionally to the sin θ/λ dependency of |Fo - Fc|. Movable groups were also restrained to their target positions.

The complete model with all side-chains in place was refined with TNT (Tronrud et al., 1987), an FFT-based, least-squares refinement program with geometrical restraints. Native data between 6 and 2.7 Å were used with conjugate gradient minimization. Weighting of the various parameters followed the scheme suggested in the TNT documentation.

Refinement with X-PLOR (Brünger, 1988a), a program with energy-restrained crystallographic minimization and optional molecular dynamics, followed the examples given in the documentation (Brünger, 1988b). Both a standard parameter file and one modified to improve main-chain geometry (Weis & Brünger, 1989) were used. Charges on side-chain atoms were turned off during dynamics only.

(g) Electron density maps

Electron density maps examined during refinement were calculated with the PROTEIN package (Steigemann, 1974) using 2Fo - Fc and Fo - Fc coefficients, and either model calculated phases and Sim-weights, calculated phases directly, or m.i.r.-combined phases and figure-of-merit weights. Some use was also made of the OMIT map procedure (Bhat, 1988). The contoured maps and current HLA model were examined on Evans & Sutherland PS 300 graphics systems running FRODO (Jones, 1982; version 6.6 kindly provided by J. Sack and F. A. Quiocho, and TOM version kindly provided by T. A. Jones). Peaks from difference Fourier maps were conveniently analyzed with a program by T. Garrett.

For phase combination calculations during refinement (Remington et al., 1982), the m.i.r. phases were redetermined with the YIJRPH program from PROTEIN to optimize the derivatives' phasing power (fH/E). The K2OsCl6 derivative with poor phasing power was omitted, a second K2PtCl4 data set with higher site occupancy was included, and heavy-atom sites were redetermined from 2.8 Å resolution difference Fourier maps made with model calculated phases. Also, anomalous differences were not included, since only < 20% were judged significant. Though 3 of the 4 derivatives shared the same major site, all had fH/E > 1.0, and the phases, although potentially biphasic, were adequate for phase combination (mean figure of merit = 0.59 for 10,932 reflections from 15 to 2.8 Å).

To visualize better the carbohydrate and extra electron density, the disordered solvent information in the low-resolution structure factors (from 12 to 6 Å) was included in 2 ways. In the first, a solvent map (positive density outside of the molecular envelope, zero within) was transformed to give 12 to 2.6 Å structure factors which, after applying a large temperature factor (180 Å²), were scaled and combined with 12 to 2.6 Å structure factors calculated from the 6 to 2.6 Å refined model (programs kindly provided by J. Varghese). This type of solvent-continuum technique had proven successful in locating carbohydrate in the Fab-neuraminidase structure determination (J. N. Varghese & P. M. Colman, unpublished results). Alternatively, the 12 to 2.6 Å structure factors were calculated directly from the refined model (R = 0.20 for 12 to 2.6 Å and R = 0.40 for 12 to 6 Å). Typically, the R-factors from the first method were only slightly better than these, and both procedures gave similar electron density maps.

(h) Structure interpretation

Interatomic distances were calculated with CONTACTS (CCP4 package), accessible surface areas by ACCESS (by M. Handschumacher and F. Richards, using a probe radius of 1.4 Å), and probe contact surfaces were generated with MS (Connolly, 1983). Similar structures were superposed with the RIG1 option of FRODO to get initial transformations. The program OVERLAP by W. Bennett (Rossmann & Argos, 1975, 1976) minimized the transformation and defined the structurally equivalent residues between the 2 structures.
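The effect of the large temperature factor (180 Å²) applied to the solvent structure factors in section (g) can be illustrated with the isotropic temperature-factor expression exp(-B (sin θ/λ)²) together with Bragg's law, sin θ/λ = 1/(2d). The sketch below is an illustration only; the function name is hypothetical and the resolutions are simply the limits quoted in the text.

```python
import math

def temperature_factor_damping(b_factor, d_spacing):
    """Amplitude damping exp(-B (sin(theta)/lambda)^2) at resolution d (Å),
    using sin(theta)/lambda = 1 / (2 d)."""
    s = 1.0 / (2.0 * d_spacing)
    return math.exp(-b_factor * s * s)

# Damping of the solvent contribution with B = 180 Å^2
for d in (12.0, 6.0, 2.6):
    print(d, temperature_factor_damping(180.0, d))
```

With B = 180 Å² the damping falls from about 0.73 at 12 Å to about 0.29 at 6 Å and to roughly 0.001 at 2.6 Å, so a solvent term treated this way contributes almost entirely to the low-resolution reflections.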

3. Results

(a) Map average of two crystal forms

To improve the electron density map so that α1α2 could be traced, and to confirm the structures of α3 and β2m, we took advantage of the redundancy of structural information by iteratively averaging the electron densities from the two crystal forms (Table 4). The real-space iterative averaging was carried out in three stages. The first two stages, which used initial maps phased from the monoclinic m.i.r./partial model-phase combination cycles at 3.5 Å resolution and different molecular envelopes, resulted in an interpretable electron density map at 3.5 Å resolution (Bjorkman et al., 1987a,b; Saper et al., 1989). The third experiment repeated the real-space iterative averaging but employed no model information in the initial phases (except the molecular boundary), relying solely on experimental (m.i.r., s.i.r.) phases, thereby establishing that the final averaged electron density map was free of model bias.

(i) First averaging experiment

For the first averaging calculations, the starting maps were: orthorhombic, observed amplitudes, Fo, with s.i.r. phases, solvent-flattened with the Wang procedure; and monoclinic, Fo, with m.i.r. phases combined with phases from a partial model, then solvent-flattened. The resolution for both maps was 15.0 to 3.5 Å, limited by the resolution of the orthorhombic s.i.r. phases. To initiate real-space averaging, the relationship between the molecular coordinates in the monoclinic and orthorhombic space groups was determined as described in Experimental Methods, section (e).

The envelope for the first averaging experiment was calculated with ENVATOM (see Experimental Methods) in the orthorhombic frame and transferred to the monoclinic frame. The orthorhombic and monoclinic envelopes contained 34% and [illegible]% solvent, respectively. Due to different packing arrangements between orthorhombic and monoclinic, the monoclinic envelope contained about 3000 points that overlapped adjacent asymmetric units.

For each of the first four of six averaging cycles, calculated phases from the averaged map were combined with isomorphous phase probabilities from each respective space group and used to calculate new figure-of-merit-weighted 2Fo - Fc maps for the next cycle. In the last two cycles, the calculated phases were used directly with Sim-weighted 2Fo - Fc coefficients. To follow convergence of the procedure, R-factors between observed amplitudes and those calculated from averaged maps were examined, as well as phase changes from previous and initial cycles (Table 4). The final R-factors after six cycles were: orthorhombic 0.206, monoclinic 0.222. The r.m.s. phase changes from starting phases were orthorhombic 80° and monoclinic 69°. The phase change from the penultimate cycle was about 20°, suggesting that the procedure had not yet converged.

A final map calculated with mode 1 of GENERATE and contoured onto plastic sheets showed substantial connectivity and side-chain density. Density representing turns between β-strands not seen in previous maps allowed a new polyalanine model to be constructed for α1 and α2. In addition, new regions of helical density were resolved (corresponding to the H1 helix in α1, see below). Strong density connecting a helix and β-strand, not seen in earlier maps, defined the positions of Cys101 and Cys164, a disulphide in α2. From this observation, the amino acid side-chains corresponding to the sequences of α1 and α2 were aligned to the polyalanine trace.

Superimposing an envelope during interactive refitting revealed regions of the map that had been truncated by the crude envelope. During alignment of the sequence with the chain-tracing, it was apparent that about nine residues were missing between residues 12 and 20, the loop between the first and second strands of α1. In the original monoclinic map this density had been truncated by the envelope. Also, no density was resolved connecting domains α1 and α2. We suspected that residues 86 to 93 formed a loop extending beyond the current envelope. In addition, the side-chains on a β-strand in α3 (residues 238 to 253) did not fit the density in the averaged map, despite the model being included in the phases for the monoclinic starting map. Model building indicated that the correct alignment required shifting the sequence by one residue in this region. The model at this stage contained 363 residues with a gap between 86 and 93.

A new envelope was constructed with ENVTOM to include the current model and 50 dummy atoms placed in regions of suspected structure: the presumed 86-93 loop, the N-terminal region of β2m,

Table 4
Summary of phase averaging experiments

Columns: Expt 1, Expt 2, Expt 3.

Rows:
Starting phases: ORT†, s.i.r. with solvent-flattening; MON, combined (m.i.r. + partial model) then solvent-flattened
Enveloping program‡: ENVATOM in ORT, then transferred to MON
Volume of a.u. inside envelope (%): ORT; MON
Averaging cycles with combined phases
Followed by cycles with calculated phases
r.m.s. Δ phase, last versus initial cycle (deg.): ORT; MON
r.m.s. Δ phase, last versus penultimate cycle (deg.): ORT; MON
R-factor of structure factors from final averaged map: ORT; MON

† ORT, orthorhombic crystal form; MON, monoclinic crystal form.
‡ See Experimental Methods for the program algorithm.

[Figure 2: three plots, (a) to (c), each against averaging cycle number.]

Figure 2. Course of phase refinement by density averaging for the 17 cycles of experiment 3 (described in Results, section (a)(iii)). Arrow A marks the switch from using combined isomorphous phases to Sim-weighted calculated phases. Arrow B indicates a redetermination of the monoclinic to orthorhombic transformation. (○) Orthorhombic form; (+) monoclinic form. (a) r.m.s. phase difference between the current cycle and starting phases. (b) r.m.s. phase difference between the current cycle and previous cycle. (c) R-factor of structure factors calculated from averaged map versus cycle number.
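The two convergence measures followed in the averaging experiments, the R-factor between observed and calculated amplitudes and the r.m.s. phase change between cycles, can be computed as in the sketch below. It assumes the conventional definition R = Σ| |Fo| - |Fc| | / Σ|Fo| with amplitudes already on a common scale, and phases in degrees with differences wrapped into the range -180° to 180°; the function names are hypothetical and the paper does not give its exact scaling.

```python
import math

def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for amplitudes on one scale."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def rms_phase_difference(phases_a, phases_b):
    """r.m.s. difference between two phase sets (degrees), treating
    phases as periodic so that 350 and 10 differ by 20, not 340."""
    total = 0.0
    for a, b in zip(phases_a, phases_b):
        diff = (a - b + 180.0) % 360.0 - 180.0
        total += diff * diff
    return math.sqrt(total / len(phases_a))

# Toy values only, not data from the paper
print(r_factor([10.0, 20.0], [8.0, 23.0]))
print(rms_phase_difference([350.0, 10.0], [10.0, 350.0]))
```

Wrapping the differences matters here: without it, phases on either side of 0° would inflate the apparent r.m.s. change from cycle to cycle.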

the unknown or extra density between the two helices of α1 and α2, and the carbohydrate region near Asn86. These dummy atoms were not included in any phase calculations, but only used to facilitate construction of envelopes. The new envelopes, calculated independently in each space group, had 33% (orthorhombic) and 20% (monoclinic) of the asymmetric unit assigned as solvent. No point from the asymmetric unit was duplicated inside either molecular envelope. The same starting maps were used as in the first experiment (see above). Three cycles of averaging done as before with phase combination were followed by seven cycles of averaging with Sim-weighted coefficients. Convergence was achieved at R = 0.166 and 0.185; r.m.s. phase change from the next to last cycle was 9.4° and 8.4°, orthorhombic [...]

[...] cooling at 300 K, positional and B refinement; E0/Etot = -12458/-11267
Added several new water molecules and deleted others
Positional and B refinement with modified parameters (Weis & Brünger, 1989); E0/Etot = -12646/-11320
Dynamics with slow cooling from 500 K to 27.5 K in th3 ps, followed by positional and B refinement; E0/Etot = -12385/-10975

† Key: Modeling, with FRODO (Jones, 1982). Examined maps calculated with coefficients 2Fo - Fc or Fo - Fc, and calculated phases unless otherwise noted. CORELS, geometrically constrained/restrained group refinement (Sussman, 1985). TNT, geometrically restrained refinement (Tronrud et al., 1987). X-PLOR, refinement with energy restraints and optional dynamics at high temperatures (Brünger, 1988a,b).
‡ Number of reflections in various resolution ranges: 6 to 3.5 Å, 4713; 6 to 3.2 Å, 6492; 6 to 3.0 Å, 8087; 6 to 2.8 Å, 10,090; 6 to 2.7 Å, 11,161; 6 to 2.6 Å, 12,320.
§ Number of atoms with occupancy = 1.0 contributing to structure factors.
‖ r.m.s. shift calculated only for non-water atoms with occupancy = 1.0 (residues 267 through 270 omitted).
¶ E0 is the energy of the fully relaxed structure before crystallographic refinement. Etot is the energy of the structure after restrained crystallographic refinement (Remington et al., 1982).

structure. The complete model built from the averaged maps (experiment 2) was re-examined with the final averaged map from experiment 3 (see above) to find side-chains that fit density poorly and to set the occupancy for these atoms (about 600 atoms from about, 20% of t.he residues) to zero for refinement. Five rounds of CORELS refinement, gradually increasing resolution from 3.5 to 2.8 A resolution, interspersed with one round of refitting to include more atoms with full occupancy, reduced the R-factor from 0.44 (6.0 to 3.5 A) to 0.26 (6.0 to 2.8 A). Phase-combined maps with m(2F,-ZJC) c*oeficients were useful at this stage. The geometry of the final model from CORELS was poor, since the entire structure ha.d not been regularized during interactive refitting. With most of the side-chains positioned in density, refinement continued with data from 6 to 2.8 A with TNT. a less c.p.u.-intensive program than CIORELS. A significantly lower R-factor (0.20) was obtained with the same data, but the r.m.s. geometry remained poor: @06 A error in bond lengths. 6” in angles. After round 12 of TNT, the structure was completely refit and all poor geometry

idealized with the REFI option of FRODO (round 13, Table 6). During subsequent rounds of refitting and TNT minimization, OMIT maps (Bhat, 1988) were consulted to reorient side-chains, the largest peaks in difference maps were resolved by refitting side-chains or adding water molecules, and regions with bad geometry were regularized. This produced a final TNT model (round 20, Table 6) with excellent geometry (r.m.s. bond deviation = 0.018 Å, r.m.s. angle deviation = 2.2°) and an R-factor of 0.18 (369 residues + 29 water molecules = 3012 atoms). The mean temperature factor for all of the atoms was 17.0, ranging from 2.0 to 50 (at Asp196). Refinement continued at 2.7 Å with the molecular dynamics/energy restraints program X-PLOR. Though clearly useful for refining poor initial structures, X-PLOR was used here to see if any further improvement could be made to an already good structure (see Sacchettini et al., 1989), and to refine HLA-A2 under the same conditions used for the parallel refinement of a closely related structure, HLA-Aw68 (Garrett et al., 1989 and unpublished results), for subsequent structure comparison. Since

the starting model was good, only one dynamics cycle was done: heating to 1000 K for 0.5 picosecond, cooling at 300 K for 0.25 picosecond, followed by positional and individual B-factor refinement (round 22, Table 6). The lower R-factor (0.168 versus 0.18 with TNT) after this first cycle may not reflect a “better” structure, as 0.007 of this drop was due to individual B-factor refinement (mean B-factor was now about 20.0), and the geometry was considerably more lenient using X-PLOR energy parameters, especially bond angles (r.m.s. angle deviation = 3.7°, versus 2.2° with TNT) and peptide bond planes (r.m.s. ω deviation = 5.2°, versus QO° with TNT). The heat, cool and minimization of round 22 resulted in a model differing by 0.5 Å r.m.s. from the starting model, with a maximum of 3.4 Å. Ninety-seven side-chain atoms changed by more than two standard deviations (side-chain r.m.s. d = 0.64 Å). Most of these differences were on the protein surface at charged residues. Twenty-two of the affected side-chains differed significantly in χ1 values; 33 had changes only in other χ torsion angles. A careful evaluation of all side-chain movements was made to see if they fit OMIT map density better than the TNT model. Five side-chains were changed back to their pre-X-PLOR conformations, and six were changed to other conformations. Most of the alternative conformations picked by X-PLOR were for side-chains in density which, at this resolution, could not precisely define the exact conformation. The X-PLOR minimization moved at least two carbonyl oxygen atoms outside OMIT map density. This may be due to electrostatic forces overriding crystallographic restraints. Other changes seemed reasonable and often tried to optimize secondary structure interactions. One significant improvement was better density for Tyr116, a polymorphic residue in α2 (see section (g), below). Though surrounded by well-ordered side-chains, good density for the Tyr ring consistently failed to appear in all TNT refinements. After X-PLOR, difference electron density maps suggested that the Tyr116 side-chain was in the correct place, that it hydrogen-bonded to Asp77, and that nearby Val95 needed to have a different χ1 torsion angle in order to avoid the Tyr116 hydroxyl group. During final rounds of X-PLOR, 18 more water molecules were assigned to positive (Fo − Fc) peaks. The criteria for modeling water molecules were: (1) difference electron density peak heights greater than four standard deviations; (2) positioned less than 3.5 Å from potential hydrogen bond donors or acceptors; (3) peaks continued to reappear in 2Fo − Fc electron density maps; (4) density could not be modeled as alternative side-chain conformations. No model for the extra electron density in the peptide-binding cleft (see section (h), below) was included in the refinement. At this stage, this density was sufficiently detached so that there was no ambiguity as to which density arose from protein side-chains and which was extra. Three water molecules (Wat) within the cleft were included in the

Table 7

Resolution (Å)                                  6 to 2.6
R-factor                                        0.169
Number of reflections                           12,320
Atomic temperature factors (Å²)
  Minimum                                       2.0
  Maximum                                       62.7
  Mean                                          22.3
  Mean (s.c.)                                   23.1
  Mean (m.c.)                                   21.5
Difference Fourier (Fo − Fc) density values (e/Å³)
  Minimum                                       −0.257 (−5.1 σ)
  Maximum                                       0.347 (6.3 σ)
  σ                                             0.050
Number of residues                              X!4
Number of water molecules                       46
Number of atoms                                 3063
r.m.s. deviation from ideality
  Bonds (Å)
  Angles (°)

m.c., main-chain; s.c., side-chain. Refined co-ordinates are available from the Brookhaven Protein Data Bank, entry 3HLA.
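The R-factor quoted in Table 7 and throughout the refinement can be illustrated with the conventional crystallographic definition, R = Σ| |Fo| − k|Fc| | / Σ|Fo|. The sketch below is a textbook formula, not code taken from CORELS, TNT or X-PLOR; the function name, the least-squares scale factor k and the toy amplitudes are assumptions made only for this example.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for matched amplitude lists.

    f_obs, f_calc: observed and calculated structure-factor amplitudes
    for the same reflections (hypothetical inputs for illustration).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    obs = [abs(f) for f in f_obs]
    calc = [abs(f) for f in f_calc]
    # Assumed scaling: least-squares scale factor putting Fc on the Fo scale.
    k = sum(o * c for o, c in zip(obs, calc)) / sum(c * c for c in calc)
    # Sum of absolute amplitude differences, normalized by the sum of Fo.
    return sum(abs(o - k * c) for o, c in zip(obs, calc)) / sum(obs)


# Toy amplitudes only; these are not reflections from the HLA-A2 data set.
print(r_factor([10.0, 20.0], [12.0, 18.0]))
```

A perfectly scaled model gives R = 0, so the value of 0.169 reported here corresponds to an average amplitude disagreement of about 17% over the 12,320 reflections.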

refinement. Difference density for Wat917 and Wat946 is detached from the extra density. Only Wat944, which bridges the side-chains of Arg97 and Asp77 and makes hydrogen bonds to both, has density close enough to the extra density that it may represent a functional group of bound peptide(s). Electron density for the C terminus of α3, residues 267 to 270, had always been poor, reflecting either disorder or “ragged” ends from papain cleavage. Residues 268 to 270 were included as Ala, since no side-chain density was visible (round 28, R = 0.162). The final minimizations at 2.7 Å switched to a modified parameter set to improve the backbone geometry (Weis & Brünger, 1989), reducing the r.m.s. angle error to 3.5°. After including data out to 2.6 Å, the final R-factor for 12,320 reflections between 6 and 2.6 Å was 0.169. There are 3064 atoms, with a mean B-factor of 22.3, ranging from 2.0 (5 atoms) to 62.7 (GluB74 OE2). Geometry for the final structure is shown in Table 7. The mean co-ordinate error is estimated to be 0.25 to 0.30 Å from a Luzzati plot (not shown; Luzzati, 1952). The final difference map contains 14 positive peaks above 4 σ, of which four are part of the extra density seen in the antigen-binding cleft (see section (h), below). The quality of the final phases is displayed in Figure 3 by an Fo − Fc map where the residues shown have been omitted from the phase calculation.

(c) Molecular structure

Overall, the structure of HLA-A2 refined to 2.6 Å remains identical to that reported at 3.5 Å resolution (Bjorkman et al., 1987a). The molecule is a


Figure 3. Quality of the electron density map in the region of the α3-β2m interface. The map is a 6 to 2.6 Å Fo − Fc map contoured at 2.5 σ (0.146 e/Å³) calculated with structure factors derived from the model after omitting the residues shown. A difference map with phases calculated after positional and B refinement of the entire model (with the residues shown having zero occupancy) was virtually identical.

heterodimer comprised of two polypeptides, the HLA-A2 heavy chain with three domains defined by separate coding exons, α1 (residues 1 to 90), α2 (91 to 182) and α3 (183 to 270), and the light chain, β2m (B1 to B99). There are three distinct structural domains in HLA-A2 (Fig. 4(a)): the top comprised of α1 and α2 and containing the antigen recognition site between two helical regions, the α3 domain under the lower left corner of α1 and α2, and β2m directly under the center of the top (Fig. 4(a)). The extent and notation of all secondary structure elements are shown in Table 8.

Table 8
Secondary structure of HLA-A2

Domain              Strand or helix   Residues     Sheet   Corresponding framework strand in immunoglobulins†
α1                  S1                3-12
                    S2                21-28
                    S3                31-37
                    S4                46-47
                    H1                50-53
                    H2                57-84
α2                  S1                94-103
                    S2                109-118
                    S3                121-126
                    S4                133-135
                    H1                138-150
                    H2a               152-161
                    H2b               163-174
                    H3                176-179
α3                  S1                186-193
                    S2                198-208
                    S3                214-219
                    S3'§              222-224
                    S4a               228-230
                    S4b               234-235
                    S5                241-250
                    S6                257-261
                    S7                ‖
β2-Microglobulin    S1                B6-B11
                    S2                B21-B30
                    S3                B35-B41
                    S3'§              B44-B45
                    S4a               B50-B51
                    S4b               B55-B56
                    S5                B62-B70
                    S6                B78-B84
                    S7                ?-B94

Secondary structure designations made with algorithm described by Kabsch & Sander (1983).
† Lesk & Chothia (1982).
‡ Sheets B (α3) and D (β2m) are the classic immunoglobulin 4-stranded sheets referred to by Lesk & Chothia (1982).
§ The classic 3-stranded sheets (C in α3 and in β2m) are actually 4 strands using this algorithm. S3' is an additional short strand, running antiparallel to S3, and begins the crossover back to sheet B (α3) or D (β2m).
‖ Note that S7 of α3 sheet C was not detected in the HLA-A2 structure.

Both α1 and α2 have four-stranded, anti-parallel β-sheets related to each other by a pseudo-dyad forming an eight-stranded sheet (sheet A). The sheet has the usual left-handed twist of approximately 70° between the outermost strands of α1 and α2. The α-helices of the α1 and α2 domains, in a slightly exaggerated way, follow the twist of the β-sheet. On each domain, a short nearly vertical helix precedes a long curved (H2 in α1) or kinked (H2a, H2b in α2) helix. The short helices rise to a

peak from “low” corners of the β-sheet and, after a sharp turn, the second helices then follow the sheet's twist down to the opposite corners (Fig. 4(a)). Figure 4(b) illustrates this for the α2 helical region, which rises through a 3.5 turn α-helix (H1) and descends after a turn of 105° over a 22 residue kinked α-helix (H2a, H2b). The loops between β-strands in each domain extend alternately above and below the β-sheet (Fig. 4(b)). Those loops below are between strands 2 and 3 of each domain (labeled residues 30 and 120 in Fig. 4(b)) and interact with the other two domains: S2→S3† (residue 30) of α1 with α3 and S2→S3 (residue 120) of α2 with β2m (see also section (f), below and Fig. 7(b)). The remaining two loops from each domain, between strands 1 and 2 and strands 3 and 4 (labeled 19, 108 and 43, 131 in Fig. 4(b)), extend above the β-sheet and pack against the outer faces of the α-helices of their respective domains. The loop between the α1 and α2 domains (preceding 93 on Fig. 4(c)) forms a flat open feature on the top surface of HLA adjacent to the antigen-binding cleft. The α2 domain ends with a one-turn helix H3 (176 to 179), approximately perpendicular to the α1α2 β-sheet, before entering an extended strand leading into α3 (Fig. 4(a)). In addition to the hydrogen bonds in the α1 and α2 domains, a disulfide bond (residue 101 to 164) and a series of salt-bridged hydrogen bonds‡ (Table 9), many conserved in all class I sequences, appear to stabilize this region (which may also be stabilized by interdomain contacts (see section (f)) and interaction with a bound peptide (see Discussion)). Figure 4(c) illustrates how all the β-strands in the α1 domain are “crosslinked” by five salt-bridged hydrogen bonds: 3-29, S1-S2→S3; 14-39, S1→S2-S3→S4; 21-37, S2-S3; 35-46, S3-S4; 46-48, S4-S4→H1. One conserved salt-link (44-61) also connects the β-sheet (S3→S4) to helix H2 (Fig. 4(c)).
In the α2 domain, only β-strands S1 and S2 are crosslinked by salt-bridges (93-119, S1-S2→S3; 102-111, S1-S2), but four salt-bridges from one turn of an α-helix to the next are distributed along the “outer” surface of the α2 domain's α-helices (Fig. 4(c)). Although many of these salt-bridges are conserved in most class I sequences (Table 9), the one in the α2 helix at 157 and 161 is conserved in all class I and class II (Brown et al., 1988) sequences currently known. One salt-bridge is nearly always conserved between the α1 and α2 domains' α-helices, Glu55 to Arg170 (Fig. 4(c) and Table 9).

(ii) α3 and β2-microglobulin

α3 has a fold like an immunoglobulin constant domain (see Fig. 6(a)). The β-strands in the two sheets B and C of α3 run perpendicular to sheet A of α1 and α2 (Fig. 4(a) and Table 8). β2m is also folded like an immunoglobulin constant domain (see Fig. 6(c)), with β-strands running approximately perpendicular to those in α3. The contact between β2m and α3, which is novel for an immunoglobulin constant domain interaction (Bjorkman et al., 1987a,b), is discussed in section (f), below.

† The nomenclature S2→S3 indicates the turn or loop between the secondary structural elements.
‡ As defined by Baker & Hubbard (1984), salt bridges (ion pairs) are a specific type of hydrogen bond between 2 charged atoms.

(d) Carbohydrate structure

All class I HLA molecules are N-glycosylated at Asn86 of the α1 domain, though studies indicate that the carbohydrate moiety is not essential for folding or function (Ploegh et al., 1981). As previously reported, the 3.5 Å averaged maps had regions of electron density extending away from the polypeptide backbone at Asn86 (Bjorkman et al., 1987a,b). This density helped define the position of Asn86 during model building. In round 4, after refinement at 3.2 Å, three sugar groups, two N-acetyl-glucosamine and one fucose residue, were modeled into the flat, connected density emanating from the amide group of Asn86 and included as rigid groups in CORELS refinement at 6.0 to 3.0 Å. During refinement with TNT using 6.0 to 2.7 Å data, density no longer appeared for the sugar and it was removed from the structure (round 13). After round 24 of X-PLOR, difference maps calculated with solvent-modeled phases (see Experimental Methods, section (g)) from 12.0 to 3.2 Å, but not from 12.0 to 2.7 Å, revealed a 4 σ, featureless peak off of Asn86. Attempts to refine a sugar ring in this density using X-PLOR failed to produce any interpretable density in (2Fo − Fc) maps. We suspect that the carbohydrate extends into a cavity present in the crystal lattices and is disordered, giving little density with maps calculated at high resolution.

HLA-A2 contains two pairs of similarly folded domains, α1 and α2, and α3 and β2m (Bjorkman et al., 1987a). To generate sequence alignments based solely on tertiary structure similarity, we superposed the corresponding domains by pairing residues judged equivalent by the algorithm of Rossmann & Argos (1975, 1976), which considers the spatial proximity of corresponding residues and their orientation with respect to the preceding and succeeding residues.

(i) α1 and α2

Overall, the α1 and α2 domains have remarkably similar structures, even to the conserved location of β-bulges (Phe33-Val34 and Tyr123-Ile124) in the β-sheet structure. The α-carbon atoms of the superposed α1 and α2 domains deviate by an r.m.s. d = 1.86 Å for 68 equivalent residues (76% of α1, 74% of α2; see Table 10). This transformation corresponds to a rotation of one domain by 178.5° and a 0.49 Å translation along an axis passing approximately perpendicular to sheet A near 98 Cα. The β-sheet alone (including loops) superposed with a


Figure 4. HLA-A2 refined structure. (a) RIBBON drawing (Priestle, 1988) of the HLA-A2 Cα backbone viewed perpendicular to the pseudo-dyad axis of the α1α2 domains (blue, helices; red, β-strands). The α1 and α2 domains are at the top with the α2 helices in front. The polypeptide strand continues below the β-sheet and into the α3 domain (lower left). Directly underneath α1α2 and behind α3 is the β2m subunit. (b) A side view as in (a) of the α1α2 β-sheet and α2 α-helices. The α1 α-helices have been removed for clarity. Two loops from each domain (at 108 and 131 in α2 and at 19 and 43 in α1) reach up to contact the α-helices. The S2→S3 loop from each domain (labeled 30 and 120) reaches below the plane of the sheet to contact the α3 and β2m domains, respectively. Note how the H1 helix (138 to 150) rises high above the β-sheet, where it turns sharply into the H2 helix (152 to 174), which slants down towards the opposite side of the sheet. (c) α1 (residues 1 to 90) and α2 (91 to 182) domains of HLA-A2 viewed from the top of the molecule along the α1α2 pseudo-dyad. Highlighted in color are salt links conserved in most mouse and human sequenced alleles. They are mainly concentrated in 2 places: anchoring loops of the β-sheet and along the α2 helices. ORTEP (Johnson, 1965) figures prepared with a version kindly provided by S. J. Remington. Stereo pairs are conveniently viewed with a low-cost stereoscope (e.g. Abrams Instrument Corp., Okemos, MI, U.S.A.).

Domains     Salt-bridge        Secondary structure†   Atoms      Distance (Å)
α2-α2       Asp102-Arg111      S1-S2                  OD1-NE     2.77
                                                      OD1-NH1    3.04
            Lys144-Glu148      H1-H1                  NZ-OE1     2.94
            His151-Glu154      H1→H2-H2               ND1-OE1    2.92
                                                      ND1-OE2    3.48
            Arg157-Glu161      H2-H2                  NE-OE2     3.02
            Glu166-Arg169      H2-H2                  OE1-NH1    2.54
α3-α3       His191-Glu254      S1-S5→S6               NE2-OE2
            Asp220-Arg256      S3→S4-S5→S6            OD1-NH2
                                                      OD2-NH2
α1-β2m      Arg35-AspB53       S3-S4‖                 NH2-OD1
α3-β2m      His192-AspB98      S1-C-term              ND1-OD2    3.32
            Glu232-LysB6       S4‖-S1                 OE1-NZ     3.39
                                                      OE2-NZ     2.53
            Arg234-MetB99      S4-C-term              NH1-O      2.68
                                                      NH1-OT     3.26
                                                      NH2-O      2.77
β2m-β2m     GluB16-LysB19      S1→S2-S1→S2            OE1-NZ     2.58
                                                      OE2-NZ     2.86
            AspB38-ArgB45      S3-S3'                 OD1-NE     3.33
            AspB38-ArgB81      S3-S6                  OD2-NH2    2.43
            GluB77-LysB94      S5→S6-S7               OE1-NZ     2.74
                                                      OE2-NZ     2.67

Salt-bridges must be ≤3.5 Å and of acceptable geometry. All are between side-chains except Arg234 and the C-terminal carboxylate group of β2m, MetB99.
† Secondary structure designations, as defined in Table 8, of salt-bridge residues. Designations such as S1→S2 indicate that the residue is in a turn or loop between the 2 elements of secondary structure.
‡ Salt-bridges flagged with C are assumed structurally conserved if in most (all except 1) human and mouse alleles examined the 2 corresponding residues have oppositely charged side-chains.
§ Amino acids shown with the 1-letter code. Alleles for which the amino acid predominates are shown in parentheses. (-) Salt-bridge; (···) potential hydrogen bond. Compiled sequences taken from Parham et al. (1988) and Klein & Figueroa (1986).
‖ Residues B53 (in β2m) and 232 (in α3) are located in a bulge between S4a and S4b.
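The distance criterion given in the table footnote (salt-bridges must be ≤3.5 Å) can be sketched as a simple search over charged side-chain atoms. This is only an illustration: the "acceptable geometry" test mentioned in the footnote is not implemented, the atom lists, function name and input tuple layout are assumptions, and the co-ordinates below are invented rather than taken from entry 3HLA.

```python
import math

# Charged side-chain atoms considered here (an assumed, simplified list).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"),
         ("LYS", "NZ"), ("HIS", "ND1"), ("HIS", "NE2")}


def find_salt_bridges(atoms, cutoff=3.5):
    """Return acid/base atom pairs no farther apart than `cutoff` (in Å).

    atoms: list of (residue_name, residue_id, atom_name, (x, y, z)).
    Only the distance test is applied; bond geometry is not checked.
    """
    acids = [a for a in atoms if (a[0], a[2]) in ACIDIC]
    bases = [a for a in atoms if (a[0], a[2]) in BASIC]
    pairs = []
    for acid in acids:
        for base in bases:
            distance = math.dist(acid[3], base[3])
            if distance <= cutoff:
                pairs.append((acid[1], acid[2], base[1], base[2],
                              round(distance, 2)))
    return pairs


# Invented co-ordinates, chosen only to exercise the cutoff.
toy_atoms = [
    ("ASP", "102", "OD1", (0.0, 0.0, 0.0)),
    ("ARG", "111", "NE", (2.77, 0.0, 0.0)),
    ("LYS", "144", "NZ", (10.0, 0.0, 0.0)),
]
print(find_salt_bridges(toy_atoms))
```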

similar r.m.s. deviation (1.79 Å for 43 residues). The S4 strands are less similar than the other three strands. The corresponding long H2 helices are more similar: 0.91 Å deviation for 19 residues (65% of H2 α1†, 83% of H2 α2). The equivalent screw axis for relating the helices alone is a 179.8° rotation and a −4.92 Å translation and passes near 8 Cα. This axis is tilted approximately 10° from the one relating the entire domains. Figure 5(a) and (b) show the superpositions of the entire domains and the H2 helices. Of all the intrastrand turns, the turn between S2 and S3 is most similar between the two domains (labeled 30 and 120 in Fig. 5(a)). This may be due to the conserved salt links anchoring this turn to S1 in both domains (His3 to Asp29 in α1, His93 to Asp119 in α2). The turns between S1 and S2, and S3 and S4 have different conformations in each domain (Fig. 5(a)), with those in α1 folding up closer to the long helix. The structurally based sequence alignment is shown in Figure 5(c). In addition to the 68 residues judged structurally equivalent by the automatic algorithm (* in Fig. 5(c)), three others were subjectively considered to be structurally similar (| in Fig. 5(c)). (For stretches of poor structural homology (e.g. 55 to 62, 143 to 151 in H1), the alignment is based solely on the spatial proximity of the residues in the superposed structures.) Of the 82 structurally aligned pairs of residues, eight positions have identical amino acids in both α1 and α2 (Gly1, 91; Ser2, 92; His3, 93; Arg21, 111; Asp29, 119; Ala69, 158; Thr73, 163; Leu78, 168). Of these eight pairs, five (1, 91; 3, 93; 21, 111; 29, 119; 78, 168) have the same residue in α1 and α2 in virtually all human and murine class I alleles examined (Parham et al., 1988; Klein & Figueroa, 1986). Three of these pairs appear to form conserved intrastrand salt-links in each domain (Table 9), while one pair, 78 and 168, are leucine residues on the underside of the H2 helices that pack against the β-sheet over the junction of S1 α1 and S1 α2. No conserved function of Gly1 or Gly91 is apparent, although it may be noteworthy that they are at the beginning of the domains.
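The r.m.s. d values quoted for the superposed domains are root-mean-square distances between structurally equivalent Cα atoms. The sketch below assumes the two co-ordinate sets have already been superposed and paired; it does not reproduce the Rossmann & Argos equivalencing procedure or the OVRLAP fitting, and the function name and co-ordinates are illustrative assumptions.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square distance between paired, already superposed atoms.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples in Å,
    where the i-th entries are the structurally equivalent Cα atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented co-ordinates: every pair is displaced by exactly 1 Å along z.
domain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
domain_2 = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(rms_deviation(domain_1, domain_2))
```

Because the deviation is computed only over residues already judged equivalent, a low value for few residues (0.91 Å for 19 helix residues) and a higher value for many (1.86 Å for 68) are not directly comparable without the fraction of each domain that was matched.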

The β-bulge 33-34 in α1 is conserved in α2 (123-124) and appears to “elevate” a hydrophobic side-chain (Val34 or Ile124) above the plane of sheet A, so as to support the underside of the respective helices. The residues corresponding to the 101-164 disulfide in α2 are Ser11 and His74 in α1, which, although in close proximity in HLA-A2, do not appear to form a hydrogen bond.

(ii) α3 and β2m, and immunoglobulin constant domains

The α3 domain of the heavy chain and β2m have immunoglobulin folds typified by the CH3 domain of the Fc fragment (residues A342 to A443, Brookhaven Protein Data Bank entry 1FC1; Deisenhofer, 1981). α3 and β2m superpose with an r.m.s. d = 0.82 Å for 50 structurally equivalent residues (Table 10; Fig. 6(a) and (d)), of which 14 (28%) are identical amino acids. Comparison of α3 with the CH3 domain of the Fc structure gives a similar value (0.83 Å) and number (54) of structurally equivalent residues (Fig. 6(b)). β2m superposes slightly better on CH3, with 0.78 Å deviation for 58 equivalent pairs (Fig. 6(c)). This number agrees with the 0.9 Å deviation between β-sheet residues reported for bovine β2m and CH3 (Becker & Reeke, 1985). The values may also be compared with the average value of 0.60 Å obtained by superposing the CH1 domains from the KOL Fab structure with CH1 domains from five other Fab structures (see Table 10). Aligning the sequences of α3, β2m and CH3 on the basis of the structural superposition (Fig. 6(d)) gives a total of 18 identical amino acids between α3 and CH3, and 22 between β2m and CH3 (Becker & Reeke (1985) found 23). Except for the papain-truncated S7 strand in α3 (G strand in CH3 nomenclature; Lesk & Chothia, 1982), all three domains contain the seven β-strands that comprise the three and four-stranded β-sheets of immunoglobulin constant domains. The conserved tryptophan found in all immunoglobulin domains (A381 in CH3) is not present in β2m (Becker & Reeke, 1985). Despite this, β2m is slightly more similar to CH3 than α3 is.
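The identity figure quoted above (14 of 50 structurally equivalent residues, 28%) is a count over aligned pairs. A minimal sketch of that bookkeeping follows; the function name is an assumption, and the residue pairs are placeholders that reproduce only the counts, not the actual α3/β2m alignment of Figure 6(d).

```python
def percent_identity(aligned_pairs):
    """Count identical residues among structurally equivalent pairs.

    aligned_pairs: list of (residue_a, residue_b) one-letter codes, one
    tuple per structurally equivalent position.
    Returns (number identical, percent identical).
    """
    if not aligned_pairs:
        raise ValueError("no aligned pairs given")
    identical = sum(1 for a, b in aligned_pairs if a == b)
    return identical, 100.0 * identical / len(aligned_pairs)


# Placeholder pairs: 14 identical and 36 different, matching the counts
# reported for the superposition of α3 on β2m.
toy_pairs = [("A", "A")] * 14 + [("A", "G")] * 36
print(percent_identity(toy_pairs))
```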

† H2 helix in α1 domain.

Table 10
Superposition of HLA-A2 and immunoglobulin domains: r.m.s. difference (Å) of equivalent Cα positions

Domain (residues†)       Superposed on (residues†)   r.m.s. difference, Å (equivalent residues‡)
α1 (90)                  α2 (92)                     1.86 (68)
  β-sheet only (50)                                  1.79 (43)
  H2 helix only (29)                                 0.91 (19)
α3 (88)                  β2m (99)                    0.82 (50)
α3 (88)                  CH3§ (102)                  0.83 (54)
β2m (99)                 CH3§ (102)                  0.78 (58)
CH1‖ (97)                CH1                         0.60 (59)¶

† Number of residues in domain being compared.
‡ Number of residues judged equivalent by method of Rossmann & Argos (1975, 1976).
§ CH3 domain (residues A342 to A443) of Fc fragment (Brookhaven Protein Data Bank entry 1FC1; Deisenhofer, 1981).
‖ CH1 domain (residues H119 to H215) of human KOL Fab fragment (entry 2FB4; Marquart et al., 1980).
¶ Average value for CH1 domain of KOL compared with CH1 domains of other Fab structures: human, 3FAB (Saul et al., 1978), 2HFL (Sheriff et al., 1987), 3HFM (Padlan et al., 1989); mouse, 1FJB (Suh et al., 1986), 1MCP (Satow et al., 1986).


Figure 5. Superpositions of the α1 domain (filled bonds) on the α2 domain (open bonds) with the OVRLAP program by W. Bennett. (a) Entire domains superimposed, r.m.s. d = 1.86 Å for 68 structurally equivalent residues. (b) H2 helices superimposed, r.m.s. d = 0.91 Å for 19 equivalent residues. H1 helices are also shown. (c) Sequence alignment of the 2 domains based on the structure superposition. (*) Residues judged structurally equivalent according to the algorithm of Rossmann & Argos (1975, 1976). (|) Other residues judged structurally similar. Shaded positions have identical amino acid residues in both domains of HLA-A2.1.

The major differences among the three domains (α3, β2m and CH3) are in the loops and polypeptide connections between β-strands. Two of the seven loops, S2 to S3 and S6 to S7, appear similar in all three structures, while the other four loops differ. Loops S1 to S2 and S5 to S6 are similar in α3 and CH3, but differ in β2m. In α3 and CH3, the S1 to S2 loops follow approximately the same path, while the S1→S2 β2m loop folds over the end of the domain, possibly avoiding collision with α3. Becker & Reeke (1985) point out that the side-chains comprising this loop (B12 to B21) are very conserved among five aligned β2m sequences. AsnB17 does hydrogen bond to ArgB97, perhaps stabilizing the conformation of the C-terminal end of β2m, which interacts with α3 (see section (f), below). The S5 to S6 loops in α3 and CH3 are helical, while the β2m S5→S6 loop is extended (Fig. 6(b) and (c)). The loops at S3 to S4 and S4 to S5 are more similar between β2m and CH3. In the first loop (S3→S4), α3 has a one amino acid insert (227, which forms part of the CD8 binding site, see Discussion), while the second loop (S4→S5) in α3 deviates from β2m and CH3, possibly as the result of contacts it makes with β2m (Fig. 6(a) and (b)). The final β-strand in α3 is partially disordered in the monoclinic crystal or possibly degraded by papain. The final β-strand in β2m finishes by hooking back away from the four-stranded β-sheet to be involved in the β2m-α3 interface. This strand appears to differ in the last few amino acid positions

Figure 6. Superpositions of the immunoglobulin-like domains of HLA-A2 highlighting loops with differing structures: red (S1→S2), green (S3'→S4), pink (S4→S5) and blue (S5→S6). (a) α3 (open bonds) superimposed on β2m (filled bonds). (b) α3 (open bonds) superimposed on CH3 (filled bonds, from Data Bank entry 1FC1). (c) β2m (open bonds) superimposed on CH3 (filled bonds). (d) Sequence alignments of α3, β2m and CH3 based on the structure superpositions. (*) and (|) as for Fig. 5(c).


Table 11
β2m-α3 interface and residue contacts

Column headings: β2m residue; α3 residue; conserved H-bond or salt-bridge†; contacts‡; distance (Å).

Interface residues show a decrease in contact surface area of at least 10% compared to the free domains.
† C. Based on amino acid sequence, a hydrogen bond or salt-bridge is expected to be present in most other mouse and human alleles examined.
‡ Bold, side-chain atoms; bold and shaded, charged atom or residue; s.c., side-chain; m.c., main-chain; (-) salt-link; (···) hydrogen bonds ≤3.5 Å; (·) hydrophobic or van der Waals' interactions ≤4.0 Å.
§ Other residues in interface but without interdomain contacts ≤4 Å: α3, His188, Leu206 and Leu233; β2m, HisB13 and ProB14.
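The interface criterion stated in the footnote (a decrease in contact surface area of at least 10% relative to the free domain) can be sketched as a per-residue comparison. The surface-area calculation itself is not shown; the function name, the dictionary layout and the area values below are assumptions for illustration and are not measurements from the HLA-A2 structure.

```python
def interface_residues(area_free, area_complex, min_decrease=0.10):
    """Select residues whose contact surface area drops on complex formation.

    area_free: dict residue label -> contact area (Å²) in the isolated domain.
    area_complex: dict residue label -> contact area (Å²) with the partner
    domain present. Residues missing from area_complex are treated as
    unchanged.
    """
    selected = []
    for residue, free in area_free.items():
        if free <= 0.0:
            continue  # fully buried in the free domain; no decrease possible
        fractional_decrease = (free - area_complex.get(residue, free)) / free
        if fractional_decrease >= min_decrease:
            selected.append(residue)
    return selected


# Invented areas: only the first residue loses more than 10% of its area.
free_areas = {"TyrB10": 50.0, "SerB11": 40.0, "LysB41": 60.0}
bound_areas = {"TyrB10": 10.0, "SerB11": 37.0, "LysB41": 60.0}
print(interface_residues(free_areas, bound_areas))
```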


Figure 7. The residues forming the domain interfaces are highlighted in color on the Cα backbone stereogram. Filled and colored Cα atoms make contacts ≤4 Å (see Tables 11 through 13). Red, α1α2 residues in interface with β2m; green, β2m residues in interface with α1α2; blue, α3 residues in interface with β2m; yellow, β2m residues in interface with α3; pink, α1α2 residues in interface with α3; orange, α3 residues in interface with α1α2. (a) A view perpendicular to the α1α2 pseudo-dyad with the binding cleft viewed end-on (α1α2-α3 interface not shown). (b) A side view with the molecule rotated 90° about the pseudo-dyad (α3-β2m interface not shown). (c) A drawing of the β2m residues (blue) interacting underneath the α1α2 β-sheet. α1α2 residues in the β2m interface are in red; those in the α3 interface with small green labels. Hatched region highlights the pleat under which β2m side-chains PheB56 and TrpB60 make van der Waals' contacts with α1α2.

from that in the isolated β2m structure (Becker & Reeke, 1985), possibly as a result of the interaction with the α3 domain.

(f) Domain interactions

Residues were considered as part of a domain interaction if their solvent probe contact surface area decreased by more than 10% in the presence of the interacting domain. An overview of the different interfaces is shown in Figure 7(a) and (b).

(i) β2m-α3 interface

596 Å² of the solvent-accessible surface area of β2m is buried in the interface with the α3 domain of HLA (α3 buried area = 610 Å²; Table 11). The


interface is formed by parts of the four-stranded β-sheets of both β2m and α3, with the direction of the strands in β2m being approximately perpendicular to that in α3 (Fig. 8(a)). Nineteen of the 24 residues in the interface are from the first two β-strands of β2m (S1, S2) and the fourth and fifth β-strands (S4, S5) of α3; that is, only the strand 1 and 2 edge of the β-sheet of β2m contacts a small patch on the β-sheet of α3. The atomic contacts in the interface are listed in Table 11. The interactions are very polar, a property shared by other immunoglobulin inter-domain interactions (Schiffer et al., 1988). There are 16 hydrogen bonds (including 6 involved in salt-bridges), 11 of which are between side-chains of one domain and main-chain polar atoms of the other domain. Figure 8(b) illustrates how side-chains from α3 and β2m appear to intercalate to form hydrogen bonds with main-chain atoms across the interface. Four hydrogen bonds appear to be made between α3 (Arg234, Trp244) and the terminal carboxylate group of β2m (Fig. 8(a) and Table 11). The C-terminal four residues of β2m, unlike the analogous C-terminal residues in immunoglobulin constant domains, break out of the final β-strand to make a sharp reverse turn. This conformation appears to be stabilized by an internal hydrogen bond from ArgB97 to AsnB17. In the interface, it would also be stabilized by the four hydrogen bonds to the carboxylate group (see above) and by an interdomain hydrogen bond to His192 of α3 from AspB98 of β2m. Whether the β-bend at the C terminus of β2m exists in unbound β2m cannot be assessed with certainty, as the electron density in that region is weak in the unbound β2m structure determination (Becker & Reeke, 1985).

(ii) β2m-α1α2 interactions

β2-microglobulin interacts with the underside of the eight-stranded β-sheet of the α1 and α2 domains (Fig. 7(a) and (b)). Figure 7(b) illustrates that the 15 residues of β2m (green) in the interface are from three chain segments at one end of the β2m subunit: two residues at the N terminus, four on the S2→S3 loop and nine residues in a continuous stretch including β-strand S4, the S4→S5 loop and the beginning of strand S5. These residues interact with 19 residues from the bottom of the α1α2 β-sheet and the one loop (S2→S3 in α2) that projects below the


Figure 8. (a) Detailed view of residues comprising the α3-β2m interface (see Table 11). β2m in blue; contacting side-chains in lighter blue. α3 in red; contacting side-chains in orange. Interdomain hydrogen bonds are shown with broken yellow lines. (b) A schema showing how side-chains (arrows) in the α3-β2m interface make alternating hydrogen bonds with main-chain atoms of the other domain. This intercalation also produces van der Waals' contacts between flanking side-chains such as the ring of TyrB10 with the side-chain of Arg234.

Figure 9. A close-up of the β2m-α1α2 interface (the view is parallel to the plane of the α1α2 β-sheet; β2m, filled bonds; α1α2, open bonds) showing residues from the β2m S4 strand and S4+S5 loop packing into the underside of the α1α2 pleat, and surrounded on 2 sides by residues 8, 98, 115 and 10, 96, 117.


plane of the β-sheet (Fig. 7(b), red). β2m residues B53 to B63 run across the underside of the α1α2 β-pleated-sheet (Fig. 7(a)), such that the carbonyl oxygen atom of B54 and the side-chains of PheB56 and TrpB60 project into a "pleat" of the β-sheet and are surrounded on two sides by α1α2 residues 25, 8, 98, 115 and 23, 10, 96, 117 (Fig. 7(c), detail in Fig. 9). On one end of the interface is a salt-bridge and hydrogen bond cluster involving β2m AspB53 and Gln32, Arg35 and Arg48 from α1. On the other end (right, Fig. 7(c)), β2m residues B1 and B3 appear to hydrogen bond with the S2+S3 loop of α2 (residues 119 to 121), which projects below the β-sheet.

Table 12
β2m-α1α2 interface and contacts

Columns: β2m residue; conserved H-bond; α1 residues; α2 residues; atomic contacts.
β2m residues: IleB1, ProB32, SerB33, AspB34, SerB52, LeuB54, SerB55, PheB56, PheB62, TyrB63.
α1 residues: Phe8, Phe9, Thr10, Val12, Ile23, Val25, Tyr27.
α2 residues: Thr94, Gln96, Asp119, Gly120, Lys121.
Contact distances (Å): 3.00, 3.18; 2.62, 2.81; 2.82, 2.62; 2.71; 2.80; 2.95; 3.40.

Interface residues show a decrease in contact surface area of at least 10% compared to the free domains.
† C, based on amino acid sequence, a hydrogen bond or salt-bridge is expected to be present in most other mouse and human alleles examined.
‡ Bold, side-chain atoms; bold and shaded, charged side-chain atom; s.c., side-chain; m.c., main-chain; (-) salt-link; (---) hydrogen bonds ≤3.5 Å; (.) hydrophobic or van der Waals' interactions ≤4.0 Å.
§ Other residues in interface but without interdomain contacts.
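
The table footnote separates hydrogen bonds from hydrophobic or van der Waals' contacts by interatomic distance. A minimal Python sketch of that kind of distance-based classification follows. It is an illustration only, not the procedure used for the table: the function name `classify_contact`, the cutoff constants (3.5 Å and 4.0 Å) and the simple element test for polar atoms are all assumptions introduced here.

```python
from math import dist

# Assumed cutoffs in angstroms, chosen for illustration only.
HBOND_MAX = 3.5  # maximum donor-acceptor distance counted as a hydrogen bond
VDW_MAX = 4.0    # maximum distance counted as a van der Waals contact

# Assumption: any N or O atom is treated as a potential donor or acceptor.
POLAR_ELEMENTS = {"N", "O"}


def classify_contact(elem_a, xyz_a, elem_b, xyz_b):
    """Label a pair of atoms from different domains by their separation."""
    d = dist(xyz_a, xyz_b)
    both_polar = elem_a in POLAR_ELEMENTS and elem_b in POLAR_ELEMENTS
    if both_polar and d <= HBOND_MAX:
        return "hydrogen bond"
    if d <= VDW_MAX:
        return "van der Waals"
    return "none"


# Hypothetical coordinates: an O and an N atom 2.8 angstroms apart.
print(classify_contact("O", (0.0, 0.0, 0.0), "N", (2.8, 0.0, 0.0)))
```

A geometric test of this kind ignores hydrogen-bond angles and charge, so a real analysis would also need to check donor and acceptor geometry and to flag salt-links between charged side-chain atoms separately.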
