A method for determining the positions of polar hydrogens added to a protein structure that maximizes protein hydrogen bonding.

PROTEINS: Structure, Function, and Genetics 12:266-277 (1992)

A Method for Determining the Positions of Polar Hydrogens Added to a Protein Structure That Maximizes Protein Hydrogen Bonding Michael B. Bass, Derek F. Hopkins, W.Andrew N. Jaquysh, and Rick L. Ornstein Molecular Science Research Center and Environmental Science Research Center, Pacific Northwest Laboratory, Richland, Washington 99352 An automated method for the ABSTRACT optimal placement of polar hydrogens in a protein structure is described. This method treats the polar, side chain hydrogens of lysine, serine, threonine, and tyrosine and the amino terminus of a protein. The program, called NETWORK, divides the potential hydrogenbonding pairs of a protein into groups of interacting donors and acceptors. A search is conducted on each of the local groups to find an arrangement which forms the most hydrogen bonds. If two or more arrangements have the same number of hydrogen bonds, the arrangement with the shortest set of hydrogen bonds is selected. The polar hydrogens of the histidyl side chain are specifically treated, and the ionization state of this residue is allowed to change, if this change results in additional hydrogen bonds for the local group. The program will accept Protein Data Bank as well as Biosym-format coordinate files. Input a n d output routines can be easily modified t o accept other coordinate file formats. The predictions from this method are compared t o known hydrogen positions for bovine pancreatic trypsin inhibitor, insulin, RNase-A, and trypsin for which the neutron diffraction structures have been determined. The usefulness of this program is further demonstrated by a comparison of molecular dynamics simulations for the enzyme cytochrome P-450,, with and without using NETWORK. Key words: computer program, hydrogen bonds, molecular dynamics, computer modeling INTRODUCTION The accuracy of molecular dynamics simulations has evolved over time with the improvement of methods and computer technology. With this increase in the atomic accuracy of the protein model, locating the position of the hydrogens within the protein has become more imperative. Early protein dynamics simulations have used an extended-atom approximation where hydrogen atoms are omitted 0 1992 WILEY-LISS,

INC.

from the structure and embedded in the parameters for the heavy atoms.' This was done to reduce the number of calculations which are necessary to be performed at each time step. To compensate for the lack of hydrogen atoms, the atomic parameters of the heavy atoms were adjusted. Later, to better model hydrogen-bonding interactions in proteins, polar hydrogens were specifically added while aliphatic hydrogens remained omitted from the ~imulation.'.~ Currently many molecular dynamics force fields now include all hydrogen atoms explicitly in protein and nucleic acid At least for some force fields, inclusion of all hydrogen atoms is necessary for a more accurate representation of proteins during a molecular dynamics simulation (ref. 7 and unpublished results). Atomic resolution structures of proteins are generally obtained by X-ray crystallography. This technique is limited to locating only the heavy atoms in a structure. Neutron diffraction studies of crystallized proteins can locate the positions of hydrogen atoms. To date, only four such structures have been deposited in the Protein Data Bank.' Molecular dynamics can compliment X-ray crystallography and neutron diffraction by providing insights into the motion of particular atoms and residues. Because of the critical role polar hydrogens play in determining secondary structure and protein packing, the exact placement of polar hydrogens and ensuing formation of hydrogen bonds are very important. We have found that one misplaced active-site polar hydrogen can drastically change the substrate conformation during a molecular dynamics simulation of the enzyme cytochrome P-450,,, (Bass and Ornstein, unpublished results). Computer-based methods for satisfactorily adding hydrogen atoms to X-ray-determined structures exist for those hydrogens which have restricted torsional degrees of freedom. For hydrogens which

Received April 2, 1991; revision accepted May 7, 1991. Address reprint requests to Rick L. Ornstein, Molecular Science Research Center, Pacific Northwest Laboratory, P.O. Box 999, Mail Stop K2-18, Richland, WA 99352.

MAXIMIZING PROTEIN HYDROGEN BONDING

have a high degree of torsional freedom, no generally acceptable method exists for finding the best location for these hydrogens. The most important of the torsionally flexible hydrogens are the hydrogens which form hydrogen bonds, since these form specific interactions with other groups in a protein. The side chain hydrogen of histidyl residues makes a special case of those polar hydrogens which have restricted mobility. While the position of a particular hydrogen can be inferred from the geometry of the heavy atoms, whether the hydrogen is placed on the NE2 or ND1 nitrogens is not known a przori. In addition, since the pKa of the histidyl side chain is near neutrality, a number of charged residues, containing hydrogens on both nitrogens, might be expected in a large protein even a t neutral pH. There are currently three distinct methods to place hydrogens in a protein structure. The first is to make a n arbitrary initial placement and let energy minimization move the hydrogens to their nearest local minima. Due to the multiple-minima problem,9910this method does not adequately treat polar hydrogens. Energy minimization leads to the nearest local minima: hydrogens will not cross any barriers to find its true minimum. Recently Brunger and Karplus” used a n iterative search-and-localenergy-minimization procedure to find the best torsion angle of each polar hydrogen. This method will find the true minimum for a particular hydrogen. But it does not consider the simultaneous interactions of all rotatable polar hydrogens. These authors realized this problem and attempted to reduce the effect of this omission by performing the search twice: once forward through the list of polar hydrogens and once backward through the list. The approach taken here is to maximize the number of hydrogen bonds within a protein or model system based on conventional geometric rules. A program, called NETWORK, was written to solve this problem. This program divides a protein into groups of interacting hydrogen bond donors and acceptors, called networks. Each network is independently evaluated to maximize the total number of hydrogen bonds. Because only polar hydrogens which have rotational degrees of freedom need be considered, all amide, guanidinyl, aliphatic, and aromatic hydrogens are excluded from consideration by this program. The imidazole hydrogens of histidine are treated as a special case, and the ionization state of the residue can be adjusted to optimize the number of hydrogen bonds. It can be argued that the pKa of certain acidic or basic residues within a protein are uniquely shifted toward neutral pH. This would yield a residue with a protonation state similar to the free acid or base. While no attempt is made by this program to determine the protonation state of these residues, if the protonation state of a residue is manually changed, this program will optimize its hydrogen bonds accordingly.

267

The advantages of NETWORK are: (1) NETWORK is an independent, fully automated program, (2) all rotatable polar hydrogens in each hydrogen bonding network is exhaustively searched for the optimum number of hydrogen bonds, and (3) all histidy1 residues are included as a special case in the optimization. NETWORK was used to predict the positions of the polar hydrogens for bovine pancreatic trypsin inhibitor, insulin, RNase-A, and trypsin. These results were compared with the known positions as determined by neutron diffraction experiments. NETWORK correctly predicted the location of all of the known protein-protein hydrogen bonds which were present in the neutron diffraction structures. In a molecular dynamics simulation, NETWORK was able to reduce the RMS deviation of the backbone atoms of the enzyme cytochrome P-45OCa, by 30% as compared to a simulation, which did not use NETWORK. The authors will make this program available through the Brookhaven Protein Data Bank.

METHODS NETWORK Program A protein is divided into a series of interacting donors and acceptors called networks. The word “donor” refers to a heavy atom which is covalently bound to a hydrogen which can participate in a hydrogen bond. The word “acceptor” refers to a heavy atom which can participate in a hydrogen bond. The term “ D A (donor/acceptor) refers to a heavy atom which can both donate and accept a hydrogen to form a hydrogen bond. In general, only donors which have hydrogens which are capable of significant rotational freedom are considered by this program. This includes the hydroxyl hydrogens of serine, threonine, and tyrosine and the amino hydrogens of lysine and the amino-terminal residue of a protein. The polar hydrogens of the histidine side chain are treated as a special case (see below). Although bifurcated hydrogen bonds have been seen in protein structures,” these are not specifically considered by this program. All bifurcated hydrogen bonds were found to involve either the peptide amide hydrogen or the side chains of residues which have highly restricted rotational degrees of freedom. Except for histidine, these potential hydrogen bond donors are not considered by NETWORK. In the case of histidine, since the position of the hydrogen is not moved, bifurcated hydrogen bonds could result but not as a consequence of this program. At all times, the coordinates of the heavy atoms and excluded hydrogens (amide, guanidinyl, aliphatic, and aromatic hydrogens) remain fixed in space. Each network is rigorously searched for the optimum number of hydrogen bonds between the donors and acceptors. In the case where two or more alternative arrangements with the same number of hydrogen bonds exists, the optimum network is se-

268

M.B. BASS ET AL.

lected by using the arrangement with the smallest total donor-acceptor distance. Optimized hydrogen bonds are created in the coordinate file by rotating the hydrogens keeping the bond lengths and valance angles constant. The program is designed to read both Protein Data Bank and Biosym coordinate file formats. NETWORK was written in the C programming language and compiles under Unix and the VAXNMS operating systems. The program was written in several modules to facilitate debugging and modification. The input and output routines have been separated to allow easy adaptation to other coordinate file formats. A discussion of each of the major sections of the program follows.

Read Donors and Acceptors Before the hydrogen bond donors and acceptors can be read from the data file, they must be defined. Two auxiliary files contain the definition of all atoms to be read in as hydrogen bond donors or acceptors. The files are called “donor-lookup” and “acceptor-lookup.” The purpose of having external definition files is to allow individual users to define exactly which donors and acceptors will be considered by the program. There are exceptions to this idea of an external definition, however. NETWORK considers any oxygen atom to be an acceptor, and the side chain hydrogen donors of histidines are defined internally and treated as a special case (described later). Some molecular dynamics simulations use “neutralized” representations of the amino acids. NETWORK will correctly handle these amino acids as well. The amino acids are distinguished according to the convention used by Biosym Technologies in their coordinate-file format. (Neutralized residues of arginine, aspartic acid, glutamic acid, histidine, and lysine are named ARG, ASP, GLU, HIS, and LYS, whereas the charged forms of these residues are named ARG + , ASP-, GLU-, HIS + , and LYS + , respectively.) The first step in reading the donors and acceptors is to read the necessary information for each residue from the data file into an array. Each element in the array holds information about one atom in the residue. The array is passed to a routine that checks for donors and acceptors. Each atom in the residue is compared to the definition files to identify it as a donor or an acceptor. If the atom is a donor, the information taken from the residue array consists of the donor position, the position of all hydrogens attached to it, and the position of the donor’s parent and grandparent atoms. The last two atomic positions will be used in the calculation of the dihedral angle of the hydrogen. If the atom is an acceptor, only the position of the acceptor atom is taken. If an atom is both a donor and an acceptor, an element of each type is built. This information is stored in ei-

ther the donor or acceptor list. These lists are the two main data structures used by NETWORK.

Find Partners The next step in the algorithm is to define all potential partners between donors and acceptors. This is done by searching the acceptor list once for each element in the donor list. If a donor and acceptor are less than 3.2 from each other, they are considered to be potential partners. Each donor and acceptor atom keeps track of its potential partners in a partner list within the donor or acceptor list element. In addition to the position of its partners in the acceptor list, a donor also stores the distance from the donor to the acceptor for later use. If an atom is on both the donor and acceptor list, the fact that it can both donate and accept is recorded in the acceptor element, along with a pointer to the atom’s location in the donor list. Find Networks A network is defined as a group of donors and acceptors which are linked together as potential partners. Because it is possible for an atom to be both a donor and a n acceptor (the oxygen in water, for example) and because making one hydrogen bond in one place may preclude making a hydrogen bond elsewhere, it is necessary to define and optimize such networks instead of arbitrarily making hydrogen bonds with a first come, first serve bias. NETWORK allows three different levels for defining networks. In level 1, a network is defined as all nonwater donors and acceptors which are linked together as potential partners. Level 2 allows nonwater to nonwater as well as nonwater to water hydrogen bonds, but water to water hydrogen bonds are excluded from the network. Level 3 considers all possible bonds in determining a network. The potential hydrogen bonds within these networks will be optimized, while potential bonds not included in a network will be assigned arbitrarily in a step following optimization. The networks are defined by using two indirectly recursive routines. First, a donor that has not been assigned to a network is sent to the donor fanout routine. The donor’s partners are scrutinized to see if they meet the criteria for the network option in effect. Each of the donor’s partners which meets this criteria is used to call the acceptor fanout routine. At this point, all the partners of the acceptor are checked for possible inclusion in the network. Each of the acceptor’s partners meeting the inclusion criteria is used, in turn, to call the donor fanout routine. At each point, any donor or acceptor which is included in the net is assigned a net number. The recursion ends when all donor or acceptor partners have been assigned a net number. As will be seen in the following section, it is advantageous to keep the size of the networks as small


as possible. To keep a network small, a network can be divided into two networks given the following condition: If a n obligate acceptor (an atom which can accept a hydrogen bond, but not donate one) is found in the net, and the total number of hydrogen bonds it can form is greater or equal to the number of possible partners, the fanout recursion stops at this acceptor. The acceptor is not given a net number, but is included in the net through the link from its calling donor. This allows a n acceptor to be included in more than one net. It should be noted that a n acceptor which can accept all offered hydrogens cannot preclude forming a hydrogen bond in any network in which it is included.

Optimize Networks Once the networks have been defined, it is necessary to optimize the number of hydrogen bonds within the network. This is done by checking every possible permutation of the hydrogen bonds within a network, and implementing the permutation that provides the most hydrogen bonds. If a different permutation has an equal number of hydrogen bonds but a smaller sum of all donor-acceptor distances, that permutation is regarded as optimal. Once a network of hydrogen bonds has been determined to be optimal, this fact is recorded in the donors’ partner lists. To check every possible permutation of a network, the donors in the network are linked on a one to one basis with their partners and put into an array. Thus, the array has one element for each donoracceptor pair in the network. Each permutation of this array is checked from top to bottom for the number of hydrogen bonds which can be formed. The donor-acceptor pairs are checked to ensure that a donor does not participate in more hydrogen bonds than it has hydrogens. Likewise acceptors are tested to make sure that it will not participate in more hydrogen bonds than it has lone-pair electrons. DAs are checked to make sure they do not exceed both criteria. Thus, if a bond in the lower part of an array permutation is precluded by a bond made before it, it will be made in a permutation where that donoracceptor combination is at the top of the array. Because each network is rigorously examined for the best possible set of hydrogen bonds, the number of comparisons scales with the factorial of the number of elements in the network. This causes this module to use a significant amount of time for networks containing over 11elements. For instance, on a Silicon Graphics Personal Iris 4D25 workstation, a network of nine elements takes approximately 1.5 min to evaluate, whereas a network of 10 elements takes approximately 16 min.

Assign Leftovers In some cases, due to the specified definition of a network, certain donor-acceptor pairs will not be

269

included in the optimization of the network. For instance, if level 2 was in effect, only protein-protein or protein-water hydrogen bonds would be included in the networks. No water-water hydrogen bonds would be considered for optimization. These additional hydrogen bonds are assigned arbitrarily in this module. As the donor list is parsed, each donor is checked to see if it has any unassigned hydrogens. If it does, and the donor has partners who have the ability to form additional hydrogen bonds, then the hydrogen bonds are assigned based on the order of appearance in the coordinate file.

Move Hydrogens Once the bonding partners have been defined, the spatial locations of the donated hydrogens are changed. This is done by moving the hydrogen to the point closest to the acceptor that maintains the constraints of covalent bond length and valence angle for the hydrogen. First internal coordinates for the acceptor are determined using the donor, the donor’s parent, and the donor’s grandparent as reference atoms. The dihedral angle to the acceptor atom provides the direction the hydrogen must point. This dihedral angle, in combination with the hydrogen’s internal coordinates (bond length and valence angle), are converted to Cartesian coordinates. The resulting position is the closest approach of the hydrogen to the acceptor that maintains the original hydrogen-to-donor bond length and valence angle. If one hydrogen attached to a donor is moved, the rest of the hydrogens attached to the donor must be moved to maintain the approximate geometry of the donor hydrogens. Consequently, NETWORK moves all hydrogens covalently bonded to the donor by adjusting each of the hydrogens’ dihedral angles. Energy minimization prior to molecular dynamics should readily resolve any relatively small distortions that may result from the torsion angles between the hydrogens. Although optimization specifically forbids two DAs from donating to each other, NETWORK does not check to see if an unmoved hydrogen might coincidentally overlap with a moved hydrogen. Energy minimization should readily resolve any bad nonbonded contacts. Once the new positions of the hydrogens have been determined, these changes are reflected in a new data file.

Histidines A special case which has not been discussed is the case of histidines. One option of dealing with histidines is to consider that the identity of each histidine is the same as is originally stated in the data file. In other words, if the file contains a histidine identified as HIS, the hydrogen is considered attached to the NE2 nitrogen, making NE2 a donor, and no hydrogen attached to the ND1 nitrogen, making ND1 an acceptor (see Fig. 1).

270

M.B. BASS ET AL. HE2

HE2

NE2

NE2

HISD

HIS t

NE2

HIS

Fig. 1. The side chain atoms of the histidyl residue showing the three different protonation states this residue can occupy. Each atom is labeled with its atom name.

NETWORK deals with histidine in a different way. Instead of considering the identity as set, we try to find the best identity for each given situation. This means that if a histidine originally identified as HIS and can increase the number of hydrogen bonds in the network by donating with hydrogens on both ND1 and NE2, the histidine’s identity is changed to HIS . Since the pK, of histidine is near neutrality, it may be that in certain situations, some histidines may be charged. The fact that histidines have no a priori identity makes them a special case in almost all of the modules which we have discussed. To make a histidine fit more closely into the general case, both ND1 and NE2 are considered to be a single atom, a DA, which can donate two hydrogens and accept one. The case where both hydrogens will be donated and one accepted will not be allowed to occur. When histidines are read into the list, the positions of both ND1 and NE2 are read into the donor element, as well as all the reference positions for each atom. Special fields are reserved for the atoms not included in the general case. Also, the hydrogen positions are not read. Because we may need to know the position of a hydrogen which does not exist in the original file, we calculate the positions for both hydrogens. A special routine has been written to check to see if a histidine partners with another atom. In this routine, NETWORK assumes that both the donor and acceptor could be histidines, and checks both distance and angle constraints. NETWORK assumes that both ND1 and NE2 have at most one possible partner each, and those partners are placed at position 0 and 1 in the partner list, respectively. As mentioned before, when optimizing a network, the identity of a histidine is not fixed. If more hydrogen bonds can be made by changing the identity, the program will do so. If two hydrogen bonding alternatives are of equal value to the network, then

+

the program will maintain the original histidine identity. The actual change in identity is done in a separate module. A histidine is examined to see if ND1 and NE2 are donors or acceptors, or if they have not been partnered. Depending on the optimum configuration, the identity is changed, if needed. In the case of a HIS+ which donates one hydrogen but not the other, even though it could maintain its present identity, it will become either a HIS or HISD. The new identity is recorded in the donor list element containing that histidine.

Molecular Dynamics Each molecular dynamics simulation for the norcamphor-cytochrome P-450,,, system was performed as stated in ref. 19: Restrained energy minimization was carried out by 500 steps of steepest descents minimization with the positions of the heavy atoms restrained by a 2000 kcal/mol restraining force. This step allows the hydrogen atoms to relax. Another 500 steps of steepest descents energy minimization was performed in which the heavy atoms of the protein and substrate were restrained with a 2000 kcal/mol force allowing the solvent molecules to relax. Unrestrained energy minimization was performed for 1500 steps of steepest descents and up to 5000 steps of conjugate gradients minimizations. Minimization was terminated if the maximum gradient was less than 1 kcallA; this usually occurred after 3400 steps in the conjugate gradient minimizer. Molecular dynamics was performed for 10 psec by initializing the system at 50 K and slowly warming to 300 K using a temperature-scaling time constant of 2 psec. The system was reinitialized a t 300 K with a temperature-scaling time constant of 0.1 psec. Dynamics was continued for a total of 75 psec for the NETWORK-optimized structure and 73 psec for the unoptimized structure. These calculations were performed on a Cray XMP.

271


TABLE I. Comparison of Hydrogen Bonds i n Protein Structures Determined by Neutron Diffraction and NETWORK All hydrogen bonds Conserved Observed* Predicted'

Protein BF'TIs Insulin** RNasett -sinf*

91 191 186 167

123 453 262 176

89 163 182 166

Protein-protein hydrogen bonds Conserved3 Observed Predicted 33 65 97 167

34 68 106 176

33 65 97 166

*Hydrogen bonds detected using the criteria of hydrogen-to-acceptor distance less than 2.5 A and the donorhydrogen-acceptor angle of 180 ? 45". +Structurepredicted by NETWORK using Level 2 optimization (allprotein-protein and protein-water hydrogen bonds). *Hydrogenbonds present in both the neutron structure and the structure predicted by NETWORK, and include hydrogen bonds which have reversed donor-acceptor polarity (e.g., a hydrogen bond from water A to water B which was optimized to be from water B to water A are considered conserved). $Bovine pancreatic trypsin inhibitor (BPTI) 1.8 A resolution structure from Protein Data Bank file 5PTI." **Insulin 2.2 A resolution structure from Protein Data Bank file 31NS.16 This structure did not contain the positions of any of the water hydrogens. Therefore the values under the observed and conserved headings reflect the positions arbitrarily assigned by MOLEDT. ++RNase-A2.0 A resolution structure from Protein Data Bank file 5RSA.13 **Trypsin 1.8 A resolution structure from Protein Data Bank file 1NTP.14This structure did not contain any water of crystallization.

RESULTS AND DISCUSSION Four protein structures with hydrogen positions determined experimentally by neutron diffraction were used to validate and measure the possible benefit of the NETWORK program. These are RNase-A (Brookhaven Protein Data Bank file 5RSA),13 trypsin (file 1NTP),14bovine pancreatic trypsin inhibitor (file 5PTI),15 and insulin (file 31NS).16 In the case of trypsin, the monoisopropylphosphoryl group was removed for this validation. This structure did not contain any waters of crystallization. In the case of insulin, both dimers in the asymmetric unit were included. The insulin structure did not contain the positions of the water hydrogens. Any missing hydrogen atoms were added to the structure using the program MOLEDT (Biosym Technologies). Based on the heavy atom positions, MOLEDT adds hydrogens to the structure with idealized bond lengths, valence angles, and dihedral angles. Since NETWORK ignores the original position of the hydrogens, no special care was taken to remove the hydrogens originally found in the coordinate file. Each coordinate file was then used as input to NETWORK. The hydrogen bonds were optimized at level 2 (all protein-protein and proteinwater hydrogen bonds were considered). Hydrogen bonds were detected by a separate program using the constraints of the hydrogen-to-acceptor distance less than 2.5 A and the donor-hydrogen-acceptor angle must be 180 45". These criteria are slightly less restrictive than the criteria used by NETWORK, where the donor-acceptor distance is less than 3.2 A. Table I summarizes the hydrogen bonds found experimentally and those predicted by the NETWORK program. Figure 2, a network from in-

*

sulin, represents a typical hydrogen bonding network. In a direct comparison, NETWORK placed hydrogens to a n average of 0.77 A (RNase-A), 0.37 A (trypsin), 0.46 (BPTI), and 0.51 A (insulin) of the position determined experimentally. Table I1 summarizes the rms differences of the donor hydrogens in the NETWORK-optimized structure as compared with the positions determined by neutron diffraction. For comparison the results of Brunger and Karplus" are included. NETWORK found all the hydrogen bonds which were present in the neutron diffraction structure of bovine pancreatic trypsin inhibitor and an additional hydrogen bond between the side chain of Ser-47 and its carbonyl oxygen. While the data in Table I1 show that the position of the tyrosine hydroxyl hydrogens is not different from the neutron structure, this is because NETWORK did not move these hydrogens. Two tyrosyl residues were not included in any networks (there were no acceptors within the cutoff criterion). The other two tyrosyl residues (10 and 23) were in networks which were optimized to form a hydrogen bond from a water molecule to the hydroxyl rather than in the opposite orientation. For the insulin structure, NETWORK converted the D5 histidine from HIS + to HISD. All of the protein-protein hydrogen bonds were conserved. NETWORK was also able to find three additional hydrogen bonds: ThrA8 side chain to Glu-A4 carbonyl oxygen and Ser-12 side chain to its carbonyl oxygen in both the A and C subunits. From Table I, it appears that NETWORK doubled the number of intermolecular hydrogen bonds. No hydrogen positions for the waters of crystallization were reported in the insulin struc-

272

M.B. BASS ET AL.

p.91

Fig. 2. A stereogram depicting a typical network identified by NETWORK. This figure shows the atomic positions as determined by neutron diffraction (in black) and those optimized by NETWORK (in red). Also shown are the lengths of the optimized hydrogen bonds (in green).

TABLE 11. rms Differences of the Donor Hydrogens Between NETWORK and Neutron Diffraction Structures ~

Protein Trypsin RNase BPTI Insulin§§

Method Number of hydrogens NETWORK Brunger and Karplustt Number NETWORK Brunger and Karplus Number NETWORK Brunger and Karplus Number NETWORK

Lysinet 42 0.16 0.19 27 0.20 0.60 9 0.22 0.25 6 0.24

rms differences (A)* Serine* Tyrosine§ 33 0.34 0.89 15 0.61 0.98 1 0.96 0.71 6 0.63

10 0.44 0.64 6 0.60 0.60 4 0.00" 0.81 8 0.35

Threonine** 10 0.17 0.34 10 0.30 1.12 3 0.07 0.19 4 0.81

*rms differences are calculated by measuring the root-mean-square difference in the positions of the donor hydrogens between the NETWORK-optimized position and the position determined by neutron diffraction. tl;-Hydrogens. *y-Hydrogens. "-hydrogens. **yl-hydrogens. ttValues from Brunger and Karplus." **NETWORKdid not move any of the tyrosines in this structure. $$Brungerand Karplus did not determine values for insulin.

ture. Consequently, the "observed intermolecular hydrogen bonds reflect those that were assigned by MOLEDT. NETWORK optimization of RNase-A converted His-105 from HIS + to HISD. All protein-protein bonds found in the neutron diffraction structure were identified by NETWORK as well as 9 additional protein-protein hydrogen bonds (only two of

which were wil in 15" of the angle cutoff criteria). Nearly all of the intermolecular hydrogen bonds were identified by NETWORK. In addition, NETWORK identified 76 more hydrogen bonds in the solvent. For trypsin, His-57 was converted from HIS + to HISD. This structure showed the first loss of a protein-protein hydrogen bond: the His-57 HD1 to Asp-102 OD1 hydrogen bond is missing. This hy-

273


TABLE 111. Performance of Various Techniques on the Placement of Polar Hydrogens in RNase-A

Ster, Neutron structure* MOLEDT placement5 MOLEDT, energy minimization** NETWORK placementtt NETWORK, energy minimization**

Total hydrogen bonds Number* Conserved' 186 155 133 207 152 262 182 250 172

Protein-protein hydrogen bonds Number Conserved 97 91 87 100 90 106 97 97 91

*Numberof hydrogenbonds found meeting the following criteria: hydrogen-acceptor distance less that 2.5 A and the donor-hydrogen-acceptor angle is 180 t 45". 'Hydrogen bonds present in both the neutron structure and the structure predicted by NETWORK, and include hydrogen bonds which have reversed donor-acceptor polarity (e.g.,a hydrogen bond from water A to water B which was optimized to be from water B to water A is considered conserved). *Reference 13. $Hydrogen coordinates were removed from the coordinate file and added back using the MOLEDT (Biosym Technologies) program. **MOLEDT placement of hydrogens, followed by 500 steps of steepest descents energy minimization with the position of the heavy atoms fixed in space. '+NETWORK optimization using Level 2, all protein-protein and protein-water hydrogen bonds were optimized. **NETWORKoptimization at Level 2 followed by 500 steps of steepest descents energy minimization with the position of the heavy atoms fixed in space.

drogen bond is half of a bifurcated hydrogen bond between the ND1-edge of the histidine ring and the carboxylate of Asp-102. The hydrogen bond between the HD1 and the OD2 atom of Asp-102 is still present. The reason this bifurcated hydrogen bond is lost is due to the use of idealized hydrogen positions for determining the position of the histidine hydrogens. The hydrogen coordinates are not taken from the input file, but are calculated by NETWORK from the heavy atom positions. Energy minimization has been used in the past to allow, among other things, relaxation of hydrogen atoms added to a crystal structure. In this laboratory, we have been using an initial step of constrained energy minimization prior to molecular dynamics.l7 This allows the hydrogens to adopt a low energy position without affecting the coordinates of the heavy atoms. RNase-A was examined to see how well NETWORK facilitates hydrogen bonding over constrained energy minimization. To more directly compare to a n actual case where only the X-ray structure is known, the hydrogen atoms were deleted from this structure. Hydrogen atoms were added back using the MOLEDT program. This structure was energy minimized by performing 500 steps of steepest descents energy minimization with the positions of all heavy atoms fixed in space using the Discover program (V2.41 from Biosym Technologies) and the consistent valence force field.4 The NETWORK-optimized structure was also subjected to this constrained energy minimization procedure. Data from this procedure are summarized in Table 111. As expected all of the backbone-backbone hydrogen bonds were preserved. Energy minimization

did gain 10% more protein-protein hydrogen bonds over those arbitrarily assigned by MOLEDT. NETWORK found 6% additional protein-protein hydrogen bonds over energy minimization. Energy minimization after NETWORK lost 12 hydrogen bonds, of which 9 were protein-protein hydrogen bonds. Five of the protein-protein hydrogen bonds were near the cut-off criteria for classification of a hydrogen bond (hydrogen-to-acceptor distance is less than 2.5 A and the donor-hydrogen-acceptor angle is 180 45"): Three hydrogen bonds were within 15" of the angle cutoff and two were within 0.1 A of the distance cutoff criteria. The others were lost due to competition from multiple hydrogen bonds formed with a single side chain donor. As a test of NETWORK on a larger protein, P450,,, with norcamphor bound in the active site" was used to determine the effect on a molecular dynamics simulation between arbitrarily defined hydrogens and hydrogens optimally positioned by NETWORK. NETWORK changed two histidyl residues. His-270 was changed from HIS to HIS+ and His-355 was changed to HISD. To analyze the distribution of hydrogen bonds, we performed a series of energy minimizations as was done for the case of RNase-A. Table IV shows the performance of various energy-minimization and NETWORK optimizations of hydrogen bonds for P-450,,,. The energy minimization performed here was the same as was used in the test with RNase-A, namely, 500 steps of steepest descents energy minimization with the positions of the heavy atoms fixed in space. Energy minimization alone found 25 additional protein-protein and 158 total hydrogen bonds over those arbi-

*

274

M.B. BASS ET AL.

TABLE IV. Prediction of Polar Hydrogen Locations in P-450,,*

Step MOLEDT placement* MOLEDT, energy minimizationg NETWORK placement** NETWORK, energy minimizationtt

Total hydrogen bondst 431

Protein-protein hydrogen bonds 301 326 331 341

589

615 632

*Cytochrome P-450,,, with norcamphor in the active site." +Hydrogenbonds detected using the criteria of hydrogen-to-acceptordistance less than 2.5 A and the donor-hydrogen-acceptor angle is 180 f 45". *Hydrogen positions were added to the coordinate file by the MOLEDT (Biosym Technologies) program. OMOLEDT hydrogen placement followed by 500 steps of steepest descents energy minimization with the position of the heavy atoms fixed in space. **NETWORK optimization of hydrogen positions a t Level 2. "NETWORK optimization at Level 2 followed by 500 steps of steepest descents energy minimization with the positions of the heavy atoms fixed in space.

trarily assigned by MOLEDT. NETWORK, however, found 30 additional protein-protein hydrogen bonds over those that resulted from arbitrary assignment of hydrogen position and 5 more than resulted from energy minimization. When examining total number of hydrogen bonds, NETWORK found 184 additional hydrogen bonds over arbitrary placement and 28 additional hydrogen bonds over energy minimization. Energy minimization after NETWORK yielded an additional 10 protein-protein hydrogen bonds, adding 17 hydrogen bonds to the total number. In this case, NETWORK with energy-minimization yielded more hydrogen bonds than the use of NETWORK alone. A 75 psec molecular dynamics simulation was conducted using the Discover program for the NETWORK optimized structure. Details of the simulation are given in the Methods section. The unoptimized structure was also subjected to a 73 psec dynamics simulation. Figure 3 depicts the rms deviation for backbone and all heavy atoms for P-450,,, for both simulations. In Figure 4, the secondary structure rms deviations are shown. As can be seen, the RMS deviations for P-450,,, are significantly lower after hydrogen bond optimization with NETWORK. The rms deviation reaches a steady-state much earlier than a standard simulation without NETWORK optimization. Energy minimization from the initial structures yielded a backbone atom deviation of 0.78 away from the crystal structure, while the NETWORK-optimized structure deviated by only 0.69 A. The simulation-average rms deviations differed by 0.37 A between the two simulations. Surprisingly, when the rms fluctuations, analogous to the crystallographic B-factors, were calculated (averaged from 50 to 73 psec for each simulation), the NETWORK-optimized structure did not have a statistically significant difference in the mobility. Figure 5 depicts the total number of hydrogen bonds as a function of time for both simulations as well as the number of protein-protein hy-

0

IS

30

60

45

15

Time (psec)

Fig. 3. A plot depicting the rms deviation of cytochrome P450,,, (with bound norcamphor) as a function of two molecular dynamics simulations comparing NETWORK-optimized and unoptimized starting structures. The solid and dashed traces reflect the backbone and heavy atom deviations, respectively, for the NETWORK-optimized simulation. The solid and dashed traces with circles reflect the backbone and heavy atom deviations, respectively, for the unoptimized simulation.

drogen bonds. Although the differences are not great, the NETWORK-optimized trajectory seemed to consistently possess more hydrogen bonds than the trajectory without NETWORK optimization. This is especially true when only the protein-protein hydrogen bonds are examined. We have compared the average displacements of the a-carbons of P-450,,, from the crystal structure. For the most part, the displacements are not much different between the simulations. But the average displacement of helix A is about 1.2 A closer to the crystal structure for the NETWORK-optimized trajectory than the unoptimized trajectory. Likewise the C-terminus of the G helix has been displaced about 1.5 A less than the unoptimized trajectory. The last 10 residues in the I helix and first few residues of the J helix are displaced 1.0 further than in the optimized trajectory. This displacement of the

A


275

( + 0.36) and the carbonyl oxygen (-0.38) may also contribute to the maintenance of this hydrogen bond and the close association between these two regions. p-sheet 5 contained a 2.8 A greater displacement from the crystal structure for the unoptimized trajectory. The only exception to this trend was p-sheet 3 which was displaced about 1.2 A more in the optimized trajectory than in the unoptimized trajectory. Y These changes in the C, distances between the two simulations are on the periphery of the protein. The closest is about 25 A from the active site. The 0.0 only exception is p-sheet 3 which forms part of the 0 15 30 45 60 I5 active site. The residues which form part of the acTime (psec) tive site Val-295 and Asp-297 (which is thought to form a hydrogen bond to a heme propionate group2') Fig. 4. A plot depicting the rms deviations for the secondary and nonsecondary structures of norcamphor-bound cytochrome are displaced the same amount in both the optimized P-450,,, comparing NETWORK-optimized and unoptimized and unoptimized trajectories. The largest difference starting structures. The solid and dashed traces reflect the backis in another strand of this sheet from residues Aspbone rms deviations of the secondary and nonsecondary structures for the NETWORK-optimized simulation. The solid and 316 to Leu-320 (12 to 20 A from the substrate). The dashed traces with circles reflect the secondary and nonsecondother difference in this sheet occurs at Gly-298 and ary rms deviations for the unoptimized simulation. to a lesser extent at Ala-296. The side chains on these residues project away from the active site. Because of the distances from the active site, these changes should not invalidate the results from earlier simulations of P-450,,m.17,19,21 These earlier simulations utilized hand-optimization of hydrogen bonds near the active site, such as Tyr-96 and His355. (Present and future simulations of cytochrome P-450,,, now employ NETWORK-optimization.) These data show that NETWORK provides additional stability to certain secondary structures. This leads to an overall reduction in the rms deviation of the protein as compared to a trajectory which was not optimized by NETWORK. 200 1 ,

1 ,

0

15

30

45

60

75

Time (Dsec)

Fig. 5. A plot depicting the number of hydrogen bonds in norcamphor-bound cytochrome P-450,,, in two molecular dynamics simulations comparing the NETWORK-optimized and unoptimized starting structures. The solid lines reflect the number of hydrogen bonds for the NETWORK-optimized trajectory. The dashed lines reflect the number of hydrogen bonds for the unoptimized trajectory. The lines without circles represent the total number of hydrogen bonds. The lines with circles represent the number of protein-protein hydrogen bonds.

a-carbon on His-270, which was changed by NETWORK from HIS to HIS+, was 1.5 A less in the NETWORK-optimized simulation than the unoptimized simulation (see Fig. 6). In the NETWORKoptimized trajectory, a hydrogen bond between His270 and the carbonyl of Val-338 is maintained. This hydrogen bond keeps the J helix and region between Pro-335 and Ser-341 in close proximity. When the space between these two regions increases, several water molecules move in to fill the space forming a cluster which hydrogen bonds to His-270. The electrostatic attraction between the histidyl side chain

CONCLUSIONS NETWORK is a program which automatically finds the optimal locations in a protein structure for polar hydrogens that have rotational degrees of freedom and the polar hydrogens on histidyl side chains. The method is based on maximizing the number of hydrogen bonds that can be formed in each network or group of interacting donors and acceptors and minimizing the total distance between donors and acceptors. NETWORK rigorously tests each network to find the arrangement with the greatest number of hydrogen bonds. Each hydrogen is assigned to a position which allows the closest approach of the hydrogen to the acceptor and maintains the hydrogen's bond length and valance angle. NETWORK should not be construed as a final determination of the position of polar hydrogens in a protein. It makes the assumption that hydrogens are placed so as to optimize the total number of hydrogen bonds and minimize the distance between the hydrogen and the acceptor. It does not explicitly optimize the positions of the hydrogens based on energy considerations as in the method of Brunger and

276

M.B. BASS ET AL.

a

Fig. 6. A pair of stereograms depicting the hydrogen bond patterns surrounding His-270 in cytochrome P-450,,,. These structures represent the simulation-average position of these atoms. Hydrogen atoms have been removed from amino acid residues for clarity. (a) The average structure of the unoptimized

trajectory. (b) The average structure from the NETWORK-optimized trajectory. Notice the cluster of four water molecules (519, 612,620, and 694) in the unoptimized trajectory which moves into the space between residues His-270 and Val-338.

Karplus," albeit some energy criteria are implicit in the geometric constraints that are imposed. We view NETWORK as a tool to overcome the multipleminima problem in a coarse manner. Subsequent energy minimization should be performed to relieve

any nonbonded interactions between moved and unmoved atoms. Energy minimization will also allow hydrogens to stretch or bend from ideal bond lengths and valence angles. The placement of hydrogens on a nearly direct


line from the donor to the acceptor may lead to steric conflicts with other atoms. To date, the authors have not found any steric conflicts within 0.6 A (interatomic distance). At this distance energy minimization using Discover was able to correct the steric conflicts. Although not encountered in the test cases, NETWORK may not correctly optimize all of the hydrogen bonds in a given protein. Therefore NETWORK lists each hydrogen it has moved and each histidine that changes identity. This allows the user to evaluate the arrangement of hydrogen bonds in regions of interest. In the test cases where information exists on the location of hydrogens in protein structures, NETWORK was able to correctly predict the location of all of the hydrogens involved in proteinprotein hydrogen bonds. The only exception was the loss of one-half of a bifurcated hydrogen bond in the trypsin structure. NETWORK has demonstrated its usefulness in a molecular dynamics simulation by lowering the rms deviation from the crystal structure by 30% for cytochrome P-450,,,. In addition, the structure reached a steady-state much earlier in the simulation as compared to a simulation without NETWORK optimization. Additional improvements could be made to NETWORK in which the energetics of equivalent hydrogen bond networks is evaluated to find an arrangement which is lower in energy. The current version of NETWORK does not evaluate the relative strengths of hydrogen bonds. For instance it does not differentiate between a hydrogen bond donated from a histidine side chain to a serine side chain and a hydrogen bond in the opposite orientation (from the serine to the histidine).

ACKNOWLEDGMENTS The authors gratefully acknowledge the helpful discussions and critical review of the manuscript by Dr. Mark Paulsen of Pacific Northwest Laboratory. Pacific Northwest Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under Contract DE-AC06-76RLO 1830. M.B. Bass is supported by the Northwest College and University Association for Science, in affiliation with Washington State University, under Contract DE-AM06-76RLO 2225 with the U.S. Department of Energy, Office of Energy Research. D.F. Hopkins and W.A.N. Jaquysh are supported by the Department of Energy, Division of University and Industry Programs, Office of Energy Research, as Science and Engineering Research Semester program participants a t Pacific Northwest Laboratory.

277

REFERENCES 1. McCammon, J.A., Gelin, B.R., Karplus, M. Dynamics of folded proteins. Nature (London) 267585-590, 1977. 2. Brooks, B., Karplus, M. Harmonic dynamics of proteins: Normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. U.S.A. 80:65716575, 1983. 3. Post, C.B., Brooks, B.R., Karplus, M., Dobson, C.M., Artymiuk, P.J., Cheetham, J.C., Phillips, D.C. Molecular dynamics simulations of native and substrate-bound lysozyme. J . Mol. Biol. 190:455-479, 1986. 4. Dauber-Osguthorpe, P., Roberts, V.A., Osguthorpe, D.J., Wolff, J., Genest, M., Hagler, A.T. Structure and energetics of ligand binding to proteins: Escherzchia colz dihydrofolate reductase-trimethoprim, a drug-receptor system. Proteins 4:31-47, 1988. 5. Weiner, S.J., Kollman, P.A., Nguyen, D.T., Case, D.A. An all atom force field for simulations of proteins and nucleic acids. J. Comput. Chem. 7:230-252, 1986. 6. Nilsson, L., Karplus, M. Empirical energy functions for energy minimization and dynamics of nucleic acids. J . Comput. Chem. 7:591-616, 1986. 7. Ornstein, R.L. Using molecular dynamics simulations on crambin to evaluate the suitability of different continuum dielectric and hydrogen atoms models for protein simulations. J . Biomolec. Struct. Dynam. 7:1019-1041, 1990. 8. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F. Jr., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., Tasumi, M. The protein data bank: A computerbased archival file for macromolecular structures. J . Mol. Biol. 112:535-542, 1977. 9. Scheraga, H.A. Recent progress in the theoretical treatment of protein folding. Biopolymers 22:l-14, 1983. 10. McCammon, J.A., Harvey, S.C. “Dynamics of Proteins and Nucleic Acids.” Cambridge: Cambridge University Press, 1987. 11. Brunger, A.T., Karplus, M. Polar hydrogen positions in proteins: Empirical energy placement and neutron diffraction comparison. Proteins 4:148-156, 1988. 12. Baker, E.N., Hubbard, R.E. Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol. 44:97-179, 1984. 13. Wlodawer, A,, Borkakoti, N., Moss, D.S., Howlin, B. Comparison of two independently refined models of ribonuclease-A. Acta Crystallogr. Ser. B. 42379-387, 1986. 14. Kossiakoff, A.A. Use of the neutron diffraction-HiD exchange technique to determine the conformational dynamics of trypsin. Basic Life Sci. 27:281-304, 1984. 15. Wlodawer, A,, Deisenhofer, J., Huber, R. Comparison of two highly refined structures of bovine pancreatic trypsin inhibitor. J . Mol. Biol. 193:145-156, 1987. 16. Wlodawer, A., Savage, H., Dodson, G. Structure of insulin: Results of joint neutron and x-ray refinement. Acta Crystallogr. Ser. B. 45:99-107, 1989. , R.L. A 175 ps molecular dynam17. Paulsen, W . Ornstein, ics simulation of camphor-bound cytochrome P-450,,,. Proteins 11:184-204, 1991. 18. Raag, R., Poulos, T.L. The structural basis for substrateinduced changes in the redox potential and spin equilibrium in cytochrome P-450,,,. Biochemistry 28:917-922, 1989. 19. Bass, M.B., Paulsen, M.D., Ornstein, R.L. Substrate mobility in a deeply buried active site: Analysis of norcamphor bound to cytochrome P-450,,, as determined by a 201 psec molecular dynamics simulation. Proteins (in press). 20. Poulos, T.L., Finzel, B.C., Howard, A.J. High-resolution crystal structure ofcytochrome P450cam. J . Mol. Biol. 195: 687-700, 1987. 21. Paulsen, M.D., Bass, M.B., Ornstein, R.L. Analysis of active site motions from a 175 picosecond molecular dynamics simulation of camphor-bound cytochrome P450,,,. J . Biomolec. Struct. Dynam. 9:187-203, 1991.

A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.

A method to identify protein sequences that fold into a known three-dimensional structure.

Indirect use of deuterium in solution NMR studies of protein structure and hydrogen bonding.

A Novel Method of Determining the Functional Effects of a Minor Genetic Modification of a Protein.

A Chemiluminescent Protein Microarray Method for Determining the Seroglycoid Fucosylation Index.

Retrovirus purification: method that conserves envelope glycoprotein and maximizes infectivity.

Complexometric titrations of protein-bound metal ions: a method for determining binding constants.

The hydrogen bonding in the crystal structure of raffinose pentahydrate.

The role of internal packing interactions in determining the structure and stability of a protein.

Strong ionic hydrogen bonding causes a spectral isotope effect in photoactive yellow protein.

Hydrogen bonding in the crystal structure of alpha,beta-panose.

A boundary sedimentation velocity method for determining nonspecific nucleic acid-protein interaction binding parameters.

The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest.

A method for predicting protein-protein interaction types.

A method of searching for related literature on protein structure analysis by considering a user's intention.

Control of cytochrome c redox reactivity through off-pathway modifications in the protein hydrogen-bonding network.

Halogen Bonding versus Hydrogen Bonding: A Molecular Orbital Perspective.

Determining protein similarity by comparing hydrophobic core structure.

Prediction of the structure of a receptor-protein complex using a binary docking method.

Development and application of a label-free fluorescence method for determining the composition of gold nanoparticle-protein conjugates.

Structure, hydrogen bonding and thermal expansion of ammonium carbonate monohydrate.

Computational method to identify druggable binding sites that target protein-protein interactions.

Fluorescence from Multiple Chromophore Hydrogen-Bonding States in the Far-Red Protein TagRFP675.

Hydrogen-hydrogen bonding: the current density perspective.