Research Article

Proteins: Structure, Function and Bioinformatics DOI 10.1002/prot.24590

Burial of Nonpolar Surface Area and Thermodynamic Stabilization of Globins as a Function of Chain Elongation

Theodore S. Jennaro, Matthew R. Beaty, Neșe Kurt-Yilmaz#, Benjamin L. Luskin, Silvia Cavagnero*

Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706

Running title: Role of Chain Elongation in Protein Folding

Key words: protein folding, hydrophobic effect, folding entropy, free energy, protein biosynthesis

#Present Address: Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, MA 01605-2324

* Correspondence to: Silvia Cavagnero, Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, USA, Phone: 608-262-5430, Fax: 608-262-9918, Email: [email protected]

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as an ‘Accepted Article’, doi: 10.1002/prot.24590 © 2014 Wiley Periodicals, Inc. Received: Feb 11, 2014; Revised: Apr 01, 2014; Accepted: Apr 12, 2014

PROTEINS: Structure, Function, and Bioinformatics

Abstract

Proteins are biosynthesized from N to C terminus before they depart from the ribosome and reach their bioactive state in the cell. At present, very little is known about how the conformation and free energy of the nascent protein evolve with chain elongation. These parameters critically affect the extent of folding during ribosome-assisted biosynthesis. Here, we address the impact of vectorial amino acid addition on the burial of nonpolar surface area and on the free energy of native-like structure formation in the absence of the ribosomal machinery. We focus on computational predictions for proteins bearing the globin fold, which encompasses the 3/3, 2/2, and archaeal subclasses. We find that the burial of nonpolar surface increases progressively with chain elongation, leading to native-like conformations, especially upon addition of the last C-terminal residues, corresponding to incorporation of the last two helices. Additionally, the predicted folding entropy for generating native-like structures becomes less unfavorable at nearly complete chain lengths, suggesting a link between the late burial of nonpolar surface and water release. Finally, the predicted folding free energy becomes progressively more favorable (more negative) as the chain gets longer. These results suggest that thermodynamic stabilization of the native structure in newly synthesized globins is significantly enhanced as the chain elongates, especially upon exit of the last C-terminal residues from the ribosomal tunnel (which hosts ca. 30-40 residues). Hence, we propose that release from the ribosome is a crucial step in the life of single-domain proteins in the cell.

John Wiley & Sons, Inc.


Introduction

In order to understand how proteins fold in the cellular environment during and after biosynthesis, it is important to assess how the conformation of the protein chain responds to the progressive addition of amino acids, from N to C terminus. Experimental studies on purified C-terminally truncated protein chains of increasing length showed that, at low µM concentrations, the incomplete chain is either soluble or highly aggregation-prone. In the case of soluble C-terminally truncated fragments (e.g., for staphylococcal nuclease1-4, staphylococcal nuclease R5,6, barnase7,8 and CI2 9,10), it was possible to experimentally assess both the secondary and tertiary structure of the monomeric C-terminally truncated polypeptides. These studies led to the overall conclusion that C-terminally truncated polypeptides are generally unstructured, and are only able to adopt a compact conformation and gain most of their secondary structure during the latest stages of chain elongation (≥ 80% of chain length)11. In the case of staphylococcal nuclease (SNase), a 149-residue protein, it was possible to specifically discriminate the stages of chain elongation corresponding to the incorporation of secondary and tertiary structure. A construct of this protein lacking thirteen C-terminal amino acids (SNase136) was found to be compact but lacked ca. 50% of the secondary structure2. Further elongation by only six residues (SNase142) led to incorporation of all the remaining secondary structure into the already compact chain12. In contrast, C-terminally truncated fragments with a higher nonpolar content (e.g., those of sperm whale apomyoglobin) form soluble and insoluble aggregates13,14 that bury nonpolar groups away from the solvent11. Elucidating the intrinsic conformational trends of the monomeric species requires high dilution, which minimizes aggregation. However, these
types of experiments are practically unfeasible because of the difficulty of obtaining sufficiently sensitive spectroscopic readouts. In addition, all the proteins analyzed to date share one common trend, identified computationally. Namely, the burial of nonpolar surface to generate the native state becomes significantly more effective upon addition of the last C-terminal residues to the growing polypeptide15. This finding was established for four well-known proteins, and it was found to apply regardless of the incomplete chain's tendency to fold intramolecularly or aggregate15. This result embodies the fact that the hydrophobic effect16-19 becomes particularly effective towards the end of the chain elongation process. Despite this progress, considerable challenges remain before one can fully appreciate the significance of nonpolar surface burial for in vivo protein folding as a function of chain elongation. For instance, the generality of this concept has not yet been tested, and it is necessary to extend computational and experimental studies to a larger variety of proteins that are highly represented in Nature. Further, the energetic contribution of nonpolar surface burial as the polypeptide chain gets longer is unknown.

This study takes initial steps towards filling these gaps. We focus on a systematic computational analysis of the chain-elongation dependence of the degree of nonpolar surface burial, and target proteins belonging to the globin fold (Fig. 1). This particular fold was selected because it is functionally essential20 and ubiquitous21, and because the folding mechanism of the apo forms of proteins belonging to this structural class has been extensively studied both in vitro22,23 and in cell-relevant environments24-26. We find that there is a strong tendency to preferentially bury nonpolar surface towards the end of the chain elongation process for all members of the three known classes of globins: 3/3, 2/2 and archaeal. Further, the tendency to
bury nonpolar surface upon addition of the last C-terminal amino acids is paralleled by a significant decrease in the predicted folding free energy. The observed similarities across globins from organisms belonging to all kingdoms of life suggest that the tendency to bury nonpolar surface to generate a stable native-like state is evolutionarily conserved. Our results highlight that adding the last C-terminal residues to a vectorially elongating protein chain is energetically advantageous. In essence, globins that are missing more than about 30-40 C-terminal residues are unlikely to assume a stable native-like fold. The same conclusion applies to N-terminal truncations. These results suggest that the folding steps following exit of the nascent chain from the ribosomal tunnel may be of key importance in protein folding.

Methods

Criteria for the selection of globins. Globins from all three kingdoms of life were examined in this study and were selected according to the following criteria. All chosen globins27-46 were required to be wild-type species and to have a crystal structure available in the Protein Data Bank (PDB47) with a resolution of 2.4 Å or better. In addition, only globins that are (i) single-domain and (ii) monomeric in solution were selected. Criterion i was waived for the 3/3 globin from E. coli, apoHmpH48, which is the N-terminal domain of the three-domain flavohemoglobin (also known as Hmp49), and criterion ii was waived for the archaeal globin from Methanosarcina acetivorans27. ApoHmpH was retained in light of its emerging importance as a model globin for folding studies48,50, its growing functional significance51, and the fact that it does not have extensive interactions with the other domains. The protein from Methanosarcina acetivorans was retained because it is the only archaeal globin with an available structure, and hence an important element of comparison with globins from other kingdoms. Three-dimensional images of representative 3/3 and 2/2 globins, and of the archaeal globin from Methanosarcina acetivorans, are provided in Figure 1. A comprehensive list of all the globins selected for this study is provided in Table S1.

Calculation of mean net charge and mean hydrophobicity. Mean net charge per residue values were determined by dividing the net charge of each globin by the total number of amino acids. The net charge was computed by assigning a value of +1 to each Arg and Lys, and a value of -1 to each Asp and Glu. The charge of the remaining amino acids was regarded as negligible, to a first approximation, at the physiologically relevant pH of 7.4. Protein hydrophobicity values were assessed according to Kyte and Doolittle52. A 5-residue sliding window was used, and the resulting values were normalized on a scale from 0 to 1. The window values were summed, and the resulting quantity was divided by the total number of amino acids minus four (i.e., the number of 5-residue windows). As proposed by Uversky et al.53, mean net charge per residue (MNC) and mean hydrophobicity per residue (MH) values enable discriminating folded from intrinsically disordered proteins (IDPs) based on amino acid sequence alone. The curve dividing the two groups53, shown as a solid line in Figure 2, is defined as

$$|\mathrm{MNC}| = 2.743 \times \mathrm{MH} - 1.109 \tag{1}$$
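The sequence-based quantities defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Kyte-Doolittle scale is rescaled linearly to [0, 1] via (h + 4.5)/9 (one plausible reading of "normalized on a scale from 0 to 1"), and a sequence is called folded when it falls to the right of the eq (1) boundary, i.e. when |MNC| < 2.743 × MH − 1.109.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_net_charge(seq: str) -> float:
    """Net charge per residue: +1 per Arg/Lys, -1 per Asp/Glu, 0 otherwise."""
    seq = seq.upper()
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return (positive - negative) / len(seq)


def mean_hydrophobicity(seq: str, window: int = 5) -> float:
    """Mean of sliding-window hydropathy values rescaled to [0, 1].

    Assumption: each Kyte-Doolittle value h is mapped to (h + 4.5) / 9.
    With window = 5 there are len(seq) - 4 windows, as in the Methods.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the sliding window")
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]
    window_means = [sum(scaled[i:i + window]) / window for i in range(n_windows)]
    return sum(window_means) / n_windows


def is_predicted_folded(seq: str) -> bool:
    """True if the sequence lies on the folded side of the eq (1) boundary."""
    mnc = abs(mean_net_charge(seq))
    mh = mean_hydrophobicity(seq)
    return mnc < 2.743 * mh - 1.109
```

For example, a poly-Lys decapeptide (|MNC| = 1, MH ≈ 0.07) falls on the disordered side of the boundary, whereas a poly-Leu decapeptide (MNC = 0, MH ≈ 0.92) falls on the folded side.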

Boundary margins at a distance of ±0.045 from the above curve denote effective error bars for ordered and disordered proteins, respectively53. These regions are defined by the lines

$$|\mathrm{MNC}| = 2.743 \times \mathrm{MH} - 1.225 \tag{2}$$

$$|\mathrm{MNC}| = 2.743 \times \mathrm{MH} - 0.993 \tag{3}$$
and correspond to the dashed traces in Figure 2. Proteins lying outside the region defined by eqns (2) and (3) were determined53 to be intrinsically disordered (left region) or independently folded (right region) with 95% and 97% confidence, respectively.

Determination of fractions of nonpolar solvent-accessible surface area and related values. The program Surface Racer54 (version 5.0) was employed to compute fractions of nonpolar solvent-accessible surface area (fNSASA) and related values. The data set of Richmond and Richards55 was used for the van der Waals atomic radii. The water-molecule probe was assigned a radius of 1.4-1.5 Å. We verified that changes in probe size within this range have a negligible effect (0.55% or less) on NSASA. The fNSASA reported in Figure 3 is defined as the ratio of the nonpolar surface area to the total solvent-accessible surface area at any given chain length. Relative differences in fNSASA were evaluated, at each chain length, as the difference between the fNSASA of the native-like structure and that of a hypothetical fully extended unfolded chain, divided by the fNSASA of the fully extended chain. PDB files of the native-like protein fragments were generated by progressively trimming the PDB file of the corresponding full-length protein in three-residue steps. Information not required by Surface Racer was manually removed from the original PDB files. The software package PyMOL (version 1.2r2, Schrödinger, LLC) was used for structure visualization and for generating the PDB files of the fully extended protein chains. The initial and final residues of each helix in Figure 5 were deduced from the corresponding PDB files.

Assessment of folding entropy. The standard entropy of folding (∆Sf°) was determined from NSASA values and known relations as follows.
∆Sf° is defined as the sum of the entropy corresponding to the burial of nonpolar surface upon folding (∆Snp°) and the configurational chain entropy change upon folding (∆Sch°), according to
$$\Delta S_f^\circ \approx \Delta S_{np}^\circ + \Delta S_{ch}^\circ \tag{4}$$

It has been shown that a reasonable estimate of the configurational entropy change upon folding is ∆Sch° ≈ -5.6 cal mol−1 K−1 per residue56-58. ∆Snp° can be assessed from the change in nonpolar surface area upon folding59, ∆Anp (in Å²), according to

$$\Delta S_{np}^\circ = 0.32\,\Delta A_{np}\,\ln\!\left(\frac{T}{T_s}\right) \tag{5}$$

where T is equal to 298 K and Ts is a reference temperature at which ∆Snp° equals zero. Ts was set to 386 K, a typical value for proteins59.

Estimation of free energy of folding. The software FoldX60 (version 2.5.2) was used to estimate the free energies of extended and native-state globins as a function of percent chain length, upon progressive elongation from N to C terminus. While FoldX is often employed to evaluate variations in free energy resulting from mutations or from protein-protein or protein-nucleic acid interactions, the program was used here to compute variations in free energy as amino acids are added to the polypeptide chain. After free energy values (formally defined by FoldX as folding free energies) were computed for both the fully extended (GU°) and native-like folded (GN°) states, starting from the corresponding PDB files generated as described in the NSASA section, folding free energy differences (∆Gf°) were assessed as

$$\Delta G_f^\circ = G_N^\circ - G_U^\circ \tag{6}$$
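The surface-area and energy relations defined above can be sketched numerically as follows. This is a minimal illustration, not the authors' scripts: the function names are hypothetical, the nonpolar and total surface areas are assumed to come from a program such as Surface Racer, the G values from FoldX, the units of the 0.32 coefficient are assumed to be cal mol−1 K−1 Å−2, and the relative fNSASA difference uses the sign convention implied by the Results (negative values indicate net burial of nonpolar surface in the native-like state).

```python
import math

# Parameters quoted in the Methods. Assumption: the 0.32 coefficient is in
# cal mol^-1 K^-1 per A^2 of nonpolar surface area.
DS_CHAIN_PER_RESIDUE = -5.6  # cal mol^-1 K^-1 per residue
NONPOLAR_COEFF = 0.32
T = 298.0    # K
T_S = 386.0  # K, temperature at which the nonpolar entropy term vanishes


def fnsasa(nonpolar_sasa: float, total_sasa: float) -> float:
    """Fraction of the solvent-accessible surface area that is nonpolar."""
    return nonpolar_sasa / total_sasa


def relative_fnsasa_difference(f_native: float, f_extended: float) -> float:
    """(f_N - f_U) / f_U; negative values mean net nonpolar burial in the native-like state."""
    return (f_native - f_extended) / f_extended


def folding_entropy(a_np_native: float, a_np_extended: float, n_residues: int) -> float:
    """Eqs (4)-(5): nonpolar-burial term plus configurational chain term (cal mol^-1 K^-1)."""
    delta_a_np = a_np_native - a_np_extended  # negative when nonpolar surface is buried
    ds_np = NONPOLAR_COEFF * delta_a_np * math.log(T / T_S)  # ln(T/Ts) < 0 at 298 K
    ds_ch = DS_CHAIN_PER_RESIDUE * n_residues
    return ds_np + ds_ch


def folding_free_energy(g_native: float, g_extended: float) -> float:
    """Eq (6): difference between the FoldX energies of native-like and extended chains."""
    return g_native - g_extended
```

As a purely illustrative input (not data from this study), a 100-residue fragment that buries 5000 Å² of nonpolar surface, `folding_entropy(5000.0, 10000.0, 100)`, gives a nonpolar term of about +414 cal mol−1 K−1 and a chain term of −560 cal mol−1 K−1, for ∆Sf° ≈ −146 cal mol−1 K−1.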

Note that, in principle, the raw FoldX output (denoted in the original publication as folding free energy60) is meant to represent the free energy difference between the given structure and a corresponding fully unfolded state. However, we noticed that the GU° values generated in this way are generally not equal to zero and display a linear chain-length dependence. To correct for this FoldX artifact, we computed folding free energy differences ∆Gf°, as described
above. PDB files for both the fully extended and folded protein chains were generated in 3-residue increments. Temperature was set to 298 K for all the FoldX calculations.
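The generation of native-like fragments in three-residue steps can be sketched as follows. This is a hypothetical helper, not the procedure actually used (the fragments were trimmed from the deposited PDB files, and the extended chains were built in PyMOL, which is not reproduced here). Assumptions of this sketch: only ATOM records are kept, so HETATM records such as any heme are dropped; residues are identified by the sequence number in PDB columns 23-26, with insertion codes ignored; and both elongation directions are offered so that C-to-N fragments can be built as well.

```python
from typing import Dict, List, Optional


def atom_records(pdb_text: str, chain: Optional[str] = None) -> List[str]:
    """Keep ATOM records only (optionally for one chain); HETATM, waters and headers are dropped."""
    return [
        line for line in pdb_text.splitlines()
        if line.startswith("ATOM") and (chain is None or line[21] == chain)
    ]


def residue_number(line: str) -> int:
    """Residue sequence number from PDB columns 23-26."""
    return int(line[22:26])


def truncated_fragments(pdb_text: str, step: int = 3, direction: str = "N_to_C",
                        chain: Optional[str] = None) -> Dict[int, str]:
    """Map fragment length (residues) to PDB-format text of the trimmed native-like structure."""
    atoms = atom_records(pdb_text, chain)
    residues = list(dict.fromkeys(residue_number(line) for line in atoms))  # ordered, unique
    n = len(residues)
    lengths = list(range(step, n, step)) + [n]  # always include the full-length chain
    fragments = {}
    for length in lengths:
        if direction == "N_to_C":
            kept = set(residues[:length])        # chain grown from the N terminus
        elif direction == "C_to_N":
            kept = set(residues[n - length:])    # chain grown from the C terminus
        else:
            raise ValueError("direction must be 'N_to_C' or 'C_to_N'")
        lines = [line for line in atoms if residue_number(line) in kept]
        fragments[length] = "\n".join(lines) + "\nEND\n"
    return fragments
```

Each returned string could then be written to a file and passed to the surface-area and free-energy programs; keeping both directions in one helper mirrors the N-to-C versus C-to-N comparison discussed in the Results.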

Results and Discussion

Properties of native globins. Globins are functionally essential all-helical proteins. The globin fold is ubiquitous, being widely represented across organisms belonging to all three kingdoms of life21. Three structurally related subclasses have been identified so far, denoted as 3/3 globins21, 2/2 globins61 (formerly known as truncated globins), and archaeal globins62. Representative structures of the three classes of globins are shown in Figure 1. The canonical 3/3 globin fold comprises eight helices, denoted A to H. The 3/3 globins are larger than the 2/2 globins. All 2/2 globins, except for the globin from Campylobacter jejuni (PDB ID: 2IG3), have the same helices as their 3/3 globin counterparts. However, the 2/2 globins have helices with a smaller average length. The only archaeal globin whose crystal structure has been determined so far, from Methanosarcina acetivorans, resembles the 3/3 fold except that it bears two additional helices, denoted Z and H', and lacks the D helix27. The criteria adopted for the selection of the specific globins studied here are outlined in the Methods section.

To test the overall properties of the globin amino acid sequence, we computed the mean net charge per residue and the mean hydrophobicity per residue of all the globins selected for this study. The results are plotted in Figure 2, which shows that all globins are characterized by a small positive or negative net charge per residue (|MNC| ≤ 0.06) and a relatively high hydrophobicity per residue (ca. 0.47±0.03). The solid line in the graph separates the region typical of intrinsically disordered proteins (IDPs, on the left of the solid line) from the region typical of folded proteins (on the right of the solid line)53. The plot shows that all the
globins display typical properties of folded proteins, and not of IDPs (Fig. 2a), consistent with the functional role of globins. On the other hand, panel b of Figure 2 shows that the net charge of globins is variable and can assume either positive or negative values. All examined prokaryotic and archaeal globins have a net negative charge, while eukaryotic globins may bear either a net positive or a net negative charge. Overall, the data show that globins have typical properties of folded, well-behaved proteins.

Changes in fraction of NSASA as a function of protein chain elongation. In a previous study on four representative proteins from different folds, we showed that, as chain elongation nears completion, there is a dramatic increase in the degree of burial of nonpolar solvent-accessible surface area (NSASA) for the generation of the native conformation15. Here, we show that several proteins belonging to the globin fold follow overall similar trends, with some interesting nuances. Plots of the fraction of nonpolar solvent-accessible surface area (fNSASA) of fully extended chains (regarded as models for the unfolded state) and of native-like conformations of different percent chain length are shown in Figure 3, for three representative globins from different subclasses. Importantly, the choice of fully extended and native-like conformations as the two limiting states is not meant to imply that these are the only relevant species populated in solution. In fact, experiments show that several C-terminally truncated forms of a mammalian globin aggregate in the absence of the ribosomal machinery, even at moderate concentrations14. C-terminally truncated proteins may also populate non-native monomeric conformations. Moreover, partial aggregation is also observed experimentally upon refolding of the full-length globin from denaturant, again at moderate concentrations63. Here, as well as in a prior study of more limited scope15, we focus on the driving force for the
consolidation of a stable monomeric native fold as a function of chain elongation. In other words, we seek to identify the physical effects that are likely to stabilize the native state as the percent chain length increases, regardless of the other species that may also be present in solution. The graphs clearly show a tendency to bury NSASA more effectively towards the end of the chain. The 3/3 globins display this trend only upon addition of the last C-terminal residues, while the 2/2 and archaeal globins show a more gradual tendency to bury NSASA that starts at approximately 50% chain length. This result suggests that the hydrophobic effect becomes an effective driving force for folding only late in the chain elongation process, as proteins are synthesized on the ribosome.

The average relative differences between the fNSASA of native-like structures and the corresponding values for the extended state are plotted in Figure 4 for all three types of globins. Plots of individual proteins can be found in Supplementary Figure S1. All three types of globins follow similar trends. First, the relative difference in fNSASA increases until approximately 20% of the chain length is generated. This trend reflects the fact that the early stages of chain elongation progressively expose more NSASA, given that both the y values (gray traces) and the slopes are positive. Second, more NSASA is then progressively buried in the native state (negative slopes), yet the native-like conformations are overall more solvent-exposed than the unfolded state (positive y values, gray traces). This trend indicates that there is no hydrophobic driving force for folding until a turning point is reached, at approximately 50% chain length. Third, late in chain elongation all three types of globins experience a large decrease in the average relative difference in fNSASA (negative y values, black traces).
The latter trend reflects the preferential burial of NSASA in the folded state, as the second half of the protein chain is
generated. The above observations suggest that translation of the second half of the globin chain is a major natural driving force for the burial of NSASA. This trend likely helps the ultimate adoption of a fully native conformation upon chain completion. A plot of the residue-specific chain hydrophobicity of all the full-length globins studied here, reported in Figures S2 and S3 as average buried area and fractional average buried area64, respectively, shows that the overall nonpolar content of the globins is distributed approximately uniformly throughout the sequence. While the C-terminal region has a high hydrophobicity for most of the globins, its value does not exceed the average hydrophobicity of the rest of the chain. Therefore, we conclude that the particularly effective burial of nonpolar surface observed here for the C-terminal region (Figs. 3 and 4) is not merely the result of the nonpolar character of the globin sequence. Hence, it must be a consequence of the three-dimensional structure of globins. Previous studies on other globular proteins belonging to different folds15 showed that an overall similar trend is also observed, though it is not as pronounced as in the case of the globins.

Changes in fraction of NSASA upon inclusion of residues corresponding to individual helices. The fNSASA is plotted against the characteristic helices of the major globin subclasses in Figure 5. Plots of the individual globins can be found in Figure S4. This analysis was carried out to assess the burial of NSASA per helix across the globin fold, and enables identification of the specific helices whose addition is accompanied by burial or exposure of NSASA. The helices that show an average increase in the fraction of NSASA differ among globin subclasses. This result can be explained by the fact that specific helices vary in average length and other properties among different types of globin.
For instance, the A helix of 3/3 globins is on average 2.4 times longer than the A helix of 2/2 globins. More importantly, across the globin fold, helices G
and H show a stabilizing effect, marked by a decrease in the difference in fraction of NSASA. These findings reinforce those of Figure 4, in that the decrease in fraction of NSASA occurs late in translation. The stabilizing effects of the C-terminal helices are most prominent in the 3/3 globins and in the archaeal globin from Methanosarcina acetivorans, whose helices E and F, respectively, also show significant burial of nonpolar surface. Although the 2/2 globins exhibit a decrease in fNSASA for these helices, it is slightly less pronounced. We propose that this is due to the truncated nature of the 2/2 globins. On average, the H helix of 3/3 globins is approximately twice as long as that of 2/2 globins, while the G helix is 12% longer in the 3/3 globins. Because the terminal helices are shorter, changes across each helix are less prominent in the 2/2 globins. In any case, the overall trend of a large decrease in the difference in fNSASA late in chain elongation, shown in Figures 4 and 5, holds true for the 2/2 globins.

Changes in the predicted folding entropy of the growing chain. The predicted folding entropy changes (∆Sf°) were estimated as a function of percent chain length and averaged among the members of each of the three globin subclasses, as shown in Figure 6. As the chain elongates and approaches approximately 80% of its full length, the slope of ∆Sf° changes sign, from negative to positive. This trend is particularly evident for the 3/3 and archaeal globins, with the 2/2 globins showing the same effect to a milder degree. The rise in ∆Sf° is governed by the favorable burial of NSASA during the late stages of chain elongation (Figure 3) and is consistent with the decrease in the relative difference in fNSASA observed at long chain lengths in Figures 4 and 5. Due to the hydrophobic effect, hydration water is known to be released from a folding protein chain as nonpolar surface area is buried65.
This effect is expected to offset part of the unfavorable (negative) contribution to the folding entropy that arises from the loss of chain entropy upon folding65. Hence, our data suggest that the globin chain is capable of undergoing substantial dehydration
and folding during the late stages of chain elongation. The above findings highlight the importance of NSASA burial for the entropic contribution to protein folding energetics as a function of chain elongation, across the globin fold.

Changes in the predicted free energy of the growing chain. Figure 7 shows the predicted standard-state Gibbs folding free energies (G°) of the fully extended and native-like chains as a function of chain elongation, for the three representative structures of the globin subclasses. The G° of the extended chain increases approximately linearly with chain elongation, while the G° of the native-like state decreases progressively, to a much more moderate degree. At approximately 30-40% chain length, the native state begins an overall trend towards increasingly favorable folding free energies. This trend is best illustrated in Figure 8, which shows the corresponding Gibbs free energy differences (∆Gf°) as a function of percent chain length, averaged across each globin subclass. At short chain lengths, the predicted ∆Gf° for all globin subclasses decreases linearly. To guide the eye, this linear trend, encompassing ca. 0-30% of the chain, is represented by a gray dashed line in Figure 8. As the chain elongates further, ∆Gf° progressively dips to more favorable negative values. Figures 7 and 8 illustrate the predicted instability of the extended state as a function of percent chain length and the progressive thermodynamic stabilization of the protein chain as more amino acids are added.

Due to the many factors that contribute to protein stability66, it is difficult to accurately predict Gibbs free energies of proteins computationally67. Despite this important caveat, the software FoldX employed here was shown to perform well at estimating free energy differences between two well-defined structures60,67. FoldX is often used to assess changes in free energy upon point mutations.
It was shown to predict the effects of non-alanine mutations better than other packages of similar nature, and to accurately predict experimental ∆∆G° trends upon generic point
mutations67. Energy differences between wild-type and mutant proteins can be predicted because the structures are similar, differing by only one residue. Although the individual free energies of the two structures may not be accurate, their difference likely reflects underlying trends. Specifically, given that (a) both folded and unfolded states of well-defined representative structures are evaluated at each chain length, and (b) the near-neighbor free energy difference progressions evaluated here pertain to structures of very similar chain length (differing by only a few residues), we propose that the FoldX predictions shown here provide reasonable first-order estimates of folding free energy variations as the protein chain elongates. All parameters within FoldX were held constant throughout, and only proteins with high-resolution crystal structures (2.4 Å or better) were employed. These procedures were shown to improve FoldX free energy calculations and minimize artifacts60. The observed folding free energy trends are consistent with the results of previous studies, which identified specific variations in the nature of the noncovalent contacts close to the chain termini68,69. In conclusion, even though the individual free energy values and free energy differences cannot be taken as numerically exact, the observed trends highlight the importance of chain elongation for the progressive stabilization of the native fold.

Influence of chain-elongation directionality. The vectorial nature of protein biosynthesis dictates a unique chain elongation direction, from N to C terminus. While assessing the effect of variations in directionality would be exceedingly difficult experimentally, requiring a complete remodeling of the translation machinery, variations in chain elongation direction can easily be assessed computationally.
Figure 9 illustrates the effect of changes in chain directionality, from N to C and from C to N, for most of the parameters evaluated in this study. A representative member of the globin fold, leghemoglobin A from Glycine max, has
been selected for this assessment. The data clearly show that chain elongation directionality has no effect on any of the predicted trends, consistent with previous predictions on a different globin and three other protein folds15. Hence, the observed trends are not a consequence of the specific directionality of protein biosynthesis dictated by the ribosomal machinery.

Relations with the in vitro refolding of full-length globins. This work addresses the driving forces for the consolidation of the native fold as incomplete globin chains elongate. Our findings are therefore not directly applicable to the extensive literature on the folding of full-length globins22,23,70-75, which have a complete amino acid sequence. On the other hand, it is helpful to draw some comparisons. The refolding kinetics of full-length apomyoglobin from sperm whale is characterized by a compact76, obligatory77, fast-forming71 intermediate populated on the sub-ms timescale at ~5°C and involving residues that are particularly effective at burying nonpolar surface78,79. These residues belong to the A, B, G and H helices72. When the intrinsic helicity of residues in the H region is decreased, these residues fold more slowly80. However, the protein still forms a highly populated molten globular intermediate, which presumably preserves its overall compaction even across the H helical region, probably due to the persisting presence of nonpolar residues in the H region81. Globins from other organisms, e.g., soybean leghemoglobin, form an early intermediate comprising the G and H helical regions and a few residues belonging to helix E82. Similarly, horse apomyoglobin is believed to populate an early intermediate encompassing the A, G and H helical regions22,83. The apo form of the E. coli globin apoHmpH folds fast48 and populates early intermediates including the C-terminal region of the chain50.

In summary, the above trends suggest that the C-terminal region of the globin sequence, comprising residues belonging to the G and H helices, is necessary for the early stages of globin fold formation. In this study, we show that the native globin fold becomes selectively stabilized when residues belonging to the C-terminal region are added to the incomplete chain. Hence, we conclude that C-terminal residues are especially important for generating the native globin fold in two very different situations, namely (a) upon refolding of the full-length chain from denaturant and (b) upon progressive structure formation as the chain elongates. In this context, it is worth mentioning that the N and C termini are close to each other in native globins, as well as in many other single-domain proteins84. In addition, for the globin fold, the chain termini make large contributions to the contact order and contact breadth parameters68. Hence, it is tempting to suggest that the chain termini, with their significant ability to bury nonpolar surface, may be particularly important for establishing the globin topology and for the cooperative formation of the native globin fold85.

Implications for protein folding in the cell. This study highlights the importance of progressive chain elongation, and particularly of the last C-terminal residues, for the folding of nascent proteins. The ribosomal tunnel hosts 30-40 amino acids86,87, which correspond approximately to the last two helices of the globin fold. We found that these last 30-40 C-terminal residues impart a significant thermodynamic stabilization to the native protein once the full-length chain has been synthesized and has departed from the ribosome. This thermodynamic stabilization is achieved by a particularly effective burial of nonpolar surface and by an incomplete enthalpy-entropy compensation (leading to a less unfavorable folding entropy) upon incorporation of the
last C-terminal residues. Importantly, these residues become available for folding only after protein biosynthesis has ended, as the fully synthesized chain departs from the ribosome. In the cellular environment, however, the C terminus of the nascent chain is tethered to the ribosome, and the portion of the nascent protein that has emerged from the ribosomal tunnel may interact with the highly charged ribosomal surface. The latter effect is not considered in this study, so it is fair to ask whether interactions of the nascent protein with the ribosomal surface may modify the predicted trends. Experimental studies by Ellis et al.25,26 showed that ribosome-bound nascent globins longer than 57 residues are compact and that their local motions show no sign of interactions with the ribosomal surface. Further, this protein undergoes an additional conformational change upon departure from the ribosome, consistent with the computational predictions reported here. On the other hand, Knight et al.88 showed that the natively unfolded nascent protein PIR and its mutants bearing a variable net charge appear to interact quite extensively with the ribosomal surface. Overall, the available experimental data on cotranslational folding of globins suggest that interactions with the ribosomal surface are relevant only for folding-incompetent sequences or for very thermodynamically unstable short sequences derived from folding-competent proteins. In addition, our model does not take nonnative conformations into account. While nonnative structures may indeed be generated during the cotranslational folding of globins, their identification is nontrivial and no evidence for such conformations exists to date. Hence, our model should be regarded as a first-order approximation that applies if nonnative conformations and the ribosomal surface do not play a significant role.
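
The tunnel arithmetic discussed above can be made concrete with a short calculation. The Python sketch below assumes a simple geometric picture in which exactly the last `tunnel_residues` residues of a tethered chain are shielded inside the exit tunnel; the 150-residue chain length is a hypothetical value chosen only for illustration, and 30 and 40 residues bracket the tunnel occupancy quoted in the text.

```python
def fraction_outside_tunnel(chain_length: int, tunnel_residues: int) -> float:
    """Fraction of a ribosome-tethered chain that has emerged from the exit tunnel.

    Simplifying assumption: the last `tunnel_residues` residues synthesized
    (the C-terminal end) are still shielded inside the tunnel.
    """
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    emerged = max(chain_length - tunnel_residues, 0)
    return emerged / chain_length


if __name__ == "__main__":
    chain_length = 150  # hypothetical globin length, for illustration only
    for tunnel_residues in (30, 40):
        pct = 100 * fraction_outside_tunnel(chain_length, tunnel_residues)
        print(f"{tunnel_residues} residues in tunnel: {pct:.1f}% of the chain is outside")
```

For this hypothetical chain, roughly 73-80% of the sequence is outside the tunnel before release, so the C-terminal 20-27% can join the fold only after the chain departs from the ribosome.
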
Our results suggest a potential mechanism for the folding of the apo form of globins in the cellular environment, where biosynthesis of the last few amino acids enables optimal NSASA
burial and the ultimate adoption of a native-like structure when the second half of the protein chain, especially the last C-terminal residues, emerges from the ribosomal tunnel. Only upon departure from the ribosome does the apoglobin acquire the ability to achieve a stable native-like fold. Our findings are consistent with reports that heme incorporation may occur cotranslationally in Nature, at chain lengths of ca. 90 residues or longer89. At this stage of biosynthesis, the apo form of the protein has not yet acquired significant stability and is therefore competent to incorporate the heme cofactor. Heme incorporation may impart additional thermodynamic stabilization to the fully synthesized protein chain cotranslationally, i.e., before the last C-terminal amino acids have emerged from the ribosomal tunnel. Hence, cotranslational heme incorporation may help prevent competing aggregation processes that are likely to lower the yield of bioactive protein upon departure from the ribosome. In all, the scenario presented in this work suggests cotranslational events that might take place in the cell in the absence of significant interactions with the ribosome, for folding-competent protein sequences longer than ca. 60 residues. The main take-home message is the dramatic stabilization of the native globin fold expected upon release of the full-length nascent protein from the ribosome.

Conclusions

This work shows that the late stages of globin chain elongation are characterized by particularly effective NSASA burial across the entire globin fold. We found that burial of nonpolar surface to generate native-like structures starts at approximately half of the globin chain length. In addition, we estimated the thermodynamic stability of incomplete globin chains of
progressive length and found a large predicted decrease in folding free energy (hence a large increase in thermodynamic stability) late in chain elongation. These results suggest that, in the cellular environment and in the absence of the heme cofactor, proteins bearing the globin fold may reach a stable native-like structure only once they have fully departed from the ribosome. The overall similarity of the trends observed across the three globin subclasses suggests that the burial of nonpolar surface and the progressive structural stabilization with chain elongation are features conserved throughout evolution. Therefore, structural instability at short chain lengths and the generation of a stable native-like fold at high percent chain lengths may reflect significant evolutionary pressure across the globin fold, possibly to prevent deleterious aggregation of nascent proteins upon departure from the ribosome. The trends highlighted in this work may share some common features with proteins bearing different folds, including multi-domain proteins. In the latter case, chain-length-dependent burial of nonpolar surface and thermodynamic stabilization are expected to occur upon completion of each individual domain and/or upon establishment of inter-domain contacts. Additional studies are required to shed light on this topic. Conveniently, the methodology adopted in this study is entirely general and can be readily applied to any protein of interest.

Acknowledgments

We are grateful to Leona Bergmann for technical assistance, to Wade Hanson for preparing Figure S6 as part of his UW-Madison Pre-College Enrichment Opportunity Program for Learning Excellence (PEOPLE) summer research experience, and to Rudy Clausen for sharing a Python script. We thank Rayna Addabbo and Jenna Becker for a critical reading of the
manuscript. This research was supported by NSF grant MCB-0951209 (to S.C.). M.R.B. was the recipient of Research Experience for Undergraduates (REU) funds from NSF.

References

1. Taniuchi H, Anfinsen CB. An experimental approach to the study of the folding of staphylococcal nuclease. The Journal of Biological Chemistry 1969;244(14):3864-3875.
2. Flanagan JM, Kataoka M, Shortle D, Engelman DM. Truncated staphylococcal nuclease is compact but disordered. Proceedings of the National Academy of Sciences USA 1992;89:748-752.
3. Feng Y, Dongsheng L, Wang J. Native-like partially folded conformations and folding process revealed in the N-terminal large fragments of staphylococcal nuclease: A study by NMR spectroscopy. Journal of Molecular Biology 2003;330:821-837.
4. Hirano S, Mihara K, Yamazaki Y, Kamikubo H, Imamoto Y, Kataoka M. Role of C-terminal region of staphylococcal nuclease for foldability, stability, and activity. Proteins: Structure, Function, and Genetics 2002;49:255-265.
5. Jing G, Zhou B, Xie L, Li-jun L, Liu Z. Comparative studies of the conformation of the N-terminal fragments of staphylococcal nuclease R in solution. Biochimica et Biophysica Acta 1995;1250:189-196.
6. Tian K, Zhou B, Geng F, Jing G. Folding of SNase R begins early during synthesis: the conformational feature of two short N-terminal fragments of staphylococcal nuclease R. International Journal of Biological Macromolecules 1998;23:199-206.
7. Neira JL, Fersht AR. Acquisition of native-like interactions in C-terminal fragments of barnase. Journal of Molecular Biology 1999;287:421-432.
8. Sancho J, Neira JL, Fersht AR. An N-terminal fragment of barnase has residual helical structure similar to that in a refolding intermediate. Journal of Molecular Biology 1992;224:749-758.
9. de Prat Gay G, Ruiz-Sanz J, Neira JL, Corrales FJ, Otzen DE, Ladurner AG, Fersht AR. Conformational pathway of the polypeptide chain of chymotrypsin inhibitor-2 growing from its N terminus in vitro. Parallels with the protein folding pathway. Journal of Molecular Biology 1995;254:968-979.
10. de Prat Gay G. Spectroscopic characterization of the growing polypeptide chain of the barley chymotrypsin inhibitor-2. Archives of Biochemistry and Biophysics 1996;335(1):1-7.
11. Cavagnero S, Kurt N. Folding and misfolding as a function of polypeptide chain elongation: conformational trends and implications for intracellular events. In: Tsai AM, editor. Misbehaving Proteins: Protein (Mis)Folding, Aggregation and Stability. Springer; 2006. p 217-246.
12. Hirano S, Mihara K, Yamazaki Y, Kamikubo H, Imamoto Y, Kataoka M. Role of C-terminal region of Staphylococcal nuclease for foldability, stability, and activity. Proteins: Structure, Function, and Bioinformatics 2002;49(2):255-265.
13. Kurt N, Rajagopalan S, Cavagnero S. Effect of Hsp70 chaperone on the folding and misfolding of polypeptides modeling an elongating protein chain. Journal of Molecular Biology 2006;355(4):809-820.
14. Chow CC, Chow C, Rhagunathan V, Huppert T, Kimball E, Cavagnero S. The chain length dependence of apomyoglobin folding: structural evolution from misfolded sheets to native helices. Biochemistry 2003;42(23):7090-7099.

15. Kurt N, Cavagnero S. The burial of solvent-accessible surface area is a predictor of polypeptide folding and misfolding as a function of chain elongation. J Am Chem Soc 2005;127(45):15690-1.
16. Tanford C. The hydrophobic effect: Formation of micelles and biological membranes. New York: Wiley; 1980.
17. Kauzmann W. Denaturation of proteins and enzymes. In: McElroy W, Glass B, editors. The mechanism of enzyme action. Baltimore: The Johns Hopkins Press; 1954. p 71-110.
18. Kauzmann W. Some factors in the interpretation of protein denaturation. Advances in Protein Chemistry 1959;14:1-63.
19. Southall NT, Dill KA, Haymet ADJ. A view of the hydrophobic effect. Journal of Physical Chemistry B 2002;106(3):521-533.
20. Vinogradov SN, Moens L. Diversity of globin function: enzymatic, transport, storage, and sensing. Journal of Biological Chemistry 2008;283(14):8773-8777.
21. Lecomte JT, Vuletich DA, Lesk AM. Structural divergence and distant relationships in proteins: evolution of the globins. Curr Opin Struct Biol 2005;15(3):290-301.
22. Gruebele M. Downhill protein folding: evolution meets physics. Comptes Rendus Biologies 2005;328(8):701-712.
23. Dyson HJ, Wright PE. Elucidation of the protein folding landscape by NMR. Nuclear Magnetic Resonance of Biological Macromolecules, Part C. Volume 394, Methods in Enzymology. 2005. p 299-+.
24. Bakke CK, Jungbauer LM, Cavagnero S. In vitro expression and characterization of native apomyoglobin under low molecular crowding conditions. Protein Expression and Purification 2006;45(2):381-392.
25. Ellis JP, Culviner PH, Cavagnero S. Confined dynamics of a ribosome-bound nascent globin: Cone angle analysis of fluorescence depolarization decays in the presence of two local motions. Protein Science 2009;18:2003-2015.
26. Ellis JP, Bakke CK, Kirchdoerfer RN, Jungbauer LM, Cavagnero S. Chain Dynamics of Nascent Polypeptides Emerging from the Ribosome. ACS Chem. Biol. 2008;3(9):555-566.
27. Nardini M, Pesce A, Thijs L, Saito JA, Dewilde S, Alam M, Ascenzi P, Coletta M, Ciaccio C, Moens L and others. Archaeal protoglobin structure indicates new ligand diffusion paths and modulation of haem-reactivity. EMBO Rep 2008;9(2):157-63.
28. Giangiacomo L, Ilari A, Boffi A, Morea V, Chiancone E. The truncated oxygen-avid hemoglobin from Bacillus subtilis: X-ray structure and ligand binding properties. J Biol Chem 2005;280(10):9192-202.
29. Milani M, Pesce A, Ouellet Y, Dewilde S, Friedman J, Ascenzi P, Guertin M, Bolognesi M. Heme-ligand tunneling in group I truncated hemoglobins. J Biol Chem 2004;279(20):21520-5.
30. Trent JT, 3rd, Kundu S, Hoy JA, Hargrove MS. Crystallographic analysis of synechocystis cyanoglobin reveals the structural changes accompanying ligand binding in a hexacoordinate hemoglobin. J Mol Biol 2004;341(4):1097-108.
31. Nardini M, Pesce A, Labarre M, Richard C, Bolli A, Ascenzi P, Guertin M, Bolognesi M. Structural determinants in the group III truncated hemoglobin from Campylobacter jejuni. J Biol Chem 2006;281(49):37803-12.

32. Pesce A, Couture M, Dewilde S, Guertin M, Yamauchi K, Ascenzi P, Moens L, Bolognesi M. A novel two-over-two alpha-helical sandwich fold is characteristic of the truncated hemoglobin family. Embo J 2000;19(11):2424-34.
33. Moschetti T, Mueller U, Schulze J, Brunori M, Vallone B. The structure of neuroglobin at high Xe and Kr pressure reveals partial conservation of globin internal cavities. Biophys J 2009;97(6):1700-8.
34. Shepherd M, Barynin V, Lu C, Bernhardt PV, Wu G, Yeh SR, Egawa T, Sedelnikova SE, Rice DW, Wilson JL and others. The single-domain globin from the pathogenic bacterium Campylobacter jejuni: novel D-helix conformation, proximal hydrogen bonding that influences ligand binding, and peroxidase-like redox properties. J Biol Chem 2010;285(17):12747-54.
35. Schreiter ER, Rodriguez MM, Weichsel A, Montfort WR, Bonaventura J. S-nitrosylation-induced conformational change in blackfin tuna myoglobin. J Biol Chem 2007;282(27):19773-80.
36. Harutyunyan EH, Safonova TN, Kuranova IP, Popov AN, Teplyakov AV, Obmolova GV, Rusakov AA, Vainshtein BK, Dodson GG, Wilson JC and others. The structure of deoxy- and oxy-leghaemoglobin from lupin. J Mol Biol 1995;251(1):104-15.
37. Evans PA, Kautz RA, Fox RO, Dobson CM. A magnetization-transfer nuclear magnetic resonance study of the folding of staphylococcal nuclease. Biochemistry 1989;28:362-370.
38. Yang F, Phillips GN, Jr. Crystal structures of CO-, deoxy- and met-myoglobins at various pH values. J Mol Biol 1996;256(4):762-74.
39. Birnbaum GI, Evans SV, Przybylska M, Rose DR. 1.70 Å resolution structure of myoglobin from yellowfin tuna. An example of a myoglobin lacking the D helix. Acta Crystallogr D Biol Crystallogr 1994;50(Pt 3):283-9.
40. Krzywda S, Murshudov GN, Brzozowski AM, Jaskolski M, Scott EE, Klizas SA, Gibson QH, Olson JS, Wilkinson AJ. Stabilizing bound O2 in myoglobin by valine68 (E11) to asparagine substitution. Biochemistry 1998;37(45):15896-907.
41. Scouloudi H, Baker EN. X-ray crystallographic studies of seal myoglobin. The molecule at 2.5 Å resolution. J Mol Biol 1978;126(4):637-60.
42. Nardini M, Tarricone C, Rizzi M, Lania A, Desideri A, De Sanctis G, Coletta M, Petruzzelli R, Ascenzi P, Coda A and others. Reptile heme protein structure: X-ray crystallographic study of the aquo-met and cyano-met derivatives of the loggerhead sea turtle (Caretta caretta) myoglobin at 2.0 Å resolution. J Mol Biol 1995;247(3):459-65.
43. Pesce A, Dewilde S, Kiger L, Milani M, Ascenzi P, Marden MC, Van Hauwaert ML, Vanfleteren J, Moens L, Bolognesi M. Very high resolution structure of a trematode hemoglobin displaying a TyrB10-TyrE7 heme distal residue pair and high oxygen affinity. J Mol Biol 2001;309(5):1153-64.
44. Ilari A, Bonamore A, Farina A, Johnson KA, Boffi A. The X-ray structure of ferric Escherichia coli flavohemoglobin reveals an unexpected geometry of the distal heme pocket. J Biol Chem 2002;277(26):23725-32.
45. Bisig DA, Di Iorio EE, Diederichs K, Winterhalter KH, Piontek K. Crystal structure of Asian elephant (Elephas maximus) cyano-metmyoglobin at 1.78-Å resolution. Phe29(B10) accounts for its unusual ligand binding properties. J Biol Chem 1995;270(35):20754-62.

46. Hargrove MS, Barry JK, Brucker EA, Berry MB, Phillips GN, Jr., Olson JS, Arredondo-Peter R, Dean JM, Klucas RV, Sarath G. Characterization of recombinant soybean leghemoglobin a and apolar distal histidine mutants. J Mol Biol 1997;266(5):1032-42.
47. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research 2000;28(1):235-242.
48. Eun Y-J, Kurt N, Sekhar A, Cavagnero S. Thermodynamic and kinetic characterization of apoHmpH, a fast-folding bacterial globin. Journal of Molecular Biology 2008;376(3):879-897.
49. Ilari A, Bonamore A, Farina A, Johnson KA, Boffi A. The X-ray structure of ferric Escherichia coli flavohemoglobin reveals an unexpected geometry of the distal heme pocket. Journal of Biological Chemistry 2002;277(26):23725-23732.
50. Zhu L, Kurt N, Choi J, Lapidus LJ, Cavagnero S. Sub-millisecond Chain Collapse of the Escherichia coli Globin ApoHmpH. Journal of Physical Chemistry B 2013;117(26):7868-7877.
51. Forrester MT, Foster MW. Protection from nitrosative stress: A central role for microbial flavohemoglobin. Free Radical Biology and Medicine 2012;52(9):1620-1633.
52. Kyte J, Doolittle RF. A simple model for displaying the hydrophobic character of a protein. Journal of Molecular Biology 1982;157:105-132.
53. Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK. Comparing and combining predictors of mostly disordered proteins. Biochemistry 2005;44(6):1989-2000.
54. Tsodikov OV, Record MT, Jr., Sergeev YV. Novel computer program for fast exact calculation of accessible and molecular surface areas and average surface curvature. J Comput Chem 2002;23(6):600-9.
55. Richmond TJ, Richards FM. Packing of alpha-helices: geometrical constraints and contact areas. J Mol Biol 1978;119(4):537-55.
56. Spolar RS, Ha JH, Record MT. Hydrophobic effect in protein folding and other noncovalent processes involving proteins. Proceedings of the National Academy of Sciences USA 1989;86(21):8382-8385.
57. Pickett SD, Sternberg MJE. Empirical scale of side-chain conformational entropy in protein folding. Journal of Molecular Biology 1993;231(3):825-839.
58. Dill KA. Theory for the folding and stability of globular proteins. Biochemistry 1985;24(6):1501-1509.
59. Spolar RS, Livingstone JR, Record MT. Use of liquid-hydrocarbon and amide transfer data to estimate contributions to thermodynamic functions of protein folding from the removal of nonpolar and polar surface from water. Biochemistry 1992;31(16):3947-3955.
60. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res 2005;33(Web Server issue):W382-8.
61. Nardini M, Pesce A, Milani M, Bolognesi M. Protein fold and structure in the truncated (2/2) globin family. Gene 2007;398(1-2):2-11.
62. Freitas TAK, Hou SB, Dioum EM, Saito JA, Newhouse J, Gonzalez G, Gilles-Gonzalez MA, Alam M. Ancestral hemoglobins in Archaea. Proceedings of the National Academy of Sciences of the United States of America 2004;101(17):6675-6680.
63. Chow C, Kurt N, Murphy RM, Cavagnero S. Structural characterization of apomyoglobin self-associated species in aqueous buffer and urea solution. Biophysical Journal 2006;90(1):298-309.

64. Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH. Hydrophobicity of amino acid residues in globular proteins. Science 1985;229(4719):834-838.
65. Southall NT, Dill KA, Haymet ADJ. A view of the hydrophobic effect. The Journal of Physical Chemistry B 2002;106(3):521-533.
66. Dill KA. Dominant forces in protein folding. Biochemistry 1990;29(31):7133-7155.
67. Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel 2009;22(9):553-60.
68. Kurt N, Mounce BC, Ellison PA, Cavagnero S. Residue-specific contact order and contact breadth in single-domain proteins: Implications for folding as a function of chain elongation. Biotechnology Progress 2008;24(3):570-575.
69. Krobath H, Shakhnovich EI, Faisca PF. Structural and energetic determinants of cotranslational folding. J Chem Phys 2013;138(21):215101.
70. Meinhold DW, Wright PE. Measurement of protein unfolding/refolding kinetics and structural characterization of hidden intermediates by NMR relaxation dispersion. Proceedings of the National Academy of Sciences of the United States of America 2011;108(22):9078-9083.
71. Uzawa T, Nishimura C, Akiyama S, Ishimori K, Takahashi S, Dyson HJ, Wright PE. Hierarchical folding mechanism of apomyoglobin revealed by ultra-fast H/D exchange coupled with 2D NMR. Proceedings of the National Academy of Sciences of the United States of America 2008;105(37):13859-13864.
72. Jennings PA, Wright PE. Formation of a molten globule intermediate early in the kinetic folding pathway of apomyoglobin. Science 1993;262(5135):892-896.
73. Hughson FM, Wright PE, Baldwin RL. Structural Characterization of a Partly Folded Apomyoglobin Intermediate. Science 1990;249(4976):1544-1548.
74. Nishimura C, Dyson HJ, Wright PE. The kinetic and equilibrium molten globule intermediates of apoleghemoglobin differ in structure. Journal of Molecular Biology 2008;378(3):715-725.
75. Nishimura C, Dyson HJ, Wright PE. Identification of native and non-native structure in kinetic folding intermediates of apomyoglobin. Journal of Molecular Biology 2006;355(1):139-156.
76. Eliezer D, Jennings PA, Wright PE, Doniach S, Hodgson KO, Tsuruta H. The Radius of Gyration of an Apomyoglobin Folding Intermediate. Science 1995;270(5235):487-488.
77. Tsui V, Garcia C, Cavagnero S, Siuzdak G, Dyson HJ, Wright PE. Quench-flow experiments combined with mass spectrometry show apomyoglobin folds through an obligatory intermediate. Protein Science 1999;8(1):45-49.
78. Dyson HJ, Wright PE, Scheraga HA. The role of hydrophobic interactions in initiation and propagation of protein folding. Proceedings of the National Academy of Sciences of the United States of America 2006;103(35):13057-13061.
79. Nishimura C, Lietzow MA, Dyson HJ, Wright PE. Sequence determinants of a protein folding pathway. Journal of Molecular Biology 2005;351(2):383-392.
80. Cavagnero S, Dyson HJ, Wright PE. Effect of H helix destabilizing mutations on the kinetic and equilibrium folding of apomyoglobin. Journal of Molecular Biology 1999;285(1):269-282.

81. Cavagnero S, Nishimura C, Schwarzinger S, Dyson HJ, Wright PE. Conformational and dynamic characterization of the molten globule state of an apomyoglobin mutant with an altered folding pathway. Biochemistry 2001;40(48):14459-14467.
82. Nishimura C, Prytulla S, Dyson HJ, Wright PE. Conservation of folding pathways in evolutionarily distant globin sequences. Nature Structural Biology 2000;7(8):679-686.
83. Ballew RM, Sabelko J, Gruebele M. Direct observation of fast protein folding: The initial collapse of apomyoglobin. Proceedings of the National Academy of Sciences of the United States of America 1996;93(12):5759-5764.
84. Krishna MMG, Englander SW. The N-terminal to C-terminal motif in protein folding and function. Proceedings of the National Academy of Sciences of the United States of America 2005;102(4):1053-1058.
85. Aksel T, Majumdar A, Barrick D. The Contribution of Entropy, Enthalpy, and Hydrophobic Desolvation to Cooperativity in Repeat-Protein Folding. Structure 2011;19(3):349-360.
86. Malkin LI, Rich A. Partial resistance of nascent polypeptide chains to proteolytic digestion due to ribosomal shielding. Journal of Molecular Biology 1967;26(2):329-&.
87. Voss NR, Gerstein M, Steitz TA, Moore PB. The Geometry of the Ribosomal Polypeptide Exit Tunnel. Journal of Molecular Biology 2006;360(4):893-906.
88. Knight AM, Culviner PH, Kurt-Yilmaz N, Zou TS, Ozkan SB, Cavagnero S. Electrostatic Effect of the Ribosomal Surface on Nascent Polypeptide Dynamics. ACS Chemical Biology 2013;8(6):1195-1204.
89. Komar AA, Kommer A, Krasheninnikov IA, Spirin AS. Cotranslational Folding of Globin. Journal of Biological Chemistry 1997;272(16):10646-10651.

Figure Legends

Figure 1. Representative three-dimensional structures of 3-3, 2-2 and archaeal globins. The images illustrate the crystal structures of: (A) trematode hemoglobin from Paramphistomum epiclitum (PDB ID: 1H97); (B) cyanoglobin from Synechocystis sp. PCC 6803 (PDB ID: 1S69); and (C) protoglobin from Methanosarcina acetivorans (PDB ID: 2VEB). The images were generated with PyMOL 1.2r2.

Figure 2. Plots illustrating the mean net charge and mean hydrophobicity per residue of the proteins studied in this work. The black solid lines separate natively folded from natively unfolded proteins, as described by Dunker and coworkers (Oldfield et al., 2005). Proteins falling to the left and right of the line are regarded as intrinsically disordered (IDPs) and fully folded, respectively. The dashed lines enclose regions of uncertain structural classification. All 22 globins (red) analyzed in this study are grouped according to either (A) globin type or (B) kingdom of life.
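
The charge-hydrophobicity classification in Figure 2 can be sketched in a few lines. This is a minimal Python illustration, not the procedure used to build the figure: it assumes the Kyte-Doolittle scale (ref. 52) rescaled to 0-1 and averaged per residue without a sliding window, counts Lys/Arg as +1 and Asp/Glu as -1 with His neutral, and uses the Uversky-type boundary <H> = (<R> + 1.151)/2.785 as an assumed form of the solid line; the dashed uncertainty band is not modeled.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1 (His neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def predicted_folded(seq: str) -> bool:
    """True if the sequence lies on the 'folded' (right) side of the assumed boundary.

    Assumed boundary line: <H>_boundary = (<R> + 1.151) / 2.785
    """
    seq = seq.upper()
    h = mean_hydrophobicity(seq)
    r = mean_net_charge(seq)
    return h > (r + 1.151) / 2.785


if __name__ == "__main__":
    toy = "MKLLVAGEEALKKLG"  # arbitrary toy sequence, for illustration only
    print(round(mean_net_charge(toy), 3), round(mean_hydrophobicity(toy), 3), predicted_folded(toy))
```

Per-residue averaging keeps the sketch short; a sliding-window average changes the values slightly at the chain ends.
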

Figure 3. Fraction of nonpolar solvent-accessible surface area (fNSASA) as a function of chain elongation for the native and fully extended conformations of representative structures from the three known subclasses of the globin fold: (A) trematode hemoglobin from Paramphistomum epiclitum (PDB ID: 1H97); (B) cyanoglobin from Synechocystis sp PCC 6803 (PDB ID:1S69); and (C) protoglobin from Methanosarcina acetivorans (PDB ID: 2VEB).

Figure 4. Graphs showing relative differences in fNSASA between the native (folded) and fully extended (unfolded) states as a function of chain length, averaged across all the globins studied
in this work. Relative differences in fNSASA were computed as (folded-chain fNSASA – extended-chain fNSASA) / (extended-chain fNSASA). Gray and black segments denote regions characterized by either solvent exposure or burial of nonpolar surface area upon folding, respectively. (A) Average over fifteen 3-3 globins; (B) average over six 2-2 globins; and (C) values for the archaeal globin from Methanosarcina acetivorans.
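
The relative-difference metric defined in the Figure 4 legend reduces to simple arithmetic once fNSASA profiles are available. The Python sketch below assumes that fNSASA is the nonpolar SASA divided by the total SASA at each chain length, and that the surface areas themselves come from a separate surface-area program; the profile values in the usage block are hypothetical numbers chosen only to exercise the code. Following the legend, negative values correspond to burial of nonpolar surface upon folding.

```python
from typing import List, Sequence


def fnsasa(nonpolar_sasa: float, total_sasa: float) -> float:
    """Fraction of the solvent-accessible surface area that is nonpolar."""
    if total_sasa <= 0:
        raise ValueError("total_sasa must be positive")
    return nonpolar_sasa / total_sasa


def relative_fnsasa_difference(folded: Sequence[float], extended: Sequence[float]) -> List[float]:
    """(folded fNSASA - extended fNSASA) / extended fNSASA at each chain length."""
    if len(folded) != len(extended):
        raise ValueError("profiles must cover the same chain lengths")
    return [(f - e) / e for f, e in zip(folded, extended)]


def classify(rel_diffs: Sequence[float]) -> List[str]:
    """Negative values: net burial of nonpolar surface upon folding (black segments)."""
    return ["burial" if d < 0 else "exposure" for d in rel_diffs]


if __name__ == "__main__":
    folded = [0.55, 0.50, 0.45]    # hypothetical native-like fNSASA values
    extended = [0.52, 0.53, 0.54]  # hypothetical fully extended fNSASA values
    diffs = relative_fnsasa_difference(folded, extended)
    print([round(d, 3) for d in diffs], classify(diffs))
```
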

Figure 5. Graphs illustrating the relative difference between the native- and extended-state fNSASA for each helix of the globin fold. Relative differences were determined by first computing [(fNSASA at the end of the helix – fNSASA at the beginning of the helix) / (fNSASA at the beginning of the helix)] for the native and extended conformations. Values obtained for the extended chain were then subtracted from the values obtained for the native-like chain. Finally, averages across all corresponding helices within each type of globin were computed. The resulting parameter, shown in the figure, illustrates the extent of fractional NSASA burial upon chain elongation per helix. In panel C, the region denoted as ‘n’ corresponds to the 21 N-terminal unstructured residues that precede the A helix. This region of the archaeal globin was included because of its significant length; it is not present in 3-3 and 2-2 globins. Average values over (A) fifteen 3-3 globins, (B) six 2-2 globins and (C) the archaeal globin from Methanosarcina acetivorans.
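
The three-step per-helix parameter described in the Figure 5 legend can be sketched as follows. This Python version is illustrative only: it assumes each fNSASA profile is stored as a dictionary keyed by chain length (in residues), and that a helix is delimited by the chain lengths at its first and last residue; these data structures and the numbers in the usage block are hypothetical. Under this definition, a negative value means that fNSASA falls more (or rises less) across the helix in the native-like chain than in the extended chain.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

Profile = Dict[int, float]  # chain length (residues) -> fNSASA


def helix_change(profile: Profile, start: int, end: int) -> float:
    """(fNSASA at end of helix - fNSASA at beginning) / fNSASA at beginning."""
    return (profile[end] - profile[start]) / profile[start]


def helix_burial_parameter(native: Profile, extended: Profile, start: int, end: int) -> float:
    """Native-like change minus extended-chain change for one helix of one globin."""
    return helix_change(native, start, end) - helix_change(extended, start, end)


def average_helix_parameter(globins: Iterable[Tuple[Profile, Profile, int, int]]) -> float:
    """Average the parameter over the corresponding helix of several globins."""
    return mean(helix_burial_parameter(n, e, s, t) for n, e, s, t in globins)


if __name__ == "__main__":
    # Hypothetical profiles for the same helix in two globins.
    globin_a = ({10: 0.50, 20: 0.45}, {10: 0.50, 20: 0.52}, 10, 20)
    globin_b = ({12: 0.48, 25: 0.46}, {12: 0.49, 25: 0.50}, 12, 25)
    print(round(average_helix_parameter([globin_a, globin_b]), 4))
```
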

Figure 6. Predicted average standard entropy change for folding as a function of percent chain length for three globin subclasses. All values were determined at 298 K, and assume native-like and fully extended conformations for the folded and unfolded chains, respectively. All calculations were carried out as described in the Methods section. Average over (A) fifteen 3-3
globins; (B) six 2-2 globins; and (C) values for the archaeal globin from Methanosarcina acetivorans.

Figure 7. Predicted folding free energy for the native-like and fully extended chains of representative structures from three globin subclasses. All free energy calculations were carried out with FoldX, and PDB files for the fully extended chains were generated with PyMOL 1.2r2. (A) trematode hemoglobin from Paramphistomum epiclitum; (B) cyanoglobin from Synechocystis sp. PCC 6803; and (C) protoglobin from Methanosarcina acetivorans.

Figure 8. Plots illustrating average predicted standard-state folding Gibbs free energy differences (∆G°) as a function of chain length. Native-like protein conformations were assumed for the folded form at all chain lengths. ∆G° values were estimated with FoldX. The dashed line denotes a linear extrapolation of the values for the initial 30% of the chain, to guide the eye. Data are shown for the average over (A) fifteen 3-3 globins; (B) six 2-2 globins; and (C) the archaeal globin from Methanosarcina acetivorans.
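
The dashed guide line in Figure 8 is described only as a linear extrapolation of the values for the initial 30% of the chain. The Python sketch below assumes an ordinary least-squares fit over the points at or below a 30% chain-length cutoff, extended across all chain lengths; the fitting method is an assumption, and any input values are placeholders rather than data from this study.

```python
from typing import List, Sequence, Tuple


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def early_chain_extrapolation(pct_length: Sequence[float], dg: Sequence[float],
                              cutoff: float = 30.0) -> List[float]:
    """Fit dG values at percent chain lengths <= cutoff and extend the line to all lengths."""
    early = [(x, y) for x, y in zip(pct_length, dg) if x <= cutoff]
    if len(early) < 2:
        raise ValueError("need at least two points within the cutoff")
    xs, ys = zip(*early)
    slope, intercept = least_squares_line(xs, ys)
    return [slope * x + intercept for x in pct_length]


if __name__ == "__main__":
    pct = [10, 20, 30, 40, 50]     # percent chain length
    dg = [1.0, 2.0, 3.0, 1.5, 0.0]  # placeholder values, not results from this work
    print(early_chain_extrapolation(pct, dg))
```

Departures of the computed values from such a line then highlight where stabilization deviates from the trend established early in elongation.
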

Figure 9. Comparisons between the chain-elongation behavior in the forward (from N to C terminus) and reverse (from C to N terminus) directions of a representative globin, leghemoglobin A from Glycine max (PDB ID: 1BIN). (A) fraction of nonpolar solvent-accessible surface area (fNSASA) of native-like and fully extended conformations. (B) relative differences in fNSASA between native (folded) and fully extended (unfolded) conformations, computed as described in the legend of Figure 4. Gray and black segments denote regions characterized by either solvent exposure or burial of nonpolar surface area upon folding,
respectively. (C) predicted standard folding entropy estimated as described in the legend of Figure 6 and the Methods section. (D) predicted standard-state folding Gibbs free energy differences (∆G°) computed as described in the Methods section.
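
The forward and reverse elongation series compared in Figure 9 differ only in which end of the chain is retained when the sequence is truncated. The Python sketch below generates both series for any ordered chain, such as a sequence string or a list of residue records parsed from a structure file; the function name, the rounding rule for fragment length, and the percent values in the usage block are illustrative assumptions.

```python
from typing import Dict, Iterable, Sequence, TypeVar

T = TypeVar("T")


def elongation_fragments(chain: Sequence[T], percents: Iterable[float],
                         direction: str = "forward") -> Dict[float, Sequence[T]]:
    """Truncated fragments mimicking chain elongation.

    forward: N-terminal fragments (first k residues), i.e. growth from N to C.
    reverse: C-terminal fragments (last k residues), i.e. growth from C to N.
    """
    n = len(chain)
    fragments: Dict[float, Sequence[T]] = {}
    for p in percents:
        k = max(1, round(n * p / 100))  # fragment length in residues
        if direction == "forward":
            fragments[p] = chain[:k]
        elif direction == "reverse":
            fragments[p] = chain[n - k:]
        else:
            raise ValueError("direction must be 'forward' or 'reverse'")
    return fragments


if __name__ == "__main__":
    toy = "ABCDEFGHIJ"  # stand-in for a residue sequence
    print(elongation_fragments(toy, [30, 60, 100]))
    print(elongation_fragments(toy, [30, 60, 100], direction="reverse"))
```
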


Figure 1. Representative three-dimensional structures of 3-3, 2-2 and archaeal globins. The images illustrate the crystal structures of: (A) trematode hemoglobin from Paramphistomum epiclitum (PDB ID: 1H97); (B) cyanoglobin from Synechocystis sp PCC 6803 (PDB ID: 1S69); and (C) protoglobin from Methanosarcina acetivorans (PDB ID: 2VEB). The images were generated with PyMOL 1.2r2.


Figure 2. Plots illustrating the mean net charge and mean hydrophobicity per residue of the proteins studied in this work. The solid black lines separate natively folded from unfolded proteins, as described by Dunker and coworkers (Oldfield et al., 2005). Proteins falling to the left and right of the line are regarded as intrinsically disordered (IDPs) and fully folded, respectively. The dashed lines enclose regions of uncertain structural classification. All 22 globins (red) analyzed in this study are grouped according to either (A) globin type or (B) kingdom of life.
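The legend does not state how the mean net charge and hydrophobicity were computed. As a minimal sketch, assuming the widely used charge-hydropathy conventions (Kyte-Doolittle hydropathy rescaled to the 0-1 range, net charge counted from Lys/Arg versus Asp/Glu, and the Uversky boundary line ⟨H⟩ = (⟨R⟩ + 1.151)/2.785), a sequence could be placed relative to the solid line as follows. All function names are hypothetical, and the dashed region of uncertain classification is not modeled.

```python
# Sketch of a charge-hydropathy (CH) classification.
# ASSUMPTIONS (not stated in the article): Kyte-Doolittle hydropathy rescaled
# to 0-1, a simple per-residue mean instead of a sliding window, net charge
# from K/R vs D/E only, and the Uversky boundary <H> = (<R> + 1.151) / 2.785.

KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue, rescaled to the 0-1 range."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_classify(seq: str) -> str:
    """Return 'folded' if the sequence lies to the right of the boundary line."""
    seq = seq.upper()
    r = mean_net_charge(seq)
    h = mean_hydropathy(seq)
    h_boundary = (r + 1.151) / 2.785  # hydropathy on the boundary at this charge
    return "folded" if h > h_boundary else "disordered"
```

For example, `ch_classify("IIIIIIII")` returns `'folded'`, whereas a lysine-rich toy sequence such as `"KKKKEKKK"` falls on the disordered side of the line.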


Figure 3. Fraction of nonpolar solvent-accessible surface area (fNSASA) as a function of chain elongation for the native and fully extended conformations of representative structures from the three known subclasses of the globin fold: (A) trematode hemoglobin from Paramphistomum epiclitum (PDB ID: 1H97); (B) cyanoglobin from Synechocystis sp PCC 6803 (PDB ID: 1S69); and (C) protoglobin from Methanosarcina acetivorans (PDB ID: 2VEB).
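fNSASA is the ratio of nonpolar to total solvent-accessible surface area for a given chain fragment. The sketch below assumes per-atom SASA values are already available from some SASA program and, as an assumption of this example rather than the article's stated convention, counts carbon and sulfur atoms as nonpolar. `fnsasa` and `NONPOLAR_ELEMENTS` are hypothetical names.

```python
from typing import FrozenSet, Iterable, Tuple

# ASSUMPTION: carbon and sulfur atoms are treated as nonpolar. The article's
# own atom classification may differ.
NONPOLAR_ELEMENTS: FrozenSet[str] = frozenset({"C", "S"})


def fnsasa(atoms: Iterable[Tuple[str, float]],
           nonpolar: FrozenSet[str] = NONPOLAR_ELEMENTS) -> float:
    """Fraction of total SASA contributed by nonpolar atoms.

    `atoms` is a hypothetical input format: one (element, SASA in A^2) pair
    per atom of the chain fragment being analyzed.
    """
    total = 0.0
    nonpolar_area = 0.0
    for element, area in atoms:
        total += area
        if element.upper() in nonpolar:
            nonpolar_area += area
    if total == 0.0:
        raise ValueError("total SASA is zero")
    return nonpolar_area / total
```

Evaluating `fnsasa` on the native-like and fully extended structures of each N-terminal fragment would yield the two curves shown in each panel.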


Figure 4. Graphs showing relative differences in fNSASA between the native (folded) and fully-extended (unfolded) states as a function of chain length, averaged across all the globins studied in this work. Relative differences in fNSASA were computed as (folded-chain fNSASA – extended-chain fNSASA) / (extended-chain fNSASA). Gray and black segments denote regions characterized by either solvent-exposure or burial of nonpolar surface area upon folding, respectively. (A) Average over fifteen 3-3 globins; (B) six 2-2 globins; and (C) values for the archaeal globin from Methanosarcina acetivorans.
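The relative-difference formula in the legend, together with the sign convention for the gray and black segments, can be sketched in a few lines. Mapping negative values (lower fNSASA in the folded chain) to burial is an interpretation assumed here, and the averaging step assumes every globin is sampled at the same percent chain lengths. All function names are hypothetical.

```python
from typing import List, Sequence


def relative_fnsasa_difference(folded: Sequence[float],
                               extended: Sequence[float]) -> List[float]:
    """(folded fNSASA - extended fNSASA) / extended fNSASA at each chain length."""
    if len(folded) != len(extended):
        raise ValueError("folded and extended series must have the same length")
    return [(f - e) / e for f, e in zip(folded, extended)]


def classify_segments(rel_diff: Sequence[float]) -> List[str]:
    """Label each chain length by the sign of the relative difference.

    ASSUMPTION: negative -> burial of nonpolar surface (black segments),
    zero or positive -> solvent exposure (gray segments).
    """
    return ["burial" if d < 0 else "exposure" for d in rel_diff]


def average_over_globins(series: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean over globins sampled at identical % chain lengths."""
    return [sum(values) / len(values) for values in zip(*series)]
```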


Figure 5. Graphs illustrating the relative difference between the native- and extended-state fNSASA for each helix of the globin fold. Relative differences were determined by first computing [(fNSASA at the end of the helix – fNSASA at the beginning of the helix) / (fNSASA at the beginning of the helix)] for both the native and extended conformations. Values obtained for the extended chain were then subtracted from those obtained for the native-like chain. Finally, averages across all corresponding helices within each type of globin were computed. The resulting parameter, shown in the figure, illustrates the extent of fractional NSASA burial upon chain elongation per helix. In panel C, the region denoted as ‘n’ corresponds to the 21 N-terminal unstructured residues that precede the A helix. This region of the archaeal globin was included because of its considerable length; it is not present in 3-3 and 2-2 globins. Average values over (A) fifteen 3-3 globins and (B) six 2-2 globins, and (C) values for the archaeal globin from Methanosarcina acetivorans.
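The per-helix parameter described in this legend can be sketched as follows, assuming fNSASA has already been computed for native-like and extended fragments and stored by chain length. Which fragment represents the "beginning" and "end" of a helix (here, the chains ending at the helix's first and last residues) is an assumption of this sketch, and all function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fractional_change(f_start: float, f_end: float) -> float:
    """(fNSASA at helix end - fNSASA at helix start) / fNSASA at helix start."""
    return (f_end - f_start) / f_start


def helix_burial(native: Dict[int, float], extended: Dict[int, float],
                 helix: Tuple[int, int]) -> float:
    """Native-minus-extended fractional change in fNSASA across one helix.

    `native` and `extended` map chain length (number of residues) to fNSASA.
    `helix` holds the chain lengths taken as the helix start and end
    (ASSUMPTION: fragments ending at the first and last helix residues).
    """
    start, end = helix
    native_change = fractional_change(native[start], native[end])
    extended_change = fractional_change(extended[start], extended[end])
    return native_change - extended_change


def mean_over_globins(values: Sequence[float]) -> float:
    """Average of the per-helix parameter over all globins of one subclass."""
    return sum(values) / len(values)
```

Negative values would then indicate that the helix adds proportionally more buried nonpolar surface in the native-like chain than in the extended one.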


Figure 6. Predicted average standard entropy change for folding as a function of percent chain length for three globin subclasses. All values were determined at 298 K, and assume native-like and fully extended conformations for the folded and unfolded chains, respectively. All calculations were carried out as described in the Methods section. Average over (A) fifteen 3-3 globins; (B) six 2-2 globins; and (C) values for the archaeal globin from Methanosarcina acetivorans.


Figure 7. Predicted folding free energy for the native-like and fully extended chains of representative structures from three globin subclasses. All free energy calculations were carried out with FoldX, and PDB files for the fully extended chains were generated with PyMOL 1.2r2. (A) trematode hemoglobin from Paramphistomum epiclitum; (B) cyanoglobin from Synechocystis sp PCC 6803; and (C) protoglobin from Methanosarcina acetivorans.


Figure 8. Plots illustrating average predicted standard-state folding Gibbs free energy differences (∆G°) as a function of chain length. Native-like protein conformations were assumed for the folded form at all chain lengths. ∆G° values were estimated with FoldX. The dashed line denotes a linear extrapolation of the values for the initial 30% of the chain, to guide the eye. Data are shown as averages over (A) fifteen 3-3 globins and (B) six 2-2 globins, and (C) for the single archaeal globin from Methanosarcina acetivorans.
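The dashed guide line is a linear fit restricted to the first 30% of the chain, extrapolated over all chain lengths. A minimal pure-Python sketch, assuming an ordinary least-squares fit and ∆G° values supplied per percent chain length (the article does not specify the fitting procedure; `fit_line` and `early_trend` are hypothetical names):

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def early_trend(percent_length: Sequence[float], dg: Sequence[float],
                cutoff: float = 30.0) -> List[float]:
    """Fit dG vs % chain length up to `cutoff`, then evaluate at every length."""
    early = [(p, g) for p, g in zip(percent_length, dg) if p <= cutoff]
    if len(early) < 2:
        raise ValueError("need at least two points within the cutoff")
    xs, ys = zip(*early)
    slope, intercept = fit_line(xs, ys)
    return [slope * p + intercept for p in percent_length]
```

With toy ∆G° values of -1, -2, -3, -10 and -50 at 10, 20, 30, 40 and 100% chain length, the guide line evaluates to approximately -1, -2, -3, -4 and -10, so later points lying below it stand out as a steeper-than-linear drop.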


Figure 9. Comparisons between the chain-elongation behavior in the forward (from N to C terminus) and reverse (from C to N terminus) directions of a representative globin, leghemoglobin A from Glycine max (PDB ID: 1BIN). (A) fraction of nonpolar solvent-accessible surface area (fNSASA) of native-like and fully extended conformations. (B) relative differences in fNSASA between native (folded) and fully-extended (unfolded) conformations, computed as described in the legend of Figure 4. Gray and black segments denote regions characterized by either solvent-exposure or burial of nonpolar surface area upon folding, respectively. (C) predicted standard folding entropy estimated as described in the legend of Figure 6 and the Methods section. (D) predicted standard-state folding Gibbs free energy differences (∆G°) computed as described in the Methods section.
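This figure compares chains grown from the N terminus with chains grown from the C terminus. A sketch of how the corresponding residue ranges could be enumerated, assuming fragments are defined by percent of the full chain length and rounded to the nearest residue (`elongation_fragments` is a hypothetical helper; the article's exact sampling of chain lengths is not given in this legend):

```python
from typing import List, Sequence, Tuple


def elongation_fragments(n_residues: int, percents: Sequence[float],
                         direction: str = "forward") -> List[Tuple[int, int]]:
    """1-based inclusive residue ranges for chains grown in one direction.

    'forward' grows from the N terminus (residues 1..k), as in biosynthesis;
    'reverse' grows from the C terminus (residues n-k+1..n).
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    if direction not in ("forward", "reverse"):
        raise ValueError("direction must be 'forward' or 'reverse'")
    fragments = []
    for pct in percents:
        # Number of residues in this fragment, clamped to [1, n_residues].
        k = min(n_residues, max(1, round(n_residues * pct / 100)))
        if direction == "forward":
            fragments.append((1, k))
        else:
            fragments.append((n_residues - k + 1, n_residues))
    return fragments
```

Each range would then be cut from the native structure (native-like fragment) and built as an extended chain before computing fNSASA, entropy and ∆G° in either direction.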
