Aggregation landscapes of Huntingtin exon 1 protein fragments and the critical repeat length for the onset of Huntington’s disease

Mingchen Chen (a,b) and Peter G. Wolynes (a,c,1)

(a) Center for Theoretical Biological Physics, Rice University, Houston, TX 77005; (b) Department of Bioengineering, Rice University, Houston, TX 77005; and (c) Department of Chemistry, Rice University, Houston, TX 77005

Contributed by Peter G. Wolynes, March 7, 2017 (sent for review February 9, 2017; reviewed by William A. Eaton and Joan-Emma Shea)

Huntington’s disease (HD) is a neurodegenerative disease caused by an abnormal expansion in the polyglutamine (polyQ) tract of the Huntingtin (HTT) protein. The severity of the disease depends on the polyQ repeat length, with the disease arising only in patients whose proteins have 36 repeats or more. Previous studies have shown that the aggregation of N-terminal fragments (encoded by HTT exon 1) underlies the disease pathology in mouse models and that the HTT exon 1 gene product can self-assemble into amyloid structures. Here, we provide detailed structural mechanisms for aggregation of several protein fragments encoded by HTT exon 1 by using the associative memory, water-mediated, structure and energy model (AWSEM) to construct their free energy landscapes. We find that the addition of the N-terminal 17-residue sequence (NT17) facilitates polyQ aggregation by encouraging the formation of prefibrillar oligomers, whereas adding the C-terminal polyproline sequence (P10) inhibits aggregation. The combination of both terminal additions in the HTT exon 1 fragment leads to a complex aggregation mechanism with a basic core that resembles that found for the aggregation of pure polyQ repeats using AWSEM. At the extrapolated physiological concentration, although the grand canonical free energy profiles are uphill for HTT exon 1 fragments having 20 or 30 glutamines, the aggregation landscape for fragments with 40 repeats has become downhill. This computational prediction agrees with the critical length found for the onset of HD and suggests potential therapies based on blocking early binding events involving the terminal additions to the polyQ repeats.

Huntington’s disease | aggregation | solubility | aggregation free energy landscape | critical length

4406–4411 | PNAS | April 25, 2017 | vol. 114 | no. 17

Huntingtin (HTT) is a huge protein (3,100 amino acid residues) that has been implicated in a variety of physiological functions (1, 2). Having an expanded polyglutamine (polyQ) region of 36 or more glutamine residues in the N terminus of HTT leads to Huntington’s disease as witnessed by the deposition of large intracellular protein aggregates or inclusion bodies, which are composed primarily of N-terminal fragments coded by the exon 1 sequence of the mutant HTT (2–5). These N-terminal fragments, each composed of an N-terminal 17-residue sequence (NT17), a polyQ sequence that has expanded in length, and a proline-rich region afterward, originate from proteolysis of HTT (2, 3). The aggregation of the HTT exon 1 product is, therefore, believed to cause Huntington’s disease. Clinical studies have shown an inverse correlation between polyQ length and age of disease onset (5). By implicating polymer physics in the disease process, this regularity has intrigued biophysicists for decades.

Expanded polyQ sequences, more generally, are associated with nine known neurodegenerative diseases (4). Biophysical chemists have, therefore, focused on first understanding the aggregation of pure polyQ peptides. The rate of aggregation of polyQ peptides is length-dependent, and primary nucleation is rate-limiting (6–8). Experiments on pure polyQ repeat peptides show that the critical nucleus size decreases from n* = 4 to n* = 1 as the repeat length grows from Q18 to Q26 at laboratory concentration (8). Using the associative memory, water-mediated interactions, structure and energy model (AWSEM), previous simulations by our group have successfully explained how the change of critical nucleus size arises from the differences in the propensity of monomeric polyQ repeats of different lengths to form β-hairpins: the longer repeats fold into hairpins intramolecularly before they aggregate (9).

The aggregation of the disease-causing peptide is, however, further complicated by the presence of flanking amino acid sequences in fragments encoded by HTT exon 1. Experiments indicate that the addition of NT17 at the N terminus of polyQ enormously accelerates the aggregation, probably by encouraging the formation of prefibrillar oligomers (10–16), whereas the addition of the proline-rich region at the C terminus decreases the rate of aggregation, apparently without fundamentally changing the mechanism (10, 16, 17). Structural characterization of the aggregates (13, 14, 18–20) has shown that, even when there are flanking sequences, polyQ remains the fiber core and adopts a β-hairpin conformation.

In this paper, we use energy landscape analysis to provide a detailed molecular picture of the aggregation process of the peptide encoded by HTT exon 1, focusing on how the flanking sequences influence aggregation. Simulation of protein aggregation has been challenging because of both the limitations of computational resources and the lack of energy models with enough realism to fold globular proteins reliably and correctly (21). To overcome these problems, we use coarse-grained simulations of the aggregation of HTT exon 1-encoded fragments that use the AWSEM force field. While being efficient to simulate, the AWSEM force field can predict the structures of protein monomers (22–24) and predict details of assembly into oligomers (9, 25–28). We have already used the model to explore the detailed mechanisms of aggregation of pure polyQ peptides (9) and the peptide implicated in Alzheimer’s disease (28), Aβ40. The computed aggregation free

Significance

Formation of inclusion bodies is the pathological hallmark of Huntington’s disease. Huntingtin exon 1-encoded protein fragments aggregate in inclusion bodies, causing disease onset in a length-dependent manner. Using energy landscape theory, the critical length for disease onset is predicted to be between 30 and 40.

Author contributions: M.C. and P.G.W. designed research; M.C. performed research; M.C. contributed new reagents/analytic tools; M.C. and P.G.W. analyzed data; and M.C. and P.G.W. wrote the paper.

Reviewers: W.A.E., National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health; and J.-E.S., University of California, Santa Barbara.

The authors declare no conflict of interest.

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1702237114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1702237114

Chen and Wolynes

[Fig. 1 plots. Axes: Oligomer Size (1–6); Free energy (kcal/mol). Panel B curves: 0.1 mM before correction, 0.001 mM, 0.01 mM, 0.1 mM, 1.0 mM. Structure labels: II, III, IV, V, VI, VI′.]

terminal addition, monomers associate first to form α-helix–mediated oligomers before forming amyloids. After initial association, these oligomers undergo structural rearrangement toward the final fiber form (from structure VI′ to structure VI in Fig. 1). A similar sequence of events is also seen in the simulations for NT17-Q30 (SI Appendix, Fig. S3) and NT17-Q40 (SI Appendix, Fig. S6).

In the simulations within a finite box, the pool of free monomers becomes depleted as clusters grow, an effect that is negligible in laboratory experiments that study the early phases of aggregation. As in our previous papers (9, 28), we adopt the Reiss physical cluster theory (30), as detailed in Methods, to correct for this finite size effect. The grand canonical free energy profile, Fn − nµ, becomes downhill at the nominal simulation concentration (0.1 mM) (purple line in Fig. 1B), whereas at the same concentration, the profile for the pure polyQ peptide is uphill. This change in slope agrees with experimental observations, indicating that NT17-Q20 peptides aggregate much more readily than pure Q20 peptides.

There is considerable evidence that the ratio of the concentration to the solubility limit plays a large role in aggregation in vitro (31) and perhaps in vivo (32). We use the concentration dependence of the free energy profiles to estimate solubility limits. When extrapolated to 1 µM, the aggregation free energy profile of the N-terminal constructs becomes uphill; in contrast, the profile becomes very downhill at high supersaturation (1 mM) (31). The solubility of NT17-Q20 estimated from our calculation, therefore, is in the range from 1 to 10 µM, agreeing with experimental evidence that peptides of a similar length, like NT17-Q15 and NT17-Q25, only slowly aggregate at a concentration of around 5 µM (14). Kinetics of aggregation can be

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Fig. 1. The aggregation free energy landscape for NT17-Q20 at 300 K. (A) The grand canonical free energy surface at the concentration of 100 µM at 300 K is plotted using the number of intermolecular hydrogen bonds and the oligomer size as the two dimensions. The number of intermolecular hydrogen bonds signals the association of oligomers into β-strands. The local basins for different oligomer states are labeled by their size. Representative structures of different oligomers are shown in each free energy basin of the aggregation progress: the N-terminal region is colored green, whereas the Q region is orange. Notice that the β-strands in each monomer remain extended. (B) The grand canonical free energy profiles for different oligomer states as corrected for the monomer concentration changes in the fixed number simulation allow one to infer the saturation value of the concentration of free monomers.
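The concentration dependence of the profiles in panel B can be illustrated with a minimal sketch. It assumes an ideal-solution monomer chemical potential, µ(c) = µ(c_ref) + kBT ln(c/c_ref), so that changing the concentration tilts Fn − nµ linearly in n. This is an illustrative assumption, not the Reiss-based correction procedure detailed in SI Appendix, and the function name `shift_profile` and the numerical profile are hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def shift_profile(profile, c_ref, c_new, temperature=300.0):
    """Re-reference a grand canonical profile F_n - n*mu to a new concentration.

    profile: dict {oligomer size n: free energy in kcal/mol} at c_ref.
    Assumes mu(c) = mu(c_ref) + kT ln(c / c_ref) (ideal solution) and
    returns values relative to the monomer (n = 1).
    """
    kt = K_B_KCAL * temperature
    dmu = kt * math.log(c_new / c_ref)
    monomer = profile[1] - dmu  # shifted monomer value, used as the zero
    return {n: (f - n * dmu) - monomer for n, f in profile.items()}


# Hypothetical numbers for illustration only (not data from the paper).
example = {1: 0.0, 2: 1.0, 3: 1.5, 4: 1.0, 5: 0.0, 6: -1.5}
diluted = shift_profile(example, c_ref=100e-6, c_new=10e-6)
for n in sorted(diluted):
    print(n, round(diluted[n], 2))
```

Under this assumption, each tenfold dilution at 300 K raises the profile by about 1.37 kcal/mol per added monomer, which is why a profile that is downhill at 0.1 mM can become uphill at micromolar concentrations.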

CHEMISTRY

N-Terminal Region Facilitates the Aggregation of NT17-Q20 Through Forming Prefibrillar Oligomers

The HTT exon 1 gene product contains a 17-aa N terminus (NT17) that is followed by the polyQ tract. Proteolytic digestion of full-length HTT releases fragments that contain the NT17 terminus, polyQ repeats, and the following proline-rich segments. These fragments aggregate into a fiber form (2). Experiments and simulations on simple polyQ peptides show that the aggregation proceeds through a classical nucleated growth polymerization mechanism. Previous computations showed that the aggregation free energy landscape of Q20 at 300 K under experimental concentration (0.1 mM) leads to a critical nucleus of around three (SI Appendix, Fig. S1). In this section, we examine how the N-terminal addition affects aggregation.

A solved crystal structure of the HTT exon 1 gene product with 17 glutamine repeats fused with maltose binding protein shows that the NT17 segment becomes α-helical (29). The NT17 tract has been shown to be critical for the formation of α-helix–rich oligomeric intermediates by Jayaraman et al. (14). We, therefore, first construct the aggregation free energy profile for six NT17-Q20 monomers in a simulation box at the nominal laboratory concentration of the study by Wetzel (19). In addition to constant temperature studies, we use umbrella sampling with the fraction of native contacts of the simulated fiber structure as the biasing coordinate to construct free energy profiles. As shown in Fig. 1A, we can project the free energy landscape onto a 2D surface using the oligomer size and the number of interchain hydrogen bonds among the six peptides as simultaneous progress variables. Compared with the pure Q20 peptide, the aggregation propensity of NT17-Q20 is higher, greatly favoring the final fiber form. For pure Q20, it is hard for the monomers to associate and cross the free energy barrier near the critical nucleus size of three (SI Appendix, Fig. S1).
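The 2D projection described above can be sketched as a Boltzmann inversion of sampled populations, F = −kBT ln P, over the two progress variables. The sketch assumes unbiased samples; umbrella-sampled data would first need reweighting (for example, with a histogram reweighting method), which is omitted here. The function name `free_energy_surface` and the toy samples are hypothetical.

```python
import math
from collections import Counter

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_surface(samples, temperature=300.0):
    """Return {(oligomer_size, n_hbonds): F in kcal/mol}, minimum set to 0.

    samples: iterable of (oligomer_size, n_hbonds) pairs drawn from
    unbiased sampling (an assumption of this sketch).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    kt = K_B_KCAL * temperature
    surface = {state: -kt * math.log(c / total) for state, c in counts.items()}
    f_min = min(surface.values())
    return {state: f - f_min for state, f in surface.items()}


# Toy populations for illustration only (not data from the paper).
toy_samples = [(1, 0)] * 80 + [(2, 5)] * 15 + [(3, 12)] * 5
surface = free_energy_surface(toy_samples)
for state in sorted(surface):
    print(state, round(surface[state], 2))
```

States that are never visited have no entry in the returned dictionary, so in practice the sampling must cover every basin of interest before the surface is meaningful.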
This barrier, however, is eliminated in NT17-Q20 aggregation by the initial formation of NT17-mediated oligomers of various sizes (structures IV and VI′ in Fig. 1). Glutamine being amphiphilic, the linear association of pure polyQ peptides into their fiber form requires the formation of β-hydrogen bonds. With help from the NT17-


energy landscapes for pure Q20 and Q30 peptides found using AWSEM display two different critical nucleus sizes in agreement with experiments (9).

To address the origin of the biophysically striking length dependence of aggregation of the fragment that is found in vivo, we now construct the free energy landscapes for aggregation of oligomers of HTT exon 1 protein fragments using the AWSEM force field. Although these fragments are much longer than the pure polyQ peptide, the codes remain computationally efficient. Our computations show, in agreement with experiment, that adding the N-terminal residues NT17 facilitates aggregation through forming prefibrillar species, whereas adding the prolines P10 after the polyQ repeat impedes the aggregation without obviously changing the overall aggregation mechanism of the polyQ core. The predicted grand canonical free energy profiles are found to be uphill for HTT exon 1 fragments with Q20 and Q30 at the estimated concentration of 10 µM found inside the inclusion bodies formed during disease progression. At the same concentration, however, the free energy profile for a longer repeat construct, NT17-Q40-P10, is predicted to be downhill. The simulations, therefore, predict the critical length for the aggregation of the exon 1-encoded fragment to be between 30 and 40, remarkably consistent with the critical length for disease onset. The solubilities calculated from our simulations correspond very well with the experimental values. The detailed structural picture of the aggregation mechanisms of the HTT exon 1-encoded fragments suggests finding drugs that interfere with the early events that involve the terminal additions rather than the polyQ segments themselves.

[Fig. 2 plots. Horizontal axis: Oligomer Size (1–6). Panel B curves: 0.1 mM before correction, 0.001 mM, 0.01 mM, 0.1 mM, 1.0 mM. Structure labels: I, II, III, IV, V, VI.]

Fig. 2. The aggregation free energy landscape for Q20-P10 at 300 K. (A) The grand canonical free energy surface at the concentration of 100 µM at 300 K is plotted using the number of intermolecular hydrogen bonds and the oligomer size as the two dimensions. The number of intermolecular hydrogen bonds signals the association of oligomers into β-strands. The local basins for different oligomer states are labeled by their size. Representative structures of different oligomers are shown in each free energy basin for the aggregation progress: the Q region is colored orange, whereas polyproline is blue. (B) The grand canonical free energy profiles for different oligomer states as corrected for the monomer concentration changes in the fixed number simulation allow one to infer the saturation value of the concentration of free monomers.

inferred using a diffusion picture based on the free energy profile. We plan to report on these results later.

C-Terminal Polyproline Inhibits the Aggregation of Q20-P10 Without Altering the Aggregation Mechanism

Proline-rich segments often follow the polyQ sequence in many proteins involved in neurodegenerative diseases (17, 33). Although the NT17 peptide addition encourages aggregation, the proline-rich terminus at the C end inhibits aggregation (17, 34–36). To computationally address the effect of the proline-rich addition on the aggregation of HTT exon 1-encoded protein fragments, we determined the aggregation free energy profile of six Q20-P10 peptides in a simulation box at 300 K (nominal concentration of 0.1 mM). Compared with what was found for pure Q20 peptides, the 2D free energy landscape using the size of the oligomer and the number of interchain hydrogen bonds as the reaction coordinates displays a more uphill trend as aggregation proceeds (Fig. 2A). We visualized structures sampled in each oligomer basin. The polyproline region generally remains an unstructured coil, although a polyproline helix sometimes forms (Fig. 2). As illustrated with the structures in Fig. 2, aggregation proceeds via linear addition of the Q20-P10 peptides to existing oligomers, very much like the aggregation mechanism observed for Q20. Simulations for Q30-P10 and Q40-P10 peptides show a similar inhibition of aggregation by the C-terminal addition.

During aggregation, a free energy barrier of around 8 kBT is encountered near the formation of a pentamer, suggesting that the critical nucleus size would be around five for this construct having only the polyproline tail. After making the correction for the change in the concentration of free monomers in the simulation box, the free energy profiles still exhibit a barrier around the pentamer. Only at a very high concentration of around 100 µM does the profile become flat, suggesting that the solubility of the construct having only the proline-terminal addition is around 100 µM.
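As a small worked illustration of the barrier reasoning above, the sketch below reads a critical nucleus size off a 1D profile as the oligomer size at the profile maximum and converts a barrier quoted in units of kBT into kcal/mol at 300 K. The profile values are hypothetical and chosen only so that the maximum falls at the pentamer; the helper names are not from the paper.

```python
K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def critical_nucleus(profile):
    """Return (n_star, barrier): size at the profile maximum and its
    height relative to the monomer (n = 1), in the profile's units."""
    n_star = max(profile, key=profile.get)
    return n_star, profile[n_star] - profile[1]


def kbt_to_kcal(n_kbt, temperature=300.0):
    """Convert an energy given in units of kBT to kcal/mol."""
    return n_kbt * K_B_KCAL * temperature


# Hypothetical profile in kcal/mol (illustration only).
toy_profile = {1: 0.0, 2: 1.5, 3: 2.8, 4: 4.0, 5: 4.8, 6: 4.1}
n_star, barrier = critical_nucleus(toy_profile)
print(n_star, barrier)
print(round(kbt_to_kcal(8), 2))
```

For a profile that is downhill everywhere, the maximum sits at the monomer and the returned barrier is zero, which corresponds to the barrierless case described for the NT17-containing constructs.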

Aggregation Free Energy Landscape of Complete NT17-Q20-P10 Fragments Reveals a Complex Aggregation Mechanism

The simulations suggest that the extended structure of the polyQ repeat is the preferred form of the monomer for constructs with the shorter polyQ repeats: Q20, NT17-Q20, Q20-P10, and NT17-Q20-P10. The addition of the two termini does not significantly affect the preference of the repeat peptide to assume either a collapsed structure or an extended form in the monomer (SI Appendix, Fig. S8A). We wish to remind the reader that, for all constructs, the thermal ensemble is rather broad as evidenced by the modest free energy barriers seen in their free energy profiles (SI Appendix, Fig. S8). The extended β-strand structure and the β-hairpin are not the only structures populated in the simulated ensemble. Consistent with these results, experiments of Baias et al. (37) also suggest that the NT17-Q17-P7 peptide with a His tag at the C terminus adopts an unstructured state.

Because the N-terminal addition and the polyproline addition act in opposite ways on the aggregation propensity, we next examined the aggregation free energy profile for the full construct of 20 repeat forms with both additions, the NT17-Q20-P10 peptide, at an experimental concentration of 100 µM at 300 K. The 2D free energy profile is found to be uphill toward the final simulated fiber form (Fig. 3A). Representative structures from different oligomer basins are shown in Fig. 3. Even in these constructs with the proline addition, the addition of the NT17 residues eliminates the free energy barrier near the formation of a pentamer by encouraging association into prefibrillar oligomers (structure VI′ in Fig. 3). When we compare the landscape of the full construct with that for NT17-Q20 peptides without the proline addition, we find that adding polyproline again makes the profile more uphill, slowing down aggregation without changing the sequence of events in aggregation. After correcting for monomer depletion, the profile of the full construct becomes slightly downhill at 100 µM. Extrapolation to

[Fig. 3 plots. Panel A axes: Oligomer Size (1–6); # interchain Hydrogen Bond. Panel B axes: Oligomer Size (1–6); Free energy (kcal/mol). Panel B curves: 0.1 mM before correction, 0.001 mM, 0.01 mM, 0.1 mM, 1.0 mM. Structure labels: I, III, IV, V, VI′, VI.]

Fig. 3. The aggregation free energy landscape for NT17-Q20-P10 at 300 K. (A) The grand canonical free energy surface at the concentration of 100 µM at 300 K is plotted using the number of intermolecular hydrogen bonds and the oligomer size as the two dimensions. The number of intermolecular hydrogen bonds signals the association of oligomers into β-strands. The local basins for different oligomer states are labeled by their size. Representative structures of different oligomers are shown in each free energy basin of the aggregation progress: the N-terminal region is colored green, whereas the Q region is orange, and the polyproline is blue. (B) The grand canonical free energy profiles for different oligomer states as corrected for the monomer concentration changes in the fixed number simulation can be used to infer the saturation value of the concentration of free monomers.
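One simple way to read a saturation concentration off a corrected profile can be sketched as follows. The sketch assumes that the solubility is the concentration at which the largest simulated oligomer is level with the monomer, and that the monomer chemical potential varies as kBT ln c. Both are illustrative assumptions rather than the procedure of SI Appendix, and the input number is hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def solubility_estimate(delta_f_largest, n_largest, c_ref, temperature=300.0):
    """Concentration at which the profile end point is level with the monomer.

    delta_f_largest: free energy of the largest oligomer relative to the
    monomer (kcal/mol) at the reference concentration c_ref.
    Solves delta_f - (n - 1) kT ln(c / c_ref) = 0 for c.
    """
    kt = K_B_KCAL * temperature
    return c_ref * math.exp(delta_f_largest / ((n_largest - 1) * kt))


# Hypothetical input: a hexamer 6.86 kcal/mol below the monomer at 100 uM.
c_s = solubility_estimate(-6.86, n_largest=6, c_ref=100e-6)
print(f"estimated solubility: {c_s * 1e6:.1f} uM")
```

With this criterion, a profile that is downhill at the reference concentration gives a solubility below it, and an uphill profile gives one above it.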

[Fig. 4 plots. Horizontal axis: Oligomer Size (1–6). Structure labels: I, II, III, IV, V, VI′, VI″, VI.]

Fig. 4. The aggregation free energy landscape for NT17-Q30-P10 at 300 K. (A) The grand canonical free energy surface at the concentration of 100 µM at 300 K is plotted using the number of intermolecular hydrogen bonds and the oligomer size as the two dimensions. The number of intermolecular hydrogen bonds signals the association of oligomers in a β-strand form. The local basins for different oligomer states are labeled by their sizes. Notice that the monomers in all of the structures have already taken on a β-hairpin form. Representative structures of different oligomers are shown in each free energy basin of the aggregation progress: the N-terminal region is colored green, whereas the Q region is orange, and polyproline is blue. (B) The grand canonical free energy profiles for different oligomer states as corrected for the monomer concentration changes in the fixed number simulation can be used to infer the saturation value of the concentration of free monomers.

[Fig. 5 plots. Horizontal axis: Oligomer Size (2–6). Panel B curves: 0.1 mM before correction, 0.001 mM, 0.01 mM, 0.1 mM, 1.0 mM. Structure labels: II, III, IV, V, VI′, VI.]

Fig. 5. The aggregation free energy landscape for NT17-Q40-P10 at 300 K. (A) The grand canonical free energy surface at the concentration of 100 µM at 300 K is plotted using the number of intermolecular hydrogen bonds and the oligomer size as the two dimensions. The number of intermolecular hydrogen bonds signals the association of oligomers into β-strands. The local basins for different oligomer states are labeled by their size. Representative structures of different oligomers are shown in each free energy basin of the aggregation progress: the N-terminal region is colored green, whereas the Q region is orange, and polyproline is blue. (B) The grand canonical free energy profiles for different oligomer states as corrected for the monomer concentration changes in the fixed number simulation show the saturation value of the concentration of free monomers. Notice that the monomers in the aggregate remain in the β-hairpin form, which by virtue of being both more rigid and longer, leads to more facile aggregate formation.


a lower concentration indicates the solubility of the full construct to be around 10 µM, consistent with experiments that show that aggregation of the NT17-Q20-P10 peptide proceeds slowly at a concentration of 6 µM (10).

Aggregation Free Energy Landscapes of Intermediate-Length Repeat Constructs Q30, NT17-Q30, Q30-P10, and NT17-Q30-P10

Although the monomeric structures of the constructs with 20 glutamines prefer the extended form, all of the constructs with 30 glutamines favor a β-hairpin form for the polyQ repeat region in the monomer (SI Appendix, Fig. S8B). Pure Q30 aggregation is a downhill process after the monomeric nucleus is formed at the laboratory concentration (100 µM) at 300 K (SI Appendix, Fig. S3) (9). The NT17 addition at the N terminus, by mediating the formation of prefibrillar oligomers, makes the free energy profile even more downhill in NT17-Q30 compared with pure Q30 (structures III and VI′ in SI Appendix, Fig. S3). In contrast, adding P10 at the C terminus by itself makes the profile for Q30-P10 more uphill toward the final fiber form compared with that for pure Q30 (SI Appendix, Fig. S4). The sequence of events in the aggregation of Q30-P10 again resembles that for pure Q30 (SI Appendix, Fig. S4).

As for the aggregation of NT17-Q20-P10, the combined effects of the two terminal additions lead to a complicated aggregation free energy landscape with an obvious barrier now near the formation of a tetramer (Fig. 4A). Compared with Q30-P10, adding the N terminus makes the profile less uphill by 3 kBT at the size of a hexamer. At the same time, the further addition of the terminal polyproline segment to the fast aggregating NT17-Q20 makes the profile highly unfavorable for the final fiber state. The prefolded hairpin monomers still remain in the route to larger oligomers. Consistent with the α-helix–mediated oligomeric structures seen in NT17-Q20-P10, the simulation yields a large variety of prefibrillar oligomers in the forms of tetramer, pentamer, and hexamer (structures IV, V, VI′, and VI″ in Fig. 4). After correction for the chemical potential of free monomers, the profile becomes slightly downhill for NT17-Q30-P10 at 100 µM (purple line in Fig. 4B). Extrapolation to a lower concentration suggests that the solubility of the full 30-length polyQ construct is around 10 µM, consistent with laboratory results (15) (SI Appendix, Fig. S6).

Aggregation Free Energy Landscapes of Long PolyQ Repeat Constructs Q40, NT17-Q40, Q40-P10, and NT17-Q40-P10

The critical repeat length for onset of Huntington’s disease is 36. Because the full-length constructs containing Q20 and Q30 are below the disease threshold, we also simulated the aggregation of constructs including 40 repeats, which is above the disease threshold. The ensembles of all four types of sequences highly favor the β-hairpin structure of the monomer subunits according to our free energy calculations (SI Appendix, Fig. S8C). This hairpin motif persists in the structures for the individual subunits in the fiber forms. We first computed the aggregation free energy profile of the pure Q40 peptide at a laboratory concentration of 0.1 mM at 300 K. Consistent with what we found for the aggregation of pure Q20 and Q30, the simulations show that Q40 aggregates through linear addition in a generally downhill fashion to its fiber state (SI Appendix, Fig. S5A). Again, adding NT17 at the N terminus encourages aggregation by mediating the formation of prefibrillar oligomers and makes the free energy profile more downhill (structure VI′ in SI Appendix, Fig. S6). In contrast, the addition of P10 at the C terminus increases the solubility relative to pure Q40 (SI Appendix, Fig. S4). The sequence of aggregation events after the early helical bundle association for Q40-P10 nevertheless resembles that for pure Q40 (SI Appendix, Fig. S7).

The combined additions at the two termini lead to a downhill aggregation at a concentration of 100 µM as shown in Fig. 5A. Much as for NT17-Q20-P10 and NT17-Q30-P10, the NT17 addition by itself facilitates aggregation through forming prefibrillar oligomers (structures V and VI′ in Fig. 5), whereas the final P10 addition inhibits aggregation compared with NT17-Q40, which lacks the proline addition. After correction for the finite size effects, the profile becomes downhill at 100 µM (purple line in Fig. 5B).

To see the physiological consequence of these changing free energy profiles, we need to know the peptide concentrations in the relevant compartments of the brain. Within the cytosol, the concentration has been estimated to be around 0.15 µM. We can also estimate the concentration of HTT exon 1-encoded fragments enriched inside an inclusion body in brain tissue from multiple experiments (as we describe in SI Appendix) (3, 38, 39); these suggest that the concentration is around 10 µM inside an inclusion body. Extrapolating our predicted free energy curves to 10 µM, the NT17-Q40-P10 construct exhibits a downhill profile, whereas the predicted 1D aggregation free energy curves for NT17-Q20-P10 and NT17-Q30-P10 at this physiological concentration remain uphill. There is a change in the aggregation landscape at the disease threshold.
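The comparison between compartments can be made concrete with a short arithmetic sketch of the supersaturation ratio S = c/c_s and the corresponding ideal-solution driving force per added monomer, kBT ln S. It uses the concentrations quoted above (0.15 µM in the cytosol, about 10 µM in an inclusion body) together with the solubility of about 5 µM that the Discussion quotes for fragments with 40 repeats; the ideal-solution form and the function name are assumptions made for illustration.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def driving_force_per_monomer(concentration, solubility, temperature=300.0):
    """Return kT ln(c / c_s) in kcal/mol; positive means supersaturated.

    Both concentrations must be in the same units.
    """
    return K_B_KCAL * temperature * math.log(concentration / solubility)


SOLUBILITY_Q40_UM = 5.0  # approximate value quoted in the text, in uM

for label, c_um in (("cytosol average", 0.15), ("inclusion body", 10.0)):
    ratio = c_um / SOLUBILITY_Q40_UM
    dg = driving_force_per_monomer(c_um, SOLUBILITY_Q40_UM)
    print(f"{label}: S = {ratio:.2f}, kT ln S = {dg:+.2f} kcal/mol")
```

On these numbers, the cytosolic average is far below saturation, whereas the inclusion-body estimate is about twofold supersaturated, in line with the uphill and downhill profiles described for the 40-repeat construct.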
Discussion

The Coiled Coil-Mediated Aggregation Mechanism Caused by NT17 vs. the Protective Role of the Polyproline Tail. The AWSEM simulations have previously detailed the mechanism of aggregation for pure polyQ peptides, which occurs largely by linear addition (SI Appendix, Fig. S10A). Adding the C-terminal polyproline to pure polyQ does not change the mechanism, but this addition does significantly reduce the aggregation rate (SI Appendix, Fig. S10B). For the aggregation of HTT fragments having only the NT17 extension and no polyproline, the free energy profile becomes totally downhill at the simulation concentration (0.1 mM). This enhanced tendency to aggregate would lead to rapid aggregation, even for individuals having very short glutamine repeats. Adding the proline-rich sequence, therefore, seems to be protective through slowing down aggregation with these frayed ends. When the P10 fragment is added, the aggregation becomes downhill only when there are more than 40 repeats. The N-terminal peptide extension enhances aggregation through stabilizing prefibrillar structures held together by the α-helical coiled coil-like structures, raising the local concentration for final β-sheet formation (SI Appendix, Fig. S10C). By itself, the NT17 segment also has a very strong propensity to aggregate as a coiled coil; α-helical aggregation followed by a structural transition to the β-strand form occurs in another Q-rich protein, CPEB, which has been implicated in long-term memory (27). Coiled coils seem to play an essential role in aggregation of these proteins found in neurons (40).

Inclusion Bodies, the Cytoskeleton, and Aggregation. Giant aggregates are called inclusion bodies (41). In patients with Huntington’s disease, polyQ-expanded HTT exon 1-encoded protein fragments accumulate in compact inclusion bodies inside the neurons (42). Macdonald et al. (39) have provided a global value of the concentration of HTT averaged over the whole brain: 0.15 µM. According to our calculations, at this concentration, the aggregation free energy profile would be very uphill, even in full-length constructs containing longer repeats: NT17-Q40-P10 (Fig. 5B). The global average value of the concentration, however, does not account for the different microenvironments found in neurons (43). Inclusion bodies are associated with microtubules (2, 44), suggesting that HTT is transported along microtubules and concentrated into the inclusion bodies (44). Enrichment of HTT fragments inside the inclusion bodies then reaches the solubility limit (around 5 µM for fragments with 40 repeats). Locally, the solution becomes supersaturated, leading to macroscopic protein aggregates (31). The close association of HTT with the cytoskeleton makes it possible that stretching by mechanical force from the microtubules contributes to aggregate formation. Mechanical force-driven aggregation was seen in our simulations of another Q-rich protein implicated in the formation of long-term memory, CPEB (27). Mechanical coupling of HTT with the cytoskeleton should be considered a candidate for the initial stage of forming inclusion bodies.

Aggregation and Huntington’s Disease. Although macroscopic aggregates are the hallmark of not only Huntington’s disease but also several other neurodegenerative diseases (4), the significance of these macroscopic aggregates for pathogenesis remains under some dispute. Some have suggested that the inclusion bodies themselves containing aggregated protein fragments are not so pernicious (45). Instead, the toxicity of small oligomeric species of mutant protein fragments that have adopted specific conformational states may be critical (46, 47). Arrasate et al. (48) have shown that, rather than the total HTT level determining cell death, the amount of diffuse intracellular HTT predicts whether cell death occurs. They, therefore, proposed that inclusion bodies actually protect neurons by decreasing levels of toxic forms (48). The blurred distinction between oligomers and small, yet still macroscopic aggregates may, however, be a source of confusion. The agreement of our concentration-dependent free energy profiles with the known genetic thresholds would seem to argue against the idea that small oligomeric species cause the disease. Our results are more consistent with the idea that aggregation does cause Huntington’s disease: either through structural defects caused by the aggregate itself, which may be entwined with the cytoskeleton, or by depleting the functional HTT inside the cytosol, eventually causing “loss of function” (2). Our results now connect the structural and thermodynamic details of aggregate formation with the genetically known onset length for the disease. The intermediate structures detailed in these simulations may be good targets for drug discovery.

Methods

A detailed description of the materials and methods is given in SI Appendix. Briefly, the simulations were carried out using the AWSEM force field for proteins in the LAMMPS open source software package. Free energy profiles were computed by carrying out umbrella sampling using the structural similarity as the reaction coordinate.

ACKNOWLEDGMENTS. We thank the Data Analysis and Visualization Cyberinfrastructure funded by National Science Foundation Grant OCI-0959097. This work was supported by National Institute of General Medical Sciences Grant R01 GM44557. Additional support was provided by D. R. Bullard-Welch Chair at Rice University Grant C-0016.

1. Harjes P, Wanker EE (2003) The hunt for huntingtin function: Interaction partners tell many different stories. Trends Biochem Sci 28:425–433.
2. Saudou F, Humbert S (2016) The biology of Huntingtin. Neuron 89:910–926.
3. Cooper JK, et al. (1998) Truncated N-terminal fragments of huntingtin with expanded glutamine repeats form nuclear and cytoplasmic aggregates in cell culture. Hum Mol Genet 7:783–790.

4410 | www.pnas.org/cgi/doi/10.1073/pnas.1702237114

Chen and Wolynes


4. Zoghbi HY, Orr HT (2000) Glutamine repeats and neurodegeneration. Annu Rev Neurosci 23:217–247.
5. Walker FO (2007) Huntington’s disease. Lancet 369:218–228.
6. Chen S, Ferrone FA, Wetzel R (2002) Huntington’s disease age-of-onset linked to polyglutamine aggregation nucleation. Proc Natl Acad Sci USA 99:11884–11889.
7. Bhattacharyya AM, Thakur AK, Wetzel R (2005) Polyglutamine aggregation nucleation: Thermodynamics of a highly unfavorable protein folding reaction. Proc Natl Acad Sci USA 102:15400–15405.
8. Kar K, Jayaraman M, Sahoo B, Kodali R, Wetzel R (2011) Critical nucleus size for disease-related polyglutamine aggregation is repeat-length dependent. Nat Struct Mol Biol 18:328–336.
9. Chen M, Tsai M, Zheng W, Wolynes PG (2016) The aggregation free energy landscapes of polyglutamine repeats. J Am Chem Soc 138:15197–15203.
10. Thakur AK, et al. (2009) Polyglutamine disruption of the huntingtin exon 1 N terminus triggers a complex aggregation mechanism. Nat Struct Mol Biol 16:380–389.
11. Ratovitski T, et al. (2009) Mutant huntingtin N-terminal fragments of specific size mediate aggregation and toxicity in neuronal cells. J Biol Chem 284:10855–10867.
12. Lakhani VV, Ding F, Dokholyan NV (2010) Polyglutamine induced misfolding of huntingtin exon1 is modulated by the flanking sequences. PLoS Comput Biol 6:e1000772.
13. Sivanandam VN, et al. (2011) The aggregation-enhancing huntingtin N-terminus is helical in amyloid fibrils. J Am Chem Soc 133:4558–4566.
14. Jayaraman M, et al. (2012) Slow amyloid nucleation via alpha-helix-rich oligomeric intermediates in short polyglutamine-containing huntingtin fragments. J Mol Biol 415:881–899.
15. Crick SL, Ruff KM, Garai K, Frieden C, Pappu RV (2013) Unmasking the roles of N- and C-terminal flanking sequences from exon 1 of huntingtin as modulators of polyglutamine aggregation. Proc Natl Acad Sci USA 110:20075–20080.
16. Shen K, et al. (2016) Control of the structural landscape and neuronal proteotoxicity of mutant Huntingtin by domains flanking the polyQ tract. Elife 5:e18065.
17. Bhattacharyya A, et al. (2006) Oligoproline effects on polyglutamine conformation and aggregation. J Mol Biol 355:524–535.
18. Hoop CL, et al. (2016) Huntingtin exon 1 fibrils feature an interdigitated β-hairpin-based polyglutamine core. Proc Natl Acad Sci USA 113:1546–1551.
19. Wetzel R (2012) Physical chemistry of polyglutamine: Intriguing tales of a monotonous sequence. J Mol Biol 421:466–490.
20. Kar K, et al. (2013) β-hairpin-mediated nucleation of polyglutamine amyloid formation. J Mol Biol 425:1183–1197.
21. Morriss-Andrews A, Shea JE (2015) Computational studies of protein aggregation: Methods and applications. Annu Rev Phys Chem 66:643–666.
22. Davtyan A, et al. (2012) AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B 116:8494–8503.
23. Chen M, Lin X, Zheng W, Onuchic JN, Wolynes PG (2016) Protein folding and structure prediction from the ground up: The atomistic associative memory, water mediated, structure and energy model. J Phys Chem B 120:8557–8565.
24. Chen M, Lin X, Lu W, Onuchic JN, Wolynes PG (October 31, 2016) Protein folding and structure prediction from the ground up II: AAWSEM for α/β proteins. J Phys Chem B, 10.1021/acs.jpcb.6b09347.
25. Zheng W, Schafer NP, Wolynes PG (2013) Frustration in the energy landscapes of multidomain protein misfolding. Proc Natl Acad Sci USA 110:1680–1685.
26. Zheng W, Schafer NP, Wolynes PG (2013) Free energy landscapes for initiation and branching of protein aggregation. Proc Natl Acad Sci USA 110:20515–20520.
27. Chen M, Zheng W, Wolynes PG (2016) Energy landscapes of a mechanical prion and their implications for the molecular mechanism of long-term memory. Proc Natl Acad Sci USA 113:5006–5011.
28. Zheng W, Tsai MY, Chen M, Wolynes PG (2016) Exploring the aggregation free energy landscape of the amyloid-β protein (1−40). Proc Natl Acad Sci USA 113:11835–11840.
29. Kim MW, Chelliah Y, Kim SW, Otwinowski Z, Bezprozvanny I (2009) Secondary structure of Huntingtin amino-terminal region. Structure 17:1205–1212.
30. Reiss H, Bowles RK (1999) Some fundamental statistical mechanical relations concerning physical clusters of interest to nucleation theory. J Chem Phys 111:7501–7504.
31. Cellmer T, Ferrone FA, Eaton WA (2016) Universality of supersaturation in protein-fiber formation. Nat Struct Mol Biol 23:459–461.
32. Ciryam P, Kundra R, Morimoto RI, Dobson CM, Vendruscolo M (2015) Supersaturation is a major driving force for protein aggregation in neurodegenerative diseases. Trends Pharmacol Sci 36:72–77.
33. Hoffner G, Djian P (2014) Monomeric, oligomeric and polymeric proteins in huntington disease and other diseases of polyglutamine expansion. Brain Sci 4:91–122.
34. Dehay B, Bertolotti A (2006) Critical role of the proline-rich region in Huntingtin for aggregation and cytotoxicity in yeast. J Biol Chem 281:35608–35615.
35. Duennwald ML, Jagadish S, Muchowski PJ, Lindquist S (2006) Flanking sequences profoundly alter polyglutamine toxicity in yeast. Proc Natl Acad Sci USA 103:11045–11050.
36. Darnell G, Orgel JPRO, Pahl R, Meredith SC (2007) Flanking polyproline sequences inhibit beta-sheet structure in polyglutamine segments by inducing PPII-like helix structure. J Mol Biol 374:688–704.
37. Baias M, et al. (2017) Structure and dynamics of the huntingtin exon-1 N-terminus: A solution NMR perspective. J Am Chem Soc 139:1168–1176.
38. Legleiter J, et al. (2010) Mutant huntingtin fragments form oligomers in a polyglutamine length-dependent manner in vitro and in vivo. J Biol Chem 285:14777–14790.
39. Macdonald D, et al. (2014) Quantification assays for total and polyglutamine-expanded huntingtin proteins. PLoS One 9:e96854.
40. Fiumara F, Fioriti L, Kandel ER, Hendrickson WA (2010) Essential role of coiled coils for aggregation and activity of Q/N-rich prions and polyQ proteins. Cell 143:1121–1135.
41. Kopito RR (2000) Aggresomes, inclusion bodies and protein aggregation. Trends Cell Biol 10:524–530.
42. Waelter S, et al. (2001) Accumulation of mutant huntingtin fragments in aggresome-like inclusion bodies as a result of insufficient protein degradation. Mol Biol Cell 12:1393–1407.
43. Aguzzi A, Altmeyer M (2016) Phase separation: Linking cellular compartmentalization to disease. Trends Cell Biol 26:547–558.
44. Muchowski PJ, Ning K, D’Souza-Schorey C, Fields S (2002) Requirement of an intact microtubule cytoskeleton for aggregation and inclusion body formation by a mutant huntingtin fragment. Proc Natl Acad Sci USA 99:727–732.
45. Klement IA, et al. (1998) Ataxin-1 nuclear localization and aggregation: Role in polyglutamine-induced disease in SCA1 transgenic mice. Cell 95:41–53.
46. Nagai Y, et al. (2007) A toxic monomeric conformer of the polyglutamine protein. Nat Struct Mol Biol 14:332–340.
47. Miller J, et al. (2011) Identifying polyglutamine protein species in situ that best predict neurodegeneration. Nat Chem Biol 7:925–934.
48. Arrasate M, Mitra S, Schweitzer ES, Segal MR, Finkbeiner S (2004) Inclusion body formation reduces levels of mutant huntingtin and the risk of neuronal death. Nature 431:805–810.


PNAS | April 25, 2017 | vol. 114 | no. 17 | 4411
