Computational design of a leucine-rich repeat protein with a predefined geometry.

Computational design of a leucine-rich repeat protein with a predefined geometry Sebastian Rämischa, Ulrich Weiningerb, Jonas Martinssona, Mikael Akkeb, and Ingemar Andréa,1 Departments of aBiochemistry and Structural Biology and bBiophysical Chemistry, Center for Molecular Protein Science, Lund University, SE-221 00 Lund, Sweden

Structure-based protein design offers a possibility of optimizing the overall shape of engineered binding scaffolds to match their targets better. We developed a computational approach for the structure-based design of repeat proteins that allows for adjustment of geometrical features like length, curvature, and helical twist. By combining sequence optimization of existing repeats and de novo design of capping structures, we designed leucine-rich repeats (LRRs) from the ribonuclease inhibitor (RI) family that assemble into structures with a predefined geometry. The repeat proteins were built from self-compatible LRRs that are designed to interact to form highly curved and planar assemblies. We validated the geometrical design approach by engineering a ring structure constructed from 10 self-compatible repeats. Protein design can also be used to increase our structural understanding of repeat proteins. We use our design constructs to demonstrate that buried Cys play a central role for stability and folding cooperativity in RI-type LRR proteins. The computational procedure presented here may be used to develop repeat proteins with various geometrical shapes for applications where greater control of the interface geometry is desired.

|

|

|

binding scaffold Rosetta buried cysteines computational protein design geometrical design

|

E

ngineered protein-binding scaffolds are increasingly used as therapeutics, diagnostic probes, intracellular reporter molecules, or fusion domains in protein crystallization (1). Nature provides a large variety of protein recognition scaffolds from which engineered systems could be built. Repeat proteins are used in a wide range of biological processes, including the immune response and regulatory cascades (2–5). They consist of simple, structurally similar building blocks, called repeats, that assemble into elongated tandem arrays (6). Their extended shapes result in proteins with extraordinarily large binding surfaces, which makes them ideal scaffolds for protein binding. Analogous to antibodies, repeat proteins can be divided into framework residues, which encode stability and structure, and variable positions, which are responsible for protein recognition (7). A striking difference from antibodies is that the global structure can vary considerably between repeat proteins, even within a family. This structural variability suggests that not only the directly interacting residues but also the overall shapes of these proteins are optimized for binding target molecules. Engineered repeat proteins have typically been developed by consensus sequence design, a method where highly conserved sequence positions are identified and scaffolds are built from identical repeats containing the most common residues at those positions (8). Consensus design has been successfully applied to create stable scaffolds from several repeat protein classes (9–12). However, this approach does not enable the design of binders with predefined shapes. Because the geometrical shape of an assembly is encoded by subtle structural differences between repeats and interrepeat interfaces, a structure-based design approach is required to design the assembly shape rationally. Optimizing the shape complementarity to a target molecule would enable development of scaffolds that are custom-made for their target proteins and could yield enhanced binding properties, such as simultaneous binding to www.pnas.org/cgi/doi/10.1073/pnas.1413638111

multiple functional sites in a single protein, binding site targeting, or specific recognition of protein oligomers. Leucine-rich repeat (LRR) proteins display a significant variation in shape (13, 14). The repeats in this protein class are typically composed of 20–30 residues, and they form helically twisted, solenoid-like structures with a continuous parallel β-sheet on the concave side; they can be elongated or highly curved (2). It has been shown that N- and C-terminal capping structures are crucial for folding and stability of LRR proteins (15, 16). The combination of a stable core of framework positions and variability in overall assembly structure makes LRRs ideal building blocks for the development of repeat proteins with rationally designed shapes. The feasibility of engineering LRR proteins has been demonstrated with consensus design applied to repeats from the ribonuclease inhibitor (RI) family (17), the nucleotide-binding oligomerization domain family (18), and the variable lymphocyte receptors (VLRs). The latter yielded a protein-binding scaffold named “repebodies” (19). In this work, we developed a computational approach for the structure-based design of repeat proteins that allows for adjustment of geometrical features like length, curvature, and helical twist. We used the method to design a self-compatible LRR that assembles into highly curved and planar repeat proteins. The repeats were designed to allow assembly into a closed-ring structure. Variants with five double repeats and added capping structures produced stable proteins. In the absence of caps, the same repeat variants can dimerize to form complete circles (fullring structures), thereby verifying the correctly designed geometry. The results demonstrate that stable proteins with predefined shapes can be built with limited sequence redesign if the conformation of the self-compatible repeat is selected carefully. Additionally, the results highlight the stabilizing effect of buried Significance Repeat proteins are used in nature to bind to proteins and peptides. The shape of their binding surfaces can vary substantially, even for proteins within the same family. This variability likely arose because they evolved to match the proteins they interact with geometrically. Repeat proteins are often engineered to develop binders specific to new target proteins. It would be highly beneficial to design repeat proteins with predefined geometrical shapes because such a method would enable development of engineered repeat proteins that are shape-optimized to their targets. Here, we demonstrate that repeat proteins with a predefined shape can be designed using a computational design method. The approach is exemplified by the design of a protein that forms a ring structure not seen in nature. Author contributions: S.R. and I.A. designed research; S.R., U.W., J.M., M.A., and I.A. performed research; S.R., U.W., M.A., and I.A. analyzed data; and S.R., U.W., M.A., and I.A. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Freely available online through the PNAS open access option. 1

To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1413638111/-/DCSupplemental.

PNAS Early Edition | 1 of 6

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Edited by David Baker, University of Washington, Seattle, WA, and approved October 30, 2014 (received for review July 17, 2014)

Cys in the core of RI-type LRR proteins and address the role of capping motifs in the folding of repeat proteins.

A

B

Results To develop an approach for the design of repeat proteins with a defined geometry, we started from the following considerations: Repeat proteins can be described as arrays of repeating structural units. However, there are subtle but important differences between the conformations of individual repeats that encode the compatibility with neighboring repeats as well as the overall shape of the protein. Features like buried hydrogen bonds (e.g., in Asn and Cys ladders, in structural water molecules, in contacts formed between interacting loop segments) serve as specificity elements that define the relative orientation of neighboring repeats. Thus, the central question when designing shape-optimized repeat proteins is how to engineer these specificity elements accurately. Because nature has already developed a variety of specific interaction geometries, we attempt to adopt these detailed features from repeats with a known structure. In structures of known LRR proteins, the curvaturedefining angles between neighboring repeats range from −0.5° to 37.2° and helical twist angles are between −11.2° and 10.5°. The space of available conformations for repeats within a certain family is quite limited and can be sampled from the structures of intact repeat proteins. Because each protein contains many repeats, a large number of possible backbone conformations can be assembled from a small set of protein structures. We developed a computational design method where repeat proteins with predefined shapes are assembled from structurally compatible building blocks taken from crystal structures. Subsequently, sequence redesign is used to optimize selfcompatibility. A summary of the procedure is shown in Fig. 1A; it consists of the following steps: i) The desired geometry of the protein is defined. ii) A library of structures of individual repeats is compiled from crystal structures of selected repeat proteins. iii) Repeats that are most likely to assemble with the predefined geometry are selected from the structural library. In this study, we limit ourselves to symmetrical assembly of single self-compatible repeats. iv) Cycles of rigid body docking and computational sequence design are used to optimize the interface between the repeats. v) The repeats are covalently connected via loops between consecutive repeats. vi) In most cases, capping repeats have to be added to the Nand C-terminal sides of the repeat protein, either by adopting existing caps or using de novo design. As a proof of principle, we applied the design protocol to create a protein built from a single type of self-compatible LRR that assembles to form a planar structure with high curvature. We set a challenging design goal by requiring that the interaction geometry between consecutive repeats would allow for assembly into a closed circular molecule. Once self-compatible repeats are designed that meet those criteria, planar repeat proteins with various number of repeats could be made. These strict geometrical constraints make it possible to address a number of issues. First, the ring form is not dependent on capping repeats. Thus, the question of whether caps are strictly required for folding can be addressed. Second, a structure composed of identical repeating units that does not require caps would be an excellent system to deconvolute intra- vs. interrepeat energies in LRR proteins, as previously done using capped consensus ankyrin repeat protein (20). Third, formation of a ring is a stringent test of our ability to control the interaction geometry between repeats precisely. If we could first design a stable capped half-ring, then only a fully planar half-ring with the designed 2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111

Fig. 1. Overview of the design procedure. (A) Illustration of the general workflow for designing repeat proteins (gray) with a predefined shape, optimized for a target molecule (rose). (B) Summary of the design as done in this study: Native A + B double repeats were extracted from the RI crystal structure. Symmetrical docking of each unit (units 1–7) was performed using four different symmetries (C8–C12). The thick black line indicates the best combination. Sequences of both surface residues (light blue) and core residues (green) were optimized; charged residues are shown in red (negative) and blue (positive). Symmetrical docking verified the planar assembly of 10 identically redesigned double repeats.

curvature would dimerize to form a full ring. To the best of our knowledge, a donut-shaped structure has not been observed among LRR proteins. Selection of an Optimal Building Block. For creating a small structural library of repeats, we focused on the RI family because the members of this family possess a relatively high curvature and comparably little twist (21). Proteins from the RI family are assembled from two alternating 28-aa and 29-aa repeats, referred to as A-type and B-type repeats. The different sequence lengths result in subtle conformational differences. A-type and B-type repeats always come in pairs, which suggests that they are not self-compatible. We initially attempted to design repeat proteins exclusively from either A or B repeats, but the resulting models were poorly packed and had little shape complementarity between repeats. Consequently, we used A + B double repeats as structural building blocks and compiled a library of seven members from the porcine RI (22) (Fig. 1B). Using symmetrical docking simulations (23, 24), we then identified the best candidate double repeat and the optimal unit number for assembling a ring-shaped structure. By applying different Rämisch et al.

rotational symmetries (C8, C9, C10, C11, and C12), closed planar ring-shaped structures with varying numbers of repeating units, and hence curvatures, were generated (Fig. 1B). The quality of an assembly was evaluated by the degree of shape complementarity, the strength of hydrogen bonds between β-strands, the amount of void volume, and the binding energy between repeats, as well as the distance between the N and C termini of consecutive repeats, which needed to be covalently connected. Best results were obtained for 10 repeating units (C10) with double repeat number 6 (residues 310–366) as the building block. The crystal structure of the horseshoe-shaped porcine RI has an accumulated helical twist of 16° and a curvature of 238° over seven double repeats. If extended to a complete ring, the two end repeats would sterically overlap by more than 13 Å and the rise along the helical axis would be 11 Å. Thus, the geometry of the design model, with a helical twist of 0° and a total curvature of 360° (252° over seven double repeats), differs considerably from the native RI protein from which the repeat was adopted. Sequence Optimization to Increase Self-Complementarity. To minimize the energy of the assembly, we performed cycles of sequence optimization using Rosetta’s fixed-backbone design algorithm (25) and symmetrical docking. To minimize the risk of aggregation, we enforced a net charge of −3 per double repeat (Fig. 1B). At the concave side, a repeating electrostatic interaction pattern was designed by selecting Lys, Asn, and Ser in the β-strand of repeat A and Glu, Ser, and Ser in repeat B. Because the model with the native core sequence already looked promising, and to avoid the risk of removing critical interactions, we tried to minimize sequence changes in the core. Fixed-backbone design calculations suggested that introduction of a Cys residue at position 56 would be greatly stabilizing. Cys is highly conserved at position 56 in RItype proteins, yet it is missing from the native sequence of the sixth RI double repeat. In the RI protein, five of the seven double repeats have a Cys at this position. In our most conservative design, called geometrically designed LRR-1 (gdLRR-1), we adopted the core residues from the sixth RI double repeat, except for position 56, where Thr was replaced by Cys. In both the crystal structure of RI and our design model, Cys56 mediates buried polar contacts. In the final models, the N and C termini of consecutive repeats ended up favorably positioned, so that consecutive repeats could be covalently connected by reintroducing Thr, which had been removed for docking of unconnected repeats. We constructed a second self-compatible repeat, gdLRR-2, based on the core sequence of the previously published RI consensus design (17). This core resulted in good packing with low energy. The consensus sequence lacks Cys56, as well as an additional Cys at position 9, which had been removed for experimental reasons. Cys9 is highly conserved in RI repeats, more so than Cys56. Based on the high degree of conservation and favorable energetics, we introduced Cys9 and Cys56 into the consensus core of gdLRR-2. All designed repeat sequences are shown in Fig. 2. Addition of Capping Motifs. To be able to produce stable and soluble repeat proteins with fewer repeats than needed for ring formation, capping structures at the N- and C-terminal ends of Rämisch et al.

the molecule are required (15, 16, 18). We analyzed the complementarity of native caps from RI-type LRR proteins to the designed LRRs by rigid-body docking in Rosetta. The C-terminal cap of porcine RI can dock with an almost identical interface as the one found in native RI, and with similar association energy. Thus, we chose to add the C-terminal 32 aa of porcine RI without further modifications. On the N-terminal side, we found little compatibility between the designed proteins and N-caps from all RI-type LRR proteins of known structure, something that could not be corrected by redesigning the interface residues. We chose to design N-caps de novo. Using a combination of interface design using Protein Data Bank ID code 1IO0, and simultaneously optimizing interface residues within both the cap and the first internal repeat (Fig. 3A), we obtained a model with interface energies similar to those interface energies found in native LRRs. Because the first repeat could be either an A-type or B-type repeat, two N-caps were designed: one coupled to an A-type repeat and one coupled to a B-type repeat. Ab initio folding simulations of the designed sequences attached to internal repeats revealed a strong convergence toward the designed conformations, as shown in Fig. 3B. For experimental characterization, we combined five self-compatible double repeats, based on gdLRR-1 or gdLRR-2, with the different caps (Fig. 3D); the four constructs are referred to as gdLRR-1-A-cap, gdLRR-2-A-cap, gdLRR-1-B-cap, and gdLRR-2-B-cap. For proteins with B-caps, we inserted an additional redesigned N-terminal B-type repeat (Fig. 3D). A structural model for one of them, gdLRR-1-A-cap, is shown in Fig. 3C. Biophysical Characterization. All four variants could be overexpressed in Escherichia coli with yields of 10–40 mg of pure protein from the soluble fraction of 1 L of culture volume (Fig. 4A).

A

B

C

D Fig. 3. De novo design of N-terminal caps. (A) Designed interface between the first internal A-repeat (gray) and the designed N-terminal cap (red). (B) Rosetta energy landscape from ab initio structure prediction of the designed N-terminal sequence. The y axis shows energy of 50,000 models, and the x axis indicates the Cα rmsd to the designed backbone conformation. (C) Model of the complete gdLRR-1-A-cap (blue), docked C-terminal cap from Protein Data Bank ID code 1IO0 (red), and de novo designed N-terminal capping motif. (D) Architecture of the final constructs; residues in the first internal repeat were redesigned together with the N-terminal capping motif, as indicated by red dots. Gray boxes represent A + B double repeats.



Fig. 2. Alignment of internal repeat sequences. Highlighted positions indicate differences between variants; core positions are highlighted in orange, and surface positions are highlighted in green. Positions that were left at their native identity are marked with an asterisk.

The purified proteins are highly soluble, can be kept in solution at room temperature for several weeks, and can be lyophilized without triggering aggregation. CD spectroscopy showed a broad negative absorption band with a minimum at 221 nm, indicating mixed α/β secondary structure content (Fig. 4C and Fig. S1). Temperature denaturation followed by CD indicates higher degrees of cooperativity for the variants with A-caps (Fig. 4C and Fig. S1). The temperature denaturation midpoints (Tm) were 41 °C for gdLRR-1-A-cap, 45 °C for gdLRR-1-B-cap, 56 °C for gdLRR-2-A-cap, and 58 °C for gdLRR-2-B-cap. More detailed analysis was carried out for one of the constructs, gdLRR-2-A-cap. FTIR spectroscopy was used to study secondary structure formation. The second-derivative FTIR spectrum showed characteristic bands in the amide-I region that indicate the presence of both β-strands (1,621 cm−1 and 1,631 cm−1) and the α-helix (1,655 cm−1). As a reference system, we used a highly soluble but unfolded design variant, gdLRR-3-A-cap, which is described in more detail below. The second-derivative FTIR spectrum of this protein showed a strong signal at 1,648 cm−1, which is indicative of a random coil (28) (Fig. 4B). Thus, FTIR analysis demonstrates that gdLRR-2A-cap is a well-folded protein of mixed α/β secondary structure. To characterize the tertiary structure of the folded state further, we studied the protein with heteronuclear 1H–13C NMR spectroscopy. The dispersion of methyl groups in NMR spectra is an excellent probe for the presence of tertiary structure. Without structure formation, all methyl groups of the same type cluster at chemical shifts that are characteristic of a random coil. In a folded protein, a distinct chemical environment of individual methyl groups leads to different chemical shifts resulting in a dispersion of signals. We expressed gdLRR-2-A-cap and, as a reference, the unfolded gdLRR-3-A-cap (see below) with [1-13C]-glucose as the sole carbon source. Expression results in specific labeling of isolated sites, including methyl groups (29). In agreement with the FTIR results, the 2D 1H–13C NMR spectrum of gdLRR-3-A-cap is characteristic of a random coil, with no more than 5% folded conformation (Fig. 4D, blue contours), whereas gdLRR-2-A-cap shows methyl signals that are well dispersed, indicating an intact tertiary structure for at least 99% of the ensemble (Fig. 4D, green contours). We further compared the experimental 1H–13C spectrum of gdLRR-2-A-cap with a theoretical spectrum, calculated for the Rosetta model using SHIFTX2 (30) (Fig. S2). In general, the experimental chemical shift dispersion compares favorably with the predicted spectrum, although a limited number of welldispersed predicted signals from residues in, or near, the cap regions are missing in the experimental spectrum. By contrast, the spectrum predicted for an intrinsically disordered protein of the same sequence, calculated using the ncIDP program (31), shows no similarity to the experimental spectrum (Fig. S2 B–D). Taken together, these results firmly support the conclusion that gdLRR2-A-cap forms a well-defined structure in solution. To investigate the extent of structural flexibility of gdLRR-2A-cap, we carried out NMR relaxation dispersion experiments,

A

C

B

4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111

which probe conformational exchange on the millisecond time scale. The results reveal that only a few methyl groups display a significant dispersion in their relaxation profiles, whereas the large majority of residues have flat profiles (Fig. S3). The absence of relaxation dispersion implies that the particular residue does not exchange between alternative conformations, or that the chemical shifts are identical in the different states. Conformational exchange is seen for two Leu residues and two of four Tyr residues in the protein (Fig. S3). Two Tyr residues are located in each of the caps; thus, at least one of the caps samples an alternative, high-energy state with a relative population of ∼3%. For most methyl groups that do not show exchange, the chemical shift is clearly indicative of a well-structured environment. Therefore, we conclude that the core of the protein is not in exchange with an unfolded species on the millisecond time scale. Dimerization of gdLRR-2. After developing a stable assembly of selfcompatible repeats with caps, we wanted to determine whether the interaction geometry of the repeat was correctly encoded. In the absence of both caps, only a structure with no twist and the correct curvature is expected to form a dimeric complex of two fivedouble repeat half-rings. We created a construct consisting exclusively of five gdLRR-2 double repeats for experimental testing. CD and FTIR spectra of this protein were similar to the capped variants (gdLRR-2-A-cap and gdLRR-2-B-cap) (Fig. 5 A and B). Dynamic light scattering (DLS) experiments suggested a diameter of the purified protein of ∼80 Å, which is very close to the longest interatomic distance within the model of the ring (77 Å). Analytical ultracentrifugation showed dimers, but no trace of monomers in solution (Fig. S4). To validate the structural model further, we studied the dimers with negative-stain EM. As shown in Fig. 5C, the micrographs demonstrate that the dimers form donut-shaped structures with an outer diameter of ∼80 Å and an inner diameter of 20–25 Å. These dimensions are in agreement with the results from DLS and the structural model (Fig. 5D). Temperature denaturation followed by CD revealed an apparent two-state behavior with a high degree of unfolding cooperativity (Fig. 5A). The dimer refolds upon cooling (Fig. S5) and has a Tm of 86 °C, which is significantly higher than the Tm of the capped variants. This result suggests that disassembly and unfolding happen simultaneously; dissociation into stable monomers below 86 °C is unlikely because the capped monomer starts to unfold above 60 °C and uncapped monomers are expected to be even less stable. The successful formation of a planar ring structure with the designed curvature does not exclude that the capped variants adopt a different geometry. However, because the capped monomers have a welldefined structure and the closed-ring structure of the noncapped dimer is very stable, we believe that energetically costly large-scale rearrangements upon dimerization are unlikely. Cys56 Stabilizes the Folded State of a Designed LRR. The successful designs were created by introduction of Cys into a conservatively redesigned core (Fig. 6A). However, we wanted to investigate

D Fig. 4. Biophysical characterization of gdLRR-2-Acap. (A) Overexpression of gdLRR-4-A-cap. P, pellet of insoluble fraction; S, soluble fraction. (B) Secondderivative FTIR spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue). Greek letters indicate the corresponding secondary structure. (C, Top) Far-UV CD spectrum of gdLRR-2-A-cap. (C, Bottom) Temperature denaturation monitored by ellipticity at 222 nm. Dots indicate measured values, and the solid line shows a fit to a two-state model. (D) Methyl regions of the 1H–13C heteronuclear single quantum coherence spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue).

Rämisch et al.

B

C

D

Fig. 5. Structure analysis of the cap-deficient gdLRR-2. (A) Thermal unfolding monitored by ellipticity at 222 nm. Dots represent the data, and the solid line shows the fitted function. (Inset) Far-UV CD spectra during the thermal melting experiment; colors indicate the temperature from dark blue (15 °C) to dark red (95 °C). (B) Second-derivative FTIR spectrum. Greek letters indicate secondary structure. (C) Negative-stain EM images at a magnification of 55,000×. (D) Model of the assembled gdLRR-2 dimer; one monomer is colored in blue, and one monomer is colored in red.

whether a stable and folded repeat protein could be designed without buried Cys. To this end, we carried out fixed-backbone design of the core, while disallowing Cys. Because the core of the RI protein is poorly packed, we attempted to stabilize the designed repeats by adding more hydrophobic residues. We constructed two new sequence variants (gdLRR-3 and gdLRR-4; Fig. 2) with more favorable energies compared with gdLRR-1 and gdLRR-2. In both constructs, the core Cys9 and Cys56 were replaced by Thr and Ile, respectively. The hydrophobic Ile fits very well into the cleft between the descending loops of two adjacent A-type and B-type repeats and forms favorable contacts to other core residues. For experimental testing, we constructed four capped constructs with five self-compatible double repeats, named gdLRR-3-A-cap, gdLRR-3-B-cap, gdLRR-4-A-cap, and gdLRR-4-B-cap. All four constructs could be expressed and purified to high yields. Analysis by CD spectroscopy indicated that all constructs were unfolded (Fig. S6). FTIR and NMR spectra of gdLRR-3-Acap confirmed a lack of secondary and tertiary structure (Fig. 3 B and D). The gdLRR-2 and gdLRR-3 repeats differ only at three core positions, two of which are positions 9 and 56. We speculated that the constructs based on gdLRR-3 were unfolded due to the absence of either one or both core Cys. To test this hypothesis, we mutated Ile56 to Cys in gdLRR-3-A-cap. The resulting protein, named gdLRR-3-A-cap-I56C, has a CD spectrum similar to the CD spectra of the folded variants (Fig. 6B). Thus, a single mutation in each repeat seems to induce secondary structure formation, which demonstrates the stabilizing effect of Cys56. Discussion A key for designing shape-optimized repeat proteins is to control accurately the subtle structural features that encode the overall shape of these molecules. Our approach is to adopt these detailed features from known repeat structures. The benefit of this conservative approach to shape-controlled engineering of repeat proteins is that it limits the need for precise design of backbone Rämisch et al.

conformations. With limited redesign of the core, the backbone conformation of the designed repeat is most likely close to the template, and hence the design model. The experimental analysis demonstrates that changing a single amino acid in the core sequence of the template repeat was sufficient to obtain a folded and stable protein, gdLRR-1. Moreover, limited redesign significantly increased stability and folding cooperativity (gdLRR-2). Both results highlight that the crucial step in designing RI repeat proteins was the selection of an optimal building block, rather than introducing new interactions between building blocks by extensive redesign of the core. At some positions, there is more tolerance for introducing amino acids that are rare within a particular LRR family, like Val20 and Ile44 in gdLRR-1 or Ala11 in gdLRR-3-C56. However, highly conserved positions are likely to be more sensitive to substitutions, for instance, Cys9. The structural role of Cys at position 9 of LRR proteins (C10 in the most common numbering scheme) has been extensively described (32, 33). It is part of an Asn/Cys ladder, which is located immediately following the β-strand, where it connects neighboring repeats via side-chain– to–main-chain hydrogen bonds. When comparing gdLRR-3-Acap-C56, which lacks Cys9, and gdLRR-2-A-cap, which does have Cys9, there is a considerable difference in unfolding cooperativity between these constructs. The increased cooperativity of gdLRR-2-A-cap gives some experimental evidence that the Asn/Cys ladder may be important for encoding folding cooperativity in RI-type LRR proteins. Less attention has been paid to Cys on the descending loops of A-type repeats, where a cleft to the preceding B-type repeat is found. In RI, as well as in our design models, Cys56 appears to stabilize the conformation of the descending loop by forming hydrogen bonds either to the backbone nitrogen at position i + 2 or to carbonyl oxygen in the neighboring repeat’s helix cap (Fig. 6A). Polar interactions at this site are likely to be important, because Cys56 is substituted by Thr or Ser in some repeats. We observed a significant increase in secondary structure content and unfolding cooperativity by mutating Ile56 to Cys in gdLRR-3. This observation is an experimental indication that polar interactions at this site are essential for stability. It has been suggested that destabilization of single repeats within an assembly is essential for the function of repeat proteins by introducing flexibility into the protein (34–36). Perhaps this need for selective instability explains the absence of highly conserved Cys in some native repeats like the template repeat used in this study. Selective mutations of core Cys may provide a means for selectively destabilizing specific repeats in designed binder proteins. Two previously reported consensus sequence designs of LRR proteins were based on VLR (19) and RI-type repeats (17). VLRs consist only of 24 residues, which does not allow for formation of canonical α-helices on the convex side of the

A

B

Fig. 6. Roles of buried Cys at positions 9 and 56. (A) Structural representation of Cys9 (Top) and Cys56 (Bottom) in the model of gdLRR-2. Yellow dashed lines indicate N. . .HO and N. . .H-N H-bonds, pink dashed lines indicate Cys H-bonds. (B) Comparison of CD spectra (Left) and temperature denaturation curves followed by ellipticity at 222 nm (Right) of gdLRR-3-A-cap (blue) and gdLRR-3-A-cap-I56C (green).



A

horseshoe. The shorter repeat sequence results in a protein with less void volume, which may explain the higher thermal stability of the designed VLR compared with RI repeats. The RI consensus design unfolds with low cooperativity (17). There is a strong similarity between the published CD spectra of the fiverepeat consensus design and the nonfolded gdLRR-3 and gdLRR-4 variants. The folded gdLRR-2 variants differ from the consensus sequence design at two core positions, Cys9 and Cys56. Our results suggest that introduction of these residues into the consensus sequence design may improve stability and folding cooperativity. However, the different capping repeats may also play a role in explaining the different biophysical properties compared to our constructs. It has been shown that the LRR protein internalin-B has a polarized folding pathway, starting from the N-terminal capping motif (15). A capless ring could be used to investigate whether caps are crucial for initiating folding or merely for stabilizing the folded state. Here, we show that a dimeric full-ring construct is stably folded without caps. Although shielding the hydrophobic core is almost certainly crucial for stability, specific capping motifs may not be required for initiating folding of LRR proteins. Here, we present a general design approach for development of geometrically optimized repeat proteins built from selfcompatible repeats. The proteins developed in this study were designed to assemble with a repeat-repeat interface geometry defined by cyclical symmetry. However, a wide variety of shapes can be designed by applying helical symmetry, which can readily be implemented into the design process. The design method could also be extended to include multiple different building 1. Boersma YL, Plückthun A (2011) DARPins and other repeat protein scaffolds: Advances in engineering and applications. Curr Opin Biotechnol 22(6):849–857. 2. Kobe B, Kajava AV (2001) The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol 11(6):725–732. 3. D’Andrea LD, Regan L (2003) TPR proteins: The versatile helix. Trends Biochem Sci 28(12):655–662. 4. Li J, Mahajan A, Tsai M-D (2006) Ankyrin repeat: A unique motif mediating proteinprotein interactions. Biochemistry 45(51):15168–15178. 5. Al-Khodor S, Price CT, Kalia A, Abu Kwaik Y (2010) Functional diversity of ankyrin repeats in microbial proteins. Trends Microbiol 18(3):132–139. 6. Kajava AV (2012) Tandem repeats in proteins: From sequence to structure. J Struct Biol 179(3):279–288. 7. Main ERG, Phillips JJ, Millership C (2013) Repeat protein engineering: Creating functional nanostructures/biomaterials from modular building blocks. Biochem Soc Trans 41(5):1152–1158. 8. Forrer P, Stumpp MT, Binz HK, Plückthun A (2003) A novel strategy to design binding molecules harnessing the modular nature of repeat proteins. FEBS Lett 539(1-3):2–6. 9. Main ERG, Xiong Y, Cocco MJ, D’Andrea L, Regan L (2003) Design of stable α-helical arrays from an idealized TPR motif. Structure 11(5):497–508. 10. Kohl A, et al. (2003) Designed to be stable: Crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA 100(4):1700–1705. 11. Binz HK, Stumpp MT, Forrer P, Amstutz P, Plückthun A (2003) Designing repeat proteins: Well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J Mol Biol 332(2):489–503. 12. Urvoas A, et al. (2010) Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats. J Mol Biol 404(2):307–327. 13. Enkhbayar P, Kamiya M, Osaki M, Matsumoto T, Matsushima N (2004) Structural principles of leucine-rich repeat (LRR) proteins. Proteins 54(3):394–403. 14. Kajava AV (1998) Structural diversity of leucine-rich repeat proteins. J Mol Biol 277(3): 519–527. 15. Courtemanche N, Barrick D (2008) The leucine-rich repeat domain of Internalin B folds along a polarized N-terminal pathway. Structure 16(5):705–714. 16. Dao TP, Majumdar A, Barrick D (2014) Capping motifs stabilize the leucine-rich repeat protein PP32 and rigidify adjacent repeats. Protein Sci 23(6):801–11. 17. Stumpp MT, Forrer P, Binz HK, Plückthun A (2003) Designing repeat proteins: Modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J Mol Biol 332(2):471–487. 18. Parker R, Mercedes-Camacho A, Grove TZ (2014) Consensus design of a NOD receptor leucine rich repeat domain with binding affinity for a muramyl dipeptide, a bacterial cell wall fragment. Protein Sci 23(6):790–800. 19. Lee S-C, et al. (2012) Design of a binding scaffold based on variable lymphocyte receptors of jawless vertebrates by module engineering. Proc Natl Acad Sci USA 109(9): 3299–3304.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111

blocks for cases where single self-compatible repeats are not sufficient to produce a desired shape. However, not all shapes could be designed by this approach because of the finite variability in repeat conformations. Furthermore, repeats with a low amount of well-defined secondary structure elements are not amenable to fixed-backbone design, because sequence changes are likely to cause structural changes in loop regions. Nevertheless, there should be a sufficient number of structured repeats to enable the design of a wide range of shape-optimized protein binders and biomaterials. Materials and Methods All genes were codon-optimized for expression in E. coli and synthesized by Genscript. Proteins were expressed in LB for 4 h at 37 °C and purified by Histag affinity chromatography, His-tag removal by tobacco etch virus protease cleavage, and gel filtration. Isotopically labeled proteins were expressed in M9 minimal medium supplemented with [1-13C]-glucose. Fixed-backbone design and symmetrical docking simulations were performed using Rosetta++ and Rosetta3. N-caps were designed using RosettaScripts and RosettaRemodel, and convergence of the designed sequence toward the design model was tested by ab initio structure prediction in Rosetta3. Curvatures and helical twists were calculated using Angulator (37). Detailed descriptions of all methods and all sequences are given in SI Materials and Methods. ACKNOWLEDGMENTS. We thank Andreas Barth (Stockholm University) for help with FTIR, the Lund Protein Production Platform (Lund University, www.lu.se/lp3) for protein production, Eva-Christina Ahlgren and Gunnel Karlsson for help with EM, and Sarel Fleischman for valuable comments on the manuscript. The work was supported by the Swedish Research Council (Vetenskapsrådet), Crafoord Foundation, and Defense Advanced Research Projects Agency (Subcontract 747458).

20. Aksel T, Majumdar A, Barrick D (2011) The contribution of entropy, enthalpy, and hydrophobic desolvation to cooperativity in repeat-protein folding. Structure 19(3): 349–360. 21. Kobe B, Deisenhofer J (1993) Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Nature 366(6457):751–756. 22. Kobe B, Deisenhofer J (1996) Mechanism of ribonuclease inhibition by ribonuclease inhibitor protein based on the crystal structure of its complex with ribonuclease A. J Mol Biol 264(5):1028–1043. 23. André I, Bradley P, Wang C, Baker D (2007) Prediction of the structure of symmetrical protein assemblies. Proc Natl Acad Sci USA 104(45):17656–17661. 24. DiMaio F, Leaver-Fay A, Bradley P, Baker D, André I (2011) Modeling symmetric macromolecular structures in Rosetta3. PLoS ONE 6(6):e20450. 25. Das R, Baker D (2008) Macromolecular modeling with rosetta. Annu Rev Biochem 77: 363–382. 26. Fleishman SJ, et al. (2011) RosettaScripts: A scripting language interface to the Rosetta macromolecular modeling suite. PLoS ONE 6(6):e20161. 27. Huang P-S, et al. (2011) RosettaRemodel: A generalized framework for flexible backbone protein design. PLoS ONE 6(8):e24109. 28. Kong J, Yu S (2007) Fourier transform infrared spectroscopic analysis of protein secondary structures. Acta Biochim Biophys Sin (Shanghai) 39(8):549–559. 29. Lundström P, et al. (2007) Fractional 13C enrichment of isolated carbons using [1-13C]or [2-13C]-glucose facilitates the accurate measurement of dynamics at backbone Calpha and side-chain methyl positions in proteins. J Biomol NMR 38(3):199–212. 30. Han B, Liu Y, Ginzinger SW, Wishart DS (2011) SHIFTX2: Significantly improved protein chemical shift prediction. J Biomol NMR 50(1):43–57. 31. Tamiola K, Acar B, Mulder FAA (2010) Sequence-specific random coil chemical shifts of intrinsically disordered proteins. J Am Chem Soc 132(51):18000–18003. 32. Kobe B, Deisenhofer J (1995) Proteins with leucine-rich repeats. Curr Opin Struct Biol 5(3):409–416. 33. Buchanan SGSC, Gay NJ (1996) Structural and functional diversity in the leucine-rich repeat family of proteins. Prog Biophys Mol Biol 65(1-2):1–44. 34. Truhlar SME, Torpey JW, Komives EA (2006) Regions of IkappaBalpha that are critical for its inhibition of NF-kappaB.DNA interaction fold upon binding to NF-kappaB. Proc Natl Acad Sci USA 103(50):18951–18956. 35. Croy CH, Bergqvist S, Huxford T, Ghosh G, Komives EA (2004) Biophysical characterization of the free IkappaBalpha ankyrin repeat domain in solution. Protein Sci 13(7): 1767–1777. 36. Kloss E, Courtemanche N, Barrick D (2008) Repeat-protein folding: New insights into origins of cooperativity, stability, and topology. Arch Biochem Biophys 469(1):83–99. 37. Bublitz M, et al. (2008) Crystal structure and standardized geometric analysis of InlJ, a listerial virulence factor and leucine-rich repeat protein with a novel cysteine ladder. J Mol Biol 378(1):87–96.

Rämisch et al.

Computational de novo design of a self-assembling peptide with predefined structure.

Exploring the repeat protein universe through computational protein design.

Computational design of a self-assembling symmetrical β-propeller protein.

Computational design of a pH-sensitive IgG binding protein.

Computational protein design enables a novel one-carbon assimilation pathway.

Rosetta:MSF: a modular framework for multi-state computational protein design.

Improving computational efficiency and tractability of protein design using a piecemeal approach. A strategy for parallel and distributed protein design.

A combined NMR and computational approach to investigate peptide binding to a designed Armadillo repeat protein.

Removing T-cell epitopes with computational protein design.

A critical analysis of computational protein design with sparse residue interaction graphs.

Direct Calculation of Protein Fitness Landscapes through Computational Protein Design.

Computational design of a sulfoglucuronide derivative fitting into a hydrophobic pocket of dengue virus E protein.

Towards the computational design of protein post-translational regulation.

Fractal geometry: a design principle for living organisms.

Computational de novo design of a four-helix bundle protein--DND_4HB.

BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

Discovery of substrates for a SET domain lysine methyltransferase predicted by multistate computational protein design.

Antibody humanization by structure-based computational protein design.

An efficient parallel algorithm for accelerating computational protein design.

Computational Design of the β-Sheet Surface of a Red Fluorescent Protein Allows Control of Protein Oligomerization.

cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design.

Boosting protein stability with the computational design of β-sheet surfaces.

Probing the minimal determinants of zinc binding with computational protein design.

Design of a 1 DOF MEMS motion stage for a parallel plane geometry rheometer.