Publisher: Taylor & Francis. Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: 5 Howick Place, London, SW1P 1WG

Journal of Biomolecular Structure and Dynamics Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tbsd20

Investigation of arc repressor DNA-binding specificity by comparative molecular dynamics simulations

Wei Song & Jun-Tao Guo

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA

Accepted author version posted online: 11 Dec 2014. Published online: 14 Jan 2015.

To cite this article: Wei Song & Jun-Tao Guo (2015) Investigation of arc repressor DNA-binding specificity by comparative molecular dynamics simulations, Journal of Biomolecular Structure and Dynamics, 33:10, 2083-2093, DOI: 10.1080/07391102.2014.997797

To link to this article: http://dx.doi.org/10.1080/07391102.2014.997797


Journal of Biomolecular Structure and Dynamics, 2015 Vol. 33, No. 10, 2083–2093, http://dx.doi.org/10.1080/07391102.2014.997797

Investigation of arc repressor DNA-binding specificity by comparative molecular dynamics simulations

Wei Song and Jun-Tao Guo*

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA

Communicated by Ramaswamy H. Sarma


(Received 24 November 2014; accepted 8 December 2014)

Transcription factors regulate gene expression through binding to specific DNA sequences. How transcription factors achieve high binding specificity is still not well understood. In this paper, we investigated the role of protein flexibility in protein–DNA-binding specificity by comparative molecular dynamics (MD) simulations. Protein flexibility is considered a key factor in molecular recognition, which is intrinsically a dynamic process involving fine structural fitting between binding components. In this study, we performed comparative MD simulations on wild-type and F10V mutant P22 Arc repressor in both free and complex conformations. The F10V mutant has lower DNA-binding specificity, although both the bound and unbound main-chain structures of the wild-type and F10V mutant Arc are highly similar. We found that the DNA-binding motif of wild-type Arc is structurally more flexible than that of the F10V mutant in the unbound state, especially for the six DNA base-contacting residues in each dimer. We demonstrated that the flexible side chains of wild-type Arc lead to a higher DNA-binding specificity through forming more hydrogen bonds with DNA bases upon binding. Our simulations also showed a possible conformational selection mechanism for Arc-DNA binding. These results indicate the important roles of protein flexibility and dynamic properties in protein–DNA-binding specificity.

Keywords: protein flexibility; binding specificity; transcription factor; molecular dynamics simulations; arc repressor

1. Introduction

DNA-binding proteins display diverse degrees of binding specificity, ranging from highly specific to nonspecific interactions. Specific interactions between proteins and their target DNA sequences are of paramount importance in essential biological processes, such as maintenance of genetic integrity and transcriptional regulation. Aberrant changes in binding specificity caused by mutations in transcription factors (TFs) and their cognate DNA binding sites can have serious consequences (Filippova et al., 2002; Latchman, 1996; Luscombe & Thornton, 2002; Schott et al., 1998). For example, mutations in transcription factors p53 and CTCF alter their DNA-binding specificity, resulting in the development of cancer (Chene, 1999; Filippova et al., 2002; Gohler et al., 2005; Thukral, Lu, Blain, Harvey, & Jacobsen, 1995). Knowledge of the structural basis of binding specificity is, therefore, central to understanding TF–DNA interactions and the evolution and divergence of TF–DNA-binding specificity (Baker, Tuch, & Johnson, 2011). The significance also lies in its practical applications in rational design of new proteins with novel binding specificity in medicine and biotechnology (Ashworth et al., 2006; Porteus & Baltimore, 2003; Uil, Haisma, & Rots, 2003; Urnov et al., 2005).

*Corresponding author. Email: [email protected] © 2015 Taylor & Francis

The binding specificity between transcription factors and DNA can be described in two interrelated terms: “sequence specificity” and “degree of specificity” (Figure 1). For example, homeodomain proteins Ubx (from Drosophila melanogaster) and Nkx3-1 (from Homo sapiens) bind to different sequence patterns (Figure 1(A) and (B)), yet the two binding profiles show similar degrees of binding specificity in terms of their high sequence conservations (Mathelier et al., 2014). On the other hand, Dbx1 (from Mus musculus), another homeodomain protein, has a similar binding sequence pattern (Figure 1(C), position 4-10 “TTAATTA”) to Ubx, but the positions are less conserved when compared with the Ubx-binding profiles as Dbx1 allows more variations at each position. In other words, the degree of binding specificity for Dbx1 is much lower than the other two homeodomains. In this study, we mainly focus on the structural determinants for various degrees of binding specificity as in the case between Ubx and Dbx1, not just the sequence-level specificity between Ubx and Nkx3-1. Previous studies have primarily focused on the determinants of sequence specificity of protein–DNA complex structures. No simple recognition rules between particular amino acids and specific bases have been found,


Figure 1. Binding specificity of DNA-binding proteins. The binding sequence logos for three homeodomains, (A) Ubx, (B) Nkx3-1 and (C) Dbx1 as annotated in Jaspar (Mathelier et al., 2014).
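One common way to put a number on the "degree of specificity" of a binding profile, as distinct from its sequence pattern, is the per-position information content that sets letter heights in a sequence logo. The sketch below is illustrative only: the two count matrices are invented toy data, not the Jaspar profiles of Ubx or Dbx1, the function name is hypothetical, and no small-sample correction is applied.

```python
import numpy as np

def information_content(counts):
    """Per-position information content (bits) of a DNA count matrix.

    counts has shape (4, L) with rows ordered A, C, G, T.
    IC_j = 2 + sum_b p_bj * log2(p_bj), taking 0 * log2(0) as 0.
    """
    counts = np.asarray(counts, dtype=float)
    freqs = counts / counts.sum(axis=0, keepdims=True)
    # Substitute 1.0 for zero frequencies inside the log so the term is 0 * 0
    safe = np.where(freqs > 0, freqs, 1.0)
    entropy = -(freqs * np.log2(safe)).sum(axis=0)
    return 2.0 - entropy

# Toy (invented) profiles sharing the consensus "TAAT"
conserved = np.array([[0, 20, 20, 0],    # A
                      [0, 0, 0, 0],      # C
                      [0, 0, 0, 0],      # G
                      [20, 0, 0, 20]])   # T
relaxed = np.array([[2, 14, 14, 2],
                    [2, 2, 2, 2],
                    [2, 2, 2, 2],
                    [14, 2, 2, 14]])

ic_conserved = information_content(conserved)
ic_relaxed = information_content(relaxed)
```

Both toy profiles have the same consensus sequence, yet the second carries far less information per position, which is the situation described for Dbx1 relative to Ubx.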

although some preferred pairings of amino acids and bases were observed (Luscombe, Laskowski, & Thornton, 2001; Matthews, 1988; Pabo & Nekludova, 2000). It has been suggested that the binding specificity between proteins and DNA is achieved by a combination of two major readout mechanisms, direct/base readout and indirect/shape readout (Luscombe & Thornton, 2002; Michael Gromiha, Siebers, Selvaraj, Kono, & Sarai, 2004; Rohs et al., 2009, 2010). The direct/base readout describes the contributions of atomic contacts from the “master” residues that interact directly with DNA bases, in either the major or the minor groove. About two-thirds of the direct readout belongs to complex networks of hydrogen bonds (H-bonds), the major force of protein–DNA-binding specificity (Luscombe & Thornton, 2002). The indirect/shape readout, on the other hand, describes the contribution of DNA structure deformation and indirect contacts between proteins and DNA (Rohs, West, Liu, & Honig, 2009; Rohs et al., 2009, 2010). Direct/base readout is considered a major contributor to specificity between DNA-binding protein families, while indirect/shape readout contributes to the binding specificity differences within a protein family (Rohs et al., 2010).

While the above efforts provide valuable insights towards understanding protein–DNA binding specificity, our current knowledge is mainly “static”, meaning the information is based on snapshots of representative protein–DNA complex structures. Protein–DNA recognition, however, is by nature a dynamic process that involves delicate structural fitting between proteins and DNA (Fuxreiter, Simon, & Bondos, 2011). A more complete description in terms of dynamic features is needed to fully understand the specificity in protein–DNA recognition.

Protein flexibility and intrinsic disorder have been considered key factors in molecular recognition, including TF–DNA interactions (Janin & Sternberg, 2013). The intrinsically disordered fragments or flexible regions in proteins can affect the conformational preferences and molecular recognition even though these flexible or disordered regions are not directly involved in the binding interface (Fuxreiter et al., 2011; Uversky & Dunker, 2013). Transcription factors exert their regulatory function by first recognizing and binding to specific DNA sequences to either activate or repress the expression of their target genes. In a genomic background, transcription factors need to quickly find their cognate DNA sites in order to efficiently respond to upstream regulatory signals. Flexible regions or intrinsically disordered segments of transcription factors can facilitate the binding site search by the “Monkey-bar” mechanism (Vuzman & Levy, 2012). Flexibility is also considered important in the transition from non-specific TF–DNA binding during the search phase to specific binding once a cognate binding target sequence is found (Fuxreiter et al., 2011; Kalodimos et al., 2004). During the transition, it is thought that flexibility allows the proteins to fine tune their conformations for better fits and hence leads to higher specificity (Schulz, 1979; Zhou, 2012). Proteins may use both side chain and backbone flexibilities to fit the interface more precisely and achieve higher specificity.

To investigate the role of dynamic features, especially the flexibility in specific protein–DNA interactions, we performed comparative molecular dynamics (MD) simulations of transcription factor Arc repressor in P22 bacteriophage. P22 Arc is a repressor of the ribbon-helix-helix family (Andreeva et al., 2004; Schildbach, Karzai, Raumann, & Sauer, 1999), which binds to a 22-bp operator site (Figure 2(E), the bold bases represent the critical binding sites) as a homotetramer, or more precisely, a dimer of dimers (Figure 2(A)). The monomer Arc repressor is relatively unstructured and becomes more stable as a dimer, which has been demonstrated by


Figure 2. Wild-type and mutant Arc repressors from enterobacteria phage P22. (A) Wild-type Arc-DNA complex structure 1BDT, tetramer; (B) Unbound wild-type Arc structure 1ARR, dimer, with Phe10 shown; (C) F10V mutant Arc-DNA complex structure 1BDV, tetramer; (D) Unbound F10V mutant Arc structure 1BAZ, dimer, with Val10 shown; (E) The operator sequence, where the bold bases represent the critical binding sites. The green DNA bases in complex structures represent TAGA box.

both experimental methods and computational disorder predictions (Burgering, Hald, Boelens, Breg, & Kaptein, 1995; Gu, Gribskov, & Bourne, 2006; Marcovitz & Levy, 2009; Peng, Jonas, & Silva, 1993). Each dimer binds to the major groove of the operator site of the TAGA box with an anti-parallel β-sheet formed between two Arc chains.

P22 Arc is an ideal transcription factor for studying the role of flexibility in binding specificity. First of all, Schildbach et al. have demonstrated that an Arc F10V mutant, in which the phenylalanine (F) at position 10 is mutated to valine (V) (Figure 2(B) and (D)), has lower binding affinity to the cognate binding site than the wild-type Arc and has lower binding specificity as the mutant can bind both the cognate and non-cognate sequences “equally well” (Schildbach et al., 1999). Secondly, the unbound Arc and Arc-DNA complex structures for both the wild-type and F10V mutant have been solved (Figure 2(A)–(D)) (Berman et al., 2000; Schildbach et al., 1999). Thirdly, the mutation site at the 10th position, which is located in the recognition β-sheets (Figure 2(B) and (D)), is not a “master” residue that interacts directly with DNA base(s), meaning the fragments that interact with the major groove are the same between the wild-type and the mutant P22 Arc (Raumann, Knight, & Sauer, 1995) (Supplementary Figure S1). Most importantly, comparisons of the X-ray crystal structures based on the backbone atoms of the Arc protein showed that both the free and Arc-DNA complex wild-type protein structures are highly similar to their counterparts in the F10V mutant. The DNA backbone structures in the two Arc-DNA complexes are also similar (the DNA sequences in both the wild-type and the F10V mutant complex structures are the same) (Schildbach et al., 1999). Therefore, this set of four structures provides a perfect system to investigate the dynamic effects on the protein–DNA-binding specificity, as the wild-type and the mutant have highly similar static conformations but different degrees of binding specificity.

It was previously suggested that Phe10 contributes to the DNA-binding affinity and specificity through direct interaction with the DNA backbone (Schildbach et al., 1999). Considering that the Phe10 and DNA backbone


interactions are similar when Arc binds to both cognate and non-cognate DNA sequences, the effect of the mutation on the flexibility/stability of the DNA binding motif might play a bigger role in the binding specificity change. In this study, we focus on the structural fluctuations and dynamic properties of the wild-type and F10V mutant and their contributions to specific DNA binding.

2. Materials and methods

2.1 Structures used in the study

All initial structures for simulations are from the Protein Data Bank (PDB) as shown in Figure 2 (Berman et al., 2000). The wild-type (1BDT) and F10V mutant (1BDV) Arc-DNA complex structures consist of two Arc dimers bound to a double-strand operator DNA, while the unbound structures (1ARR and 1BAZ) are dimers. The structures of missing residues at the N or C termini in 1BDT, 1BDV and 1BAZ were modelled using their original PDB structures as templates with Modeller, a homology modelling programme (Šali & Blundell, 1993). The template structures were kept rigid during the modelling process so that the main structures of the protein domains remain unchanged. Hydrogen atoms were added using the H++ server (Gordon et al., 2005).

2.2 Molecular dynamics simulation and analysis

All-atom MD simulations were carried out with GROMACS (Van Der Spoel et al., 2005) and the CHARMM27 force field (MacKerell, Banavali, & Foloppe, 2000). The simulation of each conformation was repeated four times. The systems were solvated in a cubic box of 12 nm for the Arc-DNA complex structures or 10 nm for the unbound Arc proteins, containing about 55,000 and 32,000 TIP3P water molecules, respectively (Jorgensen, Chandrasekhar, Madura, Impey, & Klein, 1983). The simulations were run in the NPT ensemble with the temperature kept at 298 K by the Berendsen thermostat (Berendsen, Postma, van Gunsteren, DiNola, & Haak, 1984) and the pressure maintained at 1 bar by pressure coupling. Sodium chloride was added to neutralize the system and bring it to a concentration of 100 mM. Periodic boundary conditions were applied and the Particle Mesh Ewald algorithm was used to calculate the electrostatic interactions (Darden, York, & Pedersen, 1993). A cut-off distance of 11 Å was used for van der Waals interactions. P-LINCS was used to constrain bonds involving hydrogen atoms (Hess, 2008). The time step for integrating the equations of motion was 2 fs. Each simulation was run for 100 ns and the conformations were saved every 10 ps. The first 10 ns of each trajectory was not used for follow-up analysis. The H-bond is defined based on a geometric criterion with a maximum donor–acceptor distance of 3.5 Å and a hydrogen-donor–acceptor angle of 30°. For comparison purposes, we only used one Arc dimer (containing chain A and chain B) that interacts with the half site (left side) of the 22-bp operator sequence (Figure 2(E)). The images of protein and protein–DNA complex structures were prepared using PyMOL (The Molecular Graphics System, Version 1.5.0.4 Schrodinger, LLC.) and VMD (Humphrey, Dalke, & Schulten, 1996). The DNA binding profiles were prepared with WebLogo (Crooks, Hon, Chandonia, & Brenner, 2004).
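The geometric H-bond criterion stated in the methods (donor–acceptor distance of at most 3.5 Å, hydrogen-donor–acceptor angle of at most 30°) can be written as a small test on three atomic positions. This is a minimal sketch rather than the analysis code used in the study: the function name is hypothetical, coordinates are assumed to be in Å, the angle is assumed to be measured at the donor atom, and periodic images are ignored.

```python
import numpy as np

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, max_angle=30.0):
    """Return True if the donor-H...acceptor triple meets the geometric criterion.

    Distances are in angstroms. The angle is the hydrogen-donor-acceptor
    angle, i.e. the angle at the donor between the D->H and D->A vectors.
    Assumes the three positions are distinct.
    """
    donor = np.asarray(donor, dtype=float)
    hydrogen = np.asarray(hydrogen, dtype=float)
    acceptor = np.asarray(acceptor, dtype=float)
    d_to_a = acceptor - donor
    d_to_h = hydrogen - donor
    dist = np.linalg.norm(d_to_a)
    if dist > max_dist:
        return False
    cos_angle = np.dot(d_to_h, d_to_a) / (np.linalg.norm(d_to_h) * dist)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle <= max_angle)

# Hypothetical geometries: a linear contact, one too long, one too bent
linear = is_hbond([0, 0, 0], [1.0, 0, 0], [2.9, 0, 0])
too_far = is_hbond([0, 0, 0], [1.0, 0, 0], [4.0, 0, 0])
too_bent = is_hbond([0, 0, 0], [1.0, 0, 0], [2.0, 2.0, 0])
```

Applying such a test to every donor and acceptor pair between the master residues and the DNA bases in each saved frame gives the per-frame H-bond counts analysed in the Results.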

3. Results

3.1. Structural fluctuations and flexibility of wild-type and F10V mutant Arc repressors

It has been demonstrated that the structures of the wild-type Arc are very similar to the F10V mutant, in both unbound Arc and Arc-DNA complexes, by comparing their main-chain atoms of residues 6-46 (Schildbach et al., 1999). We observed similar results in terms of the overall structural fluctuations of the Arc dimers for residues 6-46 (Supplementary Figure S2). For the bound Arc dimers, the wild-type and the mutant have similar average Cα RMSDs (Root Mean Square Deviation) and standard deviations (Supplementary Figure S2(B)). For the unbound structures, the wild-type Arc is more flexible than the F10V mutant. The structural variation of the unbound wild-type Arc in Figure S2(A) will be discussed in detail in Section 3.4. We next focus our analysis on the Arc-DNA-binding motif (the anti-parallel β-sheets formed between two Arc chains) and its interaction with the operator sequence.

3.1.1. Backbone structural fluctuations

We first compared the structural variations for the backbone Cα atoms of residues 7 to 14 (in both chain A and B) between the wild-type and the F10V mutant Arc for both the unbound and bound structures. The structural variations of the 16 Cα atoms in one dimer during 90 ns simulations are shown in Figure 3. To compare the variations during simulation, we used both the average RMSD and the coefficient of variation (CV) (a normalized measure of variation, CV = σ/μ, where σ is the standard deviation and μ is the mean value). For the unbound structures, the RMSD/CV of the DNA-binding motif in the wild-type and F10V mutant are 0.70 Å/0.23 and 0.78 Å/0.15, respectively, suggesting that the F10V mutant, with its smaller CV, has slightly smaller fluctuations for the recognition β-sheets in the unbound conformations (Figure 3(A)). For the bound structures, the RMSD/CV of the binding motif is 0.62 Å/0.18 and 0.57 Å/0.19 for the wild-type and F10V mutant, respectively, suggesting no substantial difference in the backbone structures of the β-sheets between the wild-type and mutant Arc (Figure 3(B)).
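The two summary statistics used in this comparison can be sketched in a few lines. The example is a minimal illustration, assuming the trajectory frames have already been superimposed on the reference structure and that σ is the population standard deviation (the text does not say which estimator was used); the function names and the synthetic coordinates are hypothetical and do not come from the Arc simulations.

```python
import numpy as np

def rmsd_per_frame(traj, ref):
    """RMSD of each frame from a reference structure.

    traj has shape (n_frames, n_atoms, 3) and ref has shape (n_atoms, 3).
    Frames are assumed to be pre-aligned to ref; no fitting is done here.
    """
    traj = np.asarray(traj, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sq_dist = ((traj - ref) ** 2).sum(axis=2)   # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))        # average over atoms, then root

def coefficient_of_variation(values):
    """CV = sigma / mu, using the population standard deviation."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# Synthetic stand-in for 16 C-alpha atoms over 500 frames (not real data)
rng = np.random.default_rng(0)
ref = rng.normal(size=(16, 3))
traj = ref + rng.normal(scale=0.4, size=(500, 16, 3))

rmsd = rmsd_per_frame(traj, ref)
cv = coefficient_of_variation(rmsd)
```

Because CV normalizes the spread by the mean, a reported mean RMSD of 0.70 Å with a CV of 0.23 corresponds to a standard deviation of roughly 0.16 Å.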


Figure 3. Structural fluctuations of residues 7-14 in the unbound and bound Arc conformations. RMSD of the residues 7-14 for Cα in the (A) unbound and (B) bound structures; RMSD of the heavy atoms of the side chains of Gln9, Asn11, Arg13 in the (C) unbound and (D) bound structures.

3.1.2. Side-chain structural fluctuations

The side-chain structural fluctuations were analysed using two different methods, RMSD of side-chain heavy atoms and the distributions of side-chain rotamers in the DNA-binding fragment. First, we analysed the variations of positions of all side-chain heavy atoms from the six base-interacting “master” residues Gln9, Asn11 and Arg13 (in both chain A and B) during MD simulations and demonstrated the difference in flexibility between the wild-type and F10V mutant. Unlike the similar Cα variations in the unbound structures (Figure 3(A)), the six residues in wild-type Arc have a much larger side-chain RMSD (3.09 Å) than that in the F10V mutant (1.90 Å) even though the CV is smaller, 0.1 vs. 0.15, suggesting that the side chains of wild-type Arc are more flexible than the F10V mutant in the unbound structures (Figure 3(C)). In the bound form, the wild-type has a smaller side-chain RMSD compared to the F10V mutant (1.20 Å vs. 1.61 Å) while having a similar CV (0.17 vs. 0.16) (Figure 3(D)), suggesting that the residues become more stable upon binding to DNA.

We also investigated the side-chain angles for these six residues. The results are consistent with the side-chain RMSD in terms of structural variations. The differences of the side-chain flexibility of the master residues between unbound and bound Arc protein structures in terms of the side-chain rotamer distributions of χ1 are shown in Supplementary Figure S3. Out of the six master residues in one Arc dimer, three of them, A:Gln9, A:Asn11 and B:Arg13, have noticeable differences in distributions of rotamer angles between the wild-type and mutant structures (where A:Gln9 means Gln9 in chain A) while the other three, B:Gln9, B:Asn11 and A:Arg13, have very similar angle distributions (data not shown). In the unbound wild-type Arc, while there are two peaks at −60° and 180° for A:Asn11 and three peaks at −60°, 60° and 180° for B:Arg13, there is only one peak at −60° for both residues in the mutant (Supplementary Figure S3(B) and (C)). More peaks for wild-type Arc suggest higher side-chain flexibility of the key residues in the unbound structures. In the bound Arc-DNA complex structures, there is only one peak for A:Asn11 in both wild-type and mutant Arc. For B:Arg13, there are two peaks at −60° and 60° in mutant Arc while only one peak at −60° in its wild-type, suggesting that the mutant side chains are more flexible than the wild-type once bound to DNA (Supplementary Figure S3(E) and (F)).

The side-chain rotamers of A:Gln9 are different from A:Asn11 and B:Arg13. In the unbound form, both wild-type and mutant Arc have three peaks at −60°, 60° and 180° with different frequencies (Supplementary Figure S3(A)). However, in the bound form, the wild-type Arc is dominated by the −60° conformation while the majority of the F10V mutant is 180°. We suspect that this observation may reflect the H-bond patterns. The distributions of χ1 for different numbers of H-bonds between A:Gln9 and DNA are calculated and shown in Supplementary Figure S4. We found that when there is no H-bond between A:Gln9 and DNA, the χ1 angles are mostly 180°. However, when one or more H-bonds are formed, the distribution of angles shifts from 180° to mainly around −60° and a few around 60° (see more results and discussions in Section 3.2).
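The rotamer analysis above rests on two steps: computing χ1 as a dihedral angle and assigning it to the nearest of the three staggered wells at −60°, 60° and 180°. The sketch below shows one way to do this, assuming χ1 is defined by the N, Cα, Cβ and Cγ atoms (the usual definition for Gln, Asn and Arg); the function names and the sample angles are hypothetical and are not taken from the simulations.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (e.g. N, CA, CB, CG)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

def rotamer_bin(chi1):
    """Assign an angle in degrees to the nearest well: -60, 60 or 180."""
    centers = np.array([-60.0, 60.0, 180.0])
    # Circular distance, so that -170 degrees is close to 180 degrees
    delta = np.abs((chi1 - centers + 180.0) % 360.0 - 180.0)
    return int(centers[np.argmin(delta)])

def rotamer_populations(angles):
    """Fraction of frames found in each of the three wells."""
    bins = [rotamer_bin(a) for a in angles]
    return {c: bins.count(c) / len(bins) for c in (-60, 60, 180)}

# Hypothetical chi1 samples in degrees, for illustration only
populations = rotamer_populations([-65.0, -55.0, 175.0, -178.0, 62.0])
```

A residue whose populations are spread over two or three wells corresponds to the multi-peak distributions described for the unbound wild-type Arc, while a single dominant well corresponds to the one-peak case.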
Taken together, the results from both the side-chain RMSD and rotamer analysis in the recognition β-sheets are consistent in that the F10V mutation affects the structural flexibility of the DNA-binding motif in the unbound state, especially for the side chains. In the unbound state, wild-type Arc is more flexible than the F10V mutant, which may facilitate the conformational fitting and the formation of specific contacts between Arc and DNA during the binding process. In the complex structures, however, the F10V mutant has relatively larger structural variations (or lower stability) than the wild-type Arc. To elucidate these differences, we next analysed the interactions between Arc and DNA as well as interactions within the protein itself.

3.2. Hydrogen bonds between Arc and operator DNA

The number of H-bonds between Arc and DNA for the six key base-interacting residues, Gln9, Asn11 and Arg13 in both chain A and B, was analysed because


H-bonds between protein side chains and DNA bases are considered the major contributor to binding specificity (Luscombe & Thornton, 2002). The major H-bond pairs in the wild-type Arc-DNA complex in our MD simulations are in agreement with the observed H-bond patterns in the crystal structure (Raumann, Rould, Pabo, & Sauer, 1994). Analysis of all the snapshots during the simulations revealed the major difference in these patterns between the wild-type and the F10V mutant. In the wild-type Arc-DNA complex structure, residue A:Gln9 forms, on average, one H-bond with base A at position 18 of the DNA reverse chain (Figure 4(A)) and B:Arg13 forms another H-bond with base A at position 4 of the DNA forward chain (Figure 4(F)). However, these two H-bonds do not exist in the F10V mutant Arc-DNA complex structure. Other H-bonds formed by A:Asn11, A:Arg13 and B:Gln9 with DNA exist in both the wild-type and the mutant complex structures (Figure 4(B) and (D)). The only exception is position 6 of the DNA forward chain with B:Asn11. On average, B:Asn11 in the F10V mutant forms more H-bonds with base 6T than that in the wild-type Arc-DNA complex (Figure 4(E)). We also calculated the potential energy (Coulomb energy + Lennard-Jones energy) between Arc and DNA, which is −2846.56 ± 155.26 kJ/mol for the wild-type and −2380.44 ± 232.40 kJ/mol for the F10V mutant complex, respectively (Supplementary Figure S5). The combined results of H-bonds and energy suggest that the wild-type Arc has a higher binding affinity and binding specificity with DNA than the F10V mutant, which is in good agreement with the experimental data (Schildbach et al., 1999).

3.3. Hydrogen bonds in Arc dimer

What caused these differences in H-bonds and binding energy between the wild-type and the F10V mutant? We have shown that the unbound wild-type Arc is more flexible than the F10V mutant while the bound wild-type complex is more stable than the mutant (Figures 3 and 4). One possibility is that the side chains of the base-interacting residues in the wild-type Arc, Gln9, Asn11 and Arg13, are more flexible and have a higher probability of forming H-bonds with the bases through a favourable fitting process, while these same residues in the F10V mutant are not in the ideal positions to interact with the bases through H-bonds due to their “rigid” conformation. To address this question, we analysed the interaction patterns among the six key residues within the Arc dimers in both unbound and bound conformations.

Figure 5 shows differences in dynamic H-bond networks between the wild-type and the F10V mutant, painting a clear picture regarding how the point mutation F10V affects the binding affinity and specificity. In the unbound conformation, the key base-interacting residues of the wild-type Arc form fewer H-bonds among themselves than the F10V mutant, which makes the side chains of these residues relatively more flexible before binding to DNA (Figure 5(A)). In the Arc-DNA complexes, there are more H-bonds between the pairs of A:Gln9-B:Arg13, A:Asn11-B:Arg13 and A:Asn11-B:Asn11 in the F10V mutant than those in the wild-type Arc (Figure 5(B)). The flexible Gln9, Asn11 and Arg13 side chains in the wild-type Arc, on the other hand, have

Figure 4. Analysis of the number of H-bonds between master residues (Gln9, Asn11 and Arg13) of Arc protein and DNA. In Arc chain A, H-bonds are formed, (A) between Gln9 and base “A18” in the reverse DNA chain, (B) between Asn11 and base “C16” in the reverse DNA chain and (C) between Arg13 and base “G8” in the forward DNA chain. In Arc chain B, H-bonds are formed, (D) between Gln9 and base “A7” in the forward DNA chain, (E) between Asn11 and bases “G5” and “T6” in the forward DNA chain and (F) between Arg13 and bases “A4” and “G5” in the forward DNA chain. F’: forward DNA chain; R’: reverse DNA chain. The index below the DNA bases is defined in Figure 2(E).


Figure 5. Analysis of the number of H-bonds within the Arc dimer structures between different master residues. H-bonds (A) in the unbound Arc structures and (B) in the Arc-DNA complex structures. Amino acids are shown in one-letter code: Q-Gln; N-Asn; R-Arg. A:Q9 represents Gln9 in Arc chain A.

more chances to be H-bonded with the bases, resulting in higher binding affinity and specificity. In the F10V mutant, however, these master residues are more constrained since they predominantly interact with each other through several H-bonds, leaving no room for interacting with the DNA bases.

The distributions of the number of H-bonds and dynamic interactions among the key residues during MD simulations are shown in Figure 6. In the F10V unbound structures, A:Asn11 forms more H-bonds with A:Gln9 and B:Asn11 than the wild-type. For example, A:Gln9 and A:Asn11 form one H-bond (26%) while there is none in the wild-type (Figure 6(A)). Though the two Asn11 residues between Arc chain A and chain B can form one H-bond in both wild-type and F10V mutant Arc structures, the frequencies are very different. In the wild-type unbound structure, only about 47% of the conformations have one H-bond. However, in the F10V mutant, over 83% of the unbound conformations have one H-bond, nearly doubling the number of H-bonds in the wild-type structure among all the snapshots of the conformations (Figure 6(B)). These dynamic features between the two Asn11 residues in the unbound structures are retained after binding to cognate DNA sequences (Figure 6(D)). In the F10V Arc-DNA complex structures, A:Gln9 forms one (29%) or two (13%) H-bonds with B:Arg13, while the wild-type Arc-DNA complexes have much smaller H-bond numbers between these two residues (Figure 6(C)).

The conformational fluctuations and the H-bond results from all the four cases, which contained the unbound wild-type and F10V mutant as well as the wild-type and mutant Arc-DNA complexes (Figures 3–6), revealed that the DNA-binding motif of the wild-type Arc is more flexible than the F10V mutant and forms fewer H-bonds among the key base-interacting residues

Figure 6. Analysis of H-bond distributions within the Arc dimer structures. In the unbound Arc dimer structures, H-bonds are formed, (A) between Q9 and N11 of chain A and (B) between N11 of chain A and chain B. In the bound structures, the H-bonds are formed, (C) between Q9 of chain A and R13 of chain B and (D) between N11 of chain A and chain B. Amino acids are shown in one-letter code: Q-Gln; N-Asn; R-Arg.
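Distributions of this kind are obtained by counting, for one residue pair, how many H-bonds are present in each saved frame and then reporting the fraction of frames at each count. The sketch below is a minimal illustration with a hypothetical function name; the per-frame counts are synthetic, built only to echo the 29% and 13% occupancies quoted for A:Gln9–B:Arg13 in the F10V complex, and are not trajectory data.

```python
import numpy as np

def hbond_count_distribution(counts_per_frame):
    """Fraction of frames having 0, 1, 2, ... H-bonds for one residue pair."""
    counts = np.asarray(counts_per_frame, dtype=int)
    values, freq = np.unique(counts, return_counts=True)
    return {int(v): float(f) / counts.size for v, f in zip(values, freq)}

# Synthetic per-frame counts: 58 frames with none, 29 with one, 13 with two
frames = np.array([0] * 58 + [1] * 29 + [2] * 13)

distribution = hbond_count_distribution(frames)
mean_hbonds = float(frames.mean())
```

The mean number of H-bonds per frame (here 0.55) is the kind of average reported in Figure 5, while the full distribution shows how often the contact is actually present, as in Figure 6.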

in both unbound and bound structures. With the presence of the cognate DNA sequences, the relatively flexible Gln9, Asn11 and Arg13 in the wild-type Arc form H-bonds with the core bases, resulting in a complex structure with high binding affinity and hence high specificity. In the F10V mutant, the DNA-interacting fragment is less flexible, with more H-bonds among the key base-interacting residues affecting its interaction with DNA. Figure 7 shows one such example involving residues


Figure 7. Snapshots of H-bonds formed between Arc and DNA or within the Arc dimer structures. Gln9 of chain A forms H-bonds, (A) with A18 of DNA in wild-type, but (D) with Arg13 of chain B in F10V mutant. Asn11 of chain B forms H-bonds, (B) with G5 of DNA in wild-type, but (E) with both Asn11 of chain A and T6 of DNA in F10V mutant. Arg13 of chain B forms H-bonds, (C) with A4 of DNA in wild-type, but (F) with Gln9 of chain A in F10V mutant. Amino acids are shown in one-letter code: Q-Gln; N-Asn; R-Arg. F’: forward DNA chain; R’: reverse DNA chain.

Gln9, Asn11 and Arg13. In the wild-type Arc-DNA complex structure, A:Gln9 forms two H-bonds with base A18 on the reverse DNA chain (Figures 7(A) and 4(A)). In the F10V mutant, instead of forming H-bond(s) with the DNA, A:Gln9 forms one H-bond with B:Arg13 (Figures 7(D) and 5(B)). Similarly, B:Arg13 has one H-bond with base A4 on the forward DNA chain in the wild-type complex (Figures 7(C) and 4(F)), while in the F10V mutant complex it forms an H-bond with A:Gln9 (Figures 7(F) and 5(B)). As for B:Asn11, it forms one H-bond with G5 on the forward DNA chain in the wild-type (Figures 7(B) and 4(E)), while in the F10V mutant it forms one H-bond each with A:Asn11 and with T6 on the forward DNA chain (Figures 7(E), 5(B) and 4(E)). We have demonstrated through comparative MD simulations that the key DNA-interacting residues in the wild-type Arc dimer are more flexible and form more H-bonds with DNA than those in the F10V mutant, leading to higher binding affinity and binding specificity.
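The H-bond frequencies quoted above (for example, one Asn11-Asn11 H-bond in about 47% of the wild-type snapshots versus over 83% of the F10V snapshots) are distributions of a per-frame H-bond count. As a minimal sketch, assuming the per-frame counts for a residue pair have already been extracted with an H-bond analysis tool, the distribution could be tallied as follows. The function name and the toy series are hypothetical; they are not the authors' scripts or data.

```python
from collections import Counter


def hbond_count_distribution(counts_per_frame):
    """Return {n_hbonds: fraction_of_frames} for a per-frame H-bond count series."""
    if not counts_per_frame:
        raise ValueError("need at least one frame")
    total = len(counts_per_frame)
    tally = Counter(counts_per_frame)
    # Sort the keys so the result reads 0, 1, 2, ... H-bonds.
    return {n: tally[n] / total for n in sorted(tally)}


# Hypothetical toy series of H-bond counts, one value per snapshot
# (illustrative only, not taken from the simulations in the paper).
toy_counts = [0, 1, 1, 0, 2, 1, 0, 1, 1, 1]
dist = hbond_count_distribution(toy_counts)
for n_hbonds, fraction in dist.items():
    print(f"{n_hbonds} H-bond(s): {fraction:.0%} of frames")
```

Comparing two such distributions, one per system, gives the kind of side-by-side frequencies summarized in Figure 6.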

3.4. Conformational selection for Arc-DNA binding

Two popular molecular recognition models are the induced fit and conformational selection mechanisms (Csermely, Palotai, & Nussinov, 2010). In the simulation of the unbound wild-type Arc dimer, we observed that the wild-type backbone is more flexible and undergoes an obvious structural change at the end of the simulation (Figure S2(A)). We extended this simulation to 200 ns and calculated the RMSD of the side-chain heavy atoms for the two Phe10 residues in the Arc dimer. The results show a clear change in the relative positions of the two Phe10 side chains, grouped into three main stages, State I, State II and State III (Figure 8(A)), which correspond to three conformational states (Figure 8(D)–(F)). The Phe10 side chains are packed in State I (Figure 8(D)), which is similar to the conformation in the unbound crystal structure (Figure 8(B)). In State III, the two Phe10 side chains are in an open conformation (Figure 8(F)), which is similar to that in the bound crystal structure (Figure 8(C)). These results are consistent with a conformational selection mechanism for Arc binding to DNA: the structure of the bound state already exists in the conformational ensemble of the unbound state. In the open position, the Phe10 side chain may contact the DNA backbone and help the wild-type Arc to form a stable complex with DNA. The conformational selection process of P22 Arc-DNA binding may also be followed by conformational adjustment (Grünberg, Leckner, & Nilges, 2004; Wlodarski & Zagrovic, 2009), in which the three master residues adjust their positions relative to the DNA bases to form more specific contacts.
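The side-chain RMSD trace described here can be sketched as follows, assuming each frame has already been superimposed on a reference structure and that the Phe10 side-chain heavy-atom coordinates are available as (x, y, z) tuples. The cut-offs used to label State I, II and III are hypothetical placeholders, since this section groups the states from the RMSD profile in Figure 8(A) without stating numeric thresholds, and the function names are illustrative only.

```python
import math


def rmsd(coords, ref):
    """RMSD between two equally sized lists of (x, y, z) tuples, without fitting."""
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate lists must be non-empty and equally sized")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, ref)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


def assign_state(value, low_cut, high_cut):
    """Label an RMSD value as State I, II or III using two assumed cut-offs."""
    if value < low_cut:
        return "I"
    if value < high_cut:
        return "II"
    return "III"


# Toy two-atom example (hypothetical coordinates, arbitrary length units).
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(frame, ref), 3))

# Hypothetical RMSD trace and cut-offs, chosen only to illustrate the labelling.
trace = [0.4, 0.5, 1.3, 1.2, 2.4, 2.6]
states = [assign_state(v, low_cut=1.0, high_cut=2.0) for v in trace]
print(states)
```

In practice the cut-offs would be read off the plateaus of the RMSD profile rather than fixed in advance.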

4. Discussion

Structural features leading to specific protein–DNA recognition remain elusive. In this paper, we presented results of dynamic protein–DNA interactions and demonstrated the contribution of protein flexibility to TF–DNA-binding specificity through a comparative molecular dynamics study of a set of P22 Arc repressors: wild-type and F10V mutant Arc conformations before (unbound) and after (bound) binding to cognate DNA sequences. Our simulation results are in good agreement


Figure 8. Positions of Phe10 in wild-type Arc dimer reveal conformational selection mechanism. (A) There are three stages for RMSD of side-chain heavy atoms of two Phe10. The crystal structures of wild-type Arc dimer with two side chains of Phe10 (B) packed in unbound state and (C) open in bound state. The snapshots of relative positions of Phe10 side chains in unbound state, showing (D) packed, (E) only one open and (F) both open states. F10/F10’: two Phe10 in the Arc dimer.

with the experimental data indicating that the F10V mutant has lower DNA-binding affinity and specificity than the wild-type Arc repressor (Schildbach et al., 1999). Moreover, we presented new results to elucidate the mechanisms behind the decreased binding affinity and specificity caused by the F10V mutation. The uniqueness of this work in studying the role of flexibility in specific protein–DNA binding is that there are fewer variables to consider when analysing the results between wild-type and F10V mutant structures. Both the wild-type and F10V mutant complex structures have the same DNA sequences (1BDT and 1BDV), and previous studies have shown that their unbound and bound Arc protein structures are almost the same (Schildbach et al., 1999). More importantly, the “master” residues in the DNA recognition β-sheets of the Arc dimer are the same between the wild-type and F10V mutant. Otherwise, it would not be surprising to witness changes in DNA-binding specificity if one or more “master” residues were mutated, which is not the case in this work. Since the wild-type and F10V mutant main-chain structures are very similar in both the unbound and bound states, Schildbach et al. suggested that the interaction between Phe10 and the sugar-phosphate backbone contributes to the high binding specificity in the wild-type Arc (Schildbach et al., 1999). We showed here that protein flexibility might play an even bigger role in achieving high binding specificity. Even though the backbone structures of the major groove interacting motif are highly similar between the wild-type and the F10V

mutant, the H-bond networks within the Arc dimers are quite different in both the unbound and bound structures. Before binding to the DNA, the key base-interacting residues, Gln9, Asn11 and Arg13, are more rigid in the F10V mutant structures and form more H-bonds within the dimer than those in the wild-type structures. The side chains of these residues in the wild-type Arc, on the other hand, are more flexible, priming them for base contact by H-bonds, a key component contributing to the binding specificity. For the wild-type P22 Arc in the unbound state, the Phe10 side chains alternate between packed and open states, which suggests a possible conformational selection mechanism in Arc-DNA binding. One possible mechanism of this F10V mutation-induced binding specificity change is allostery. Allostery is an intrinsic property of many proteins that are essential for signal transduction, catalysis and gene regulation (Tsai, del Sol, & Nussinov, 2008). The structural perturbation from allostery may not show any change of the backbone shape, as we demonstrated in this study (Tsai et al., 2008). Instead, it modulates the interaction networks within the structure and the flexibility/stability of the protein, which in turn affects the protein’s interaction with DNA. Our simulation results revealed that the F10V mutation affects side-chain fluctuations of the key base-interacting residues, Gln9, Asn11 and Arg13, in the free unbound state (Figure 3(C)) and that these residues form more H-bonds among themselves (Figure 5(A)) when compared with the wild-type Arc protein. The decreased side-chain flexibility results in fewer H-bonds between


Arc and cognate DNA sequence (Figure 4), which clearly shows the critical role of flexibility in protein–DNA-binding specificity.

List of abbreviations

CV      coefficient of variation
MD      molecular dynamics
PDB     protein data bank
RMSD    root mean squared deviation
TF      transcription factor

Funding

This work was supported by the National Science Foundation [grant numbers DBI0844749, DBI1356459] to J.G.

References

Andreeva, A., Howorth, D., Brenner, S. E., Hubbard, T. J., Chothia, C., & Murzin, A. G. (2004). SCOP database in 2004: Refinements integrate structure and sequence family data. Nucleic Acids Research, 32, D226–D229.
Ashworth, J., Havranek, J. J., Duarte, C. M., Sussman, D., Monnat, R. J., Jr, Stoddard, B. L., & Baker, D. (2006). Computational redesign of endonuclease DNA binding and cleavage specificity. Nature, 441, 656–659.
Baker, C. R., Tuch, B. B., & Johnson, A. D. (2011). Extensive DNA-binding specificity divergence of a conserved transcription regulator. Proceedings of the National Academy of Sciences, 108, 7493–7498.
Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A., & Haak, J. R. (1984). Molecular dynamics with coupling to an external bath. The Journal of Chemical Physics, 81, 3684–3690.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., … Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Research, 28, 235–242.
Burgering, M. J., Hald, M., Boelens, R., Breg, J. N., & Kaptein, R. (1995). Hydrogen exchange studies of the Arc repressor: Evidence for a monomeric folding intermediate. Biopolymers, 35, 217–226.
Chène, P. (1999). Mutations at position 277 modify the DNA-binding specificity of human p53 in vitro. Biochemical and Biophysical Research Communications, 263(1), 1–5.
Crooks, G. E., Hon, G., Chandonia, J. M., & Brenner, S. E. (2004). WebLogo: A sequence logo generator. Genome Research, 14, 1188–1190.
Csermely, P., Palotai, R., & Nussinov, R. (2010). Induced fit, conformational selection and independent dynamic segments: An extended view of binding events. Trends in Biochemical Sciences, 35, 539–546.
Darden, T., York, D., & Pedersen, L. (1993). Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. The Journal of Chemical Physics, 98, 10089–11009.
Filippova, G. N., Qi, C. F., Ulmer, J. E., Moore, J. M., Ward, M. D., Hu, Y. J., … Lobanenkov, V. V. (2002). Tumor-associated zinc finger mutations in the CTCF transcription factor selectively alter its DNA-binding specificity. Cancer Research, 62, 48–52.
Fuxreiter, M., Simon, I., & Bondos, S. (2011). Dynamic protein-DNA recognition: Beyond what can be seen. Trends in Biochemical Sciences, 36, 415–423.

Gohler, T., Jager, S., Warnecke, G., Yasuda, H., Kim, E., & Deppert, W. (2005). Mutant p53 proteins bind DNA in a DNA structure-selective mode. Nucleic Acids Research, 33, 1087–1100.
Gordon, J. C., Myers, J. B., Folta, T., Shoja, V., Heath, L. S., & Onufriev, A. (2005). H++: A server for estimating pKas and adding missing hydrogens to macromolecules. Nucleic Acids Research, 33, W368–W371.
Grünberg, R., Leckner, J., & Nilges, M. (2004). Complementarity of structure ensembles in protein-protein binding. Structure, 12, 2125–2136.
Gu, J., Gribskov, M., & Bourne, P. E. (2006). Wiggle-predicting functionally flexible regions from primary sequence. PLoS Computational Biology, 2, e90.
Hess, B. (2008). P-LINCS: A parallel linear constraint solver for molecular simulation. Journal of Chemical Theory & Computation, 4, 116–122.
Humphrey, W., Dalke, A., & Schulten, K. (1996). VMD: Visual molecular dynamics. Journal of Molecular Graphics, 14, 33–38.
Janin, J., & Sternberg, M. J. (2013). Protein flexibility, not disorder, is intrinsic to molecular recognition. F1000 Biology Reports, 5, 2. doi:10.3410/B5-2
Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W., & Klein, M. L. (1983). Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79, 926–935.
Kalodimos, C. G., Biris, N., Bonvin, A. M., Levandoski, M. M., Guennuegues, M., Boelens, R., & Kaptein, R. (2004). Structure and flexibility adaptation in nonspecific and specific protein-DNA complexes. Science, 305, 386–389.
Latchman, D. S. (1996). Transcription-factor mutations and disease. New England Journal of Medicine, 334, 28–33.
Luscombe, N. M., Laskowski, R. A., & Thornton, J. M. (2001). Amino acid-base interactions: A three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Research, 29, 2860–2874.
Luscombe, N. M., & Thornton, J. M. (2002). Protein-DNA interactions: Amino acid conservation and the effects of mutations on binding specificity. Journal of Molecular Biology, 320, 991–1009.
MacKerell, A. D., Jr, Banavali, N., & Foloppe, N. (2000). Development and current status of the CHARMM force field for nucleic acids. Biopolymers, 56, 257–265.
Marcovitz, A., & Levy, Y. (2009). Arc-repressor dimerization on DNA: Folding rate enhancement by colocalization. Biophysical Journal, 96, 4212–4220.
Mathelier, A., Zhao, X., Zhang, A. W., Parcy, F., Worsley-Hunt, R., Arenillas, D. J., … Wasserman, W. W. (2014). JASPAR 2014: An extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Research, 42, D142–D147.
Matthews, B. W. (1988). Protein-DNA interaction. No code for recognition. Nature, 335, 294–295.
Michael Gromiha, M., Siebers, J. G., Selvaraj, S., Kono, H., & Sarai, A. (2004). Intermolecular and intramolecular readout mechanisms in protein-DNA recognition. Journal of Molecular Biology, 337, 285–294.
Pabo, C. O., & Nekludova, L. (2000). Geometric analysis and comparison of protein-DNA interfaces: Why is there no simple code for recognition? Journal of Molecular Biology, 301, 597–624.
Peng, X., Jonas, J., & Silva, J. L. (1993). Molten-globule conformation of Arc repressor monomers determined by high-pressure 1H NMR spectroscopy. Proceedings of the National Academy of Sciences, 90, 1776–1780.

Porteus, M. H., & Baltimore, D. (2003). Chimeric nucleases stimulate gene targeting in human cells. Science, 300, 763.
Raumann, B. E., Knight, K. L., & Sauer, R. T. (1995). Dramatic changes in DNA-binding specificity caused by single residue substitutions in an Arc/Mnt hybrid repressor. Nature Structural Biology, 2, 1115–1122.
Raumann, B. E., Rould, M. A., Pabo, C. O., & Sauer, R. T. (1994). DNA recognition by β-sheets in the Arc repressor–operator crystal structure. Nature, 367, 754–757.
Rohs, R., Jin, X., West, S. M., Joshi, R., Honig, B., & Mann, R. S. (2010). Origins of specificity in protein-DNA recognition. Annual Review of Biochemistry, 79, 233–269.
Rohs, R., West, S. M., Liu, P., & Honig, B. (2009). Nuance in the double-helix and its role in protein-DNA recognition. Current Opinion in Structural Biology, 19, 171–177.
Rohs, R., West, S. M., Sosinsky, A., Liu, P., Mann, R. S., & Honig, B. (2009). The role of DNA shape in protein-DNA recognition. Nature, 461, 1248–1253.
Šali, A., & Blundell, T. L. (1993). Comparative protein modelling by satisfaction of spatial restraints. Journal of Molecular Biology, 234, 779–815.
Schildbach, J. F., Karzai, A. W., Raumann, B. E., & Sauer, R. T. (1999). Origins of DNA-binding specificity: Role of protein contacts with the DNA backbone. Proceedings of the National Academy of Sciences, 96, 811–817.
Schott, J. J., Benson, D. W., Basson, C. T., Pease, W., Silberbach, G. M., Moak, J. P., … Seidman, J. G. (1998). Congenital heart disease caused by mutations in the transcription factor NKX2-5. Science, 281, 108–111.
Schulz, G. E. (1979). Nucleotide binding proteins. In M. Balaban (Ed.), Molecular mechanisms of biological recognition (pp. 79–94). North-Holland: Elsevier/North-Holland Biomedical Press.
Thukral, S. K., Lu, Y., Blain, G. C., Harvey, T. S., & Jacobsen, V. L. (1995). Discrimination of DNA binding sites by mutant p53 proteins. Molecular & Cellular Biology, 15, 5196–5202.
Tsai, C. J., del Sol, A., & Nussinov, R. (2008). Allostery: Absence of a change in shape does not imply that allostery is not at play. Journal of Molecular Biology, 378(1), 1–11.
Uil, T. G., Haisma, H. J., & Rots, M. G. (2003). Therapeutic modulation of endogenous gene function by agents with designed DNA-sequence specificities. Nucleic Acids Research, 31, 6064–6078.
Urnov, F. D., Miller, J. C., Lee, Y. L., Beausejour, C. M., Rock, J. M., Augustus, S., … Holmes, M. C. (2005). Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature, 435, 646–651.
Uversky, V. N., & Dunker, A. K. (2013). The case for intrinsically disordered proteins playing contributory roles in molecular recognition without a stable 3D structure. F1000 Biology Reports, 5, 1. doi:10.3410/B5-1
Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A. E., & Berendsen, H. J. (2005). GROMACS: Fast, flexible, and free. Journal of Computational Chemistry, 26, 1701–1718.
Vuzman, D., & Levy, Y. (2012). Intrinsically disordered regions as affinity tuners in protein-DNA interactions. Molecular BioSystems, 8, 47–57.
Wlodarski, T., & Zagrovic, B. (2009). Conformational selection and induced fit mechanism underlie specificity in noncovalent interactions with ubiquitin. Proceedings of the National Academy of Sciences, 106, 19346–19351.
Zhou, H. X. (2012). Intrinsic disorder: Signaling via highly specific but short-lived association. Trends in Biochemical Sciences, 37, 43–48.
