G Model

ARTICLE IN PRESS

YSCDB-1684; No. of Pages 12

Seminars in Cell & Developmental Biology xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Seminars in Cell & Developmental Biology journal homepage: www.elsevier.com/locate/semcdb

Review

Intrinsically disordered proteins and multicellular organisms

A. Keith Dunker a,∗, Sarah E. Bondos b, Fei Huang a, Christopher J. Oldfield a

a Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University Schools of Medicine and Informatics, Indianapolis, IN 46202, United States
b Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, TX 77843, United States

Article info

Article history: Available online xxx

Keywords: Signaling; Differentiation; Intrinsic disorder; Cell cycle; Alternative splicing; Post-translational modification

Abstract

Intrinsically disordered proteins (IDPs) and IDP regions lack stable tertiary structure yet carry out numerous biological functions, especially those associated with signaling, transcription regulation, DNA condensation, cell division, and cellular differentiation. Both post-translational modifications (PTMs) and alternative splicing (AS) expand the functional repertoire of IDPs. Here we propose that an “IDP-based developmental toolkit,” comprising IDP regions, PTMs (especially multiple PTMs) within these IDP regions, and AS events within the pre-mRNA segments that code for these same IDP regions, allows functional diversification and environmental responsiveness for molecules that direct the development of complex metazoans. © 2014 Elsevier Ltd. All rights reserved.

Contents

1. Introduction
2. Characterization of IDPs
2.1. Predictors of IDPs and IDP regions
2.2. Molecular functions of IDPs
2.3. The IDP-based signaling/regulation toolkit
2.4. IDPs and biological processes
3. Multicellular organisms
3.1. Simple versus complex multicellular organisms
3.2. Molecular features underlying complex multicellularity
4. IDPs and complex multicellular organisms
4.1. Connecting the cells: the adherins and the essential roles of IDP regions
4.2. Inter-cell communication: the nuclear hormone receptors and the roles of IDP regions
4.3. Developmental pathways: Wnt signaling and the roles of IDP regions
4.4. Regulation of developmental programs
4.5. Cell-type-specific molecular biology and biochemistry
4.6. Importance of PTMs and AS events
5. Comparisons with non-metazoan multicellular organisms
6. Summary and future experiments
References

∗ Corresponding author at: Health Information and Translational Sciences Building, 410 W. 10th Street, Suite 5000, Indianapolis, IN 46202, United States. Tel.: +1 317 278 9220/9650; mobile: +1 317 278 9217. E-mail addresses: [email protected] (A.K. Dunker), [email protected] (S.E. Bondos), [email protected] (F. Huang), cjoldfi[email protected] (C.J. Oldfield). http://dx.doi.org/10.1016/j.semcdb.2014.09.025 1084-9521/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Dunker AK, et al. Intrinsically disordered proteins and multicellular organisms. Semin Cell Dev Biol (2014), http://dx.doi.org/10.1016/j.semcdb.2014.09.025


1. Introduction

Intrinsically disordered proteins (IDPs) and IDP regions lack specific 3D structure yet carry out biological functions (reviewed in [1,2]). We initially asked three questions: (1) Why don’t IDPs fold into 3D structures? (2) How abundant are IDPs? (3) What are their biological functions? Short answers to the three questions are as follows. First, IDPs fail to form stable 3D structure in large part because of their amino acid compositions [3,4], but additional factors complicate the situation. For example, IDP regions exhibit dependence on flanking sequences [5], an effect seen earlier with regard to secondary structure formation [6]. Another example is provided by zinc fingers, in which the binding of a single zinc ion causes an ∼30-residue IDP segment to become structured [7]. A third example is provided by the disorder-to-order transitions of leucine zippers [8], for which comparison of multiple zipper sequences shows considerable variation in helix content in the monomeric, disordered state [9]. Second, IDPs and IDP regions are common in all three domains of life but are most abundant in eukaryotes [10–13]. Third, the functions of IDPs and IDP regions complement those of structured proteins [11,14]. More specifically, a typical structured protein catalyzes a chemical reaction, binds a short peptide or small molecule, or transports an ion or molecule across a cell membrane [14], whereas a typical IDP conveys a signal or regulates an enzyme activity, frequently (though not always) via non-catalytic, often transient, complex formation [11,14,15]. A significant fraction of the proteins associated with cell division, gene regulation, and cellular differentiation in eukaryotes have substantial IDP regions and use these regions to carry out their molecular functions [11,14].
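The compositional explanation in the first answer can be illustrated with a simple charge–hydropathy rule of the kind mentioned in Section 2.1, where net charge and hydrophobicity alone give a usable predictor. The sketch below is only an illustration and is not one of the predictors cited in this review. It assumes the Kyte–Doolittle hydropathy scale rescaled to 0–1, counts Lys/Arg as +1 and Asp/Glu as −1 (His treated as neutral near pH 7), and applies the boundary ⟨H⟩ = (⟨R⟩ + 1.151)/2.785 reported by Uversky and co-workers. A sequence whose mean hydropathy falls below the boundary value for its mean net charge is classified as disordered. All function names are hypothetical.

```python
"""Charge-hydropathy disorder rule (illustrative sketch).

Assumptions: Kyte-Doolittle hydropathy rescaled to [0, 1]; Lys/Arg = +1,
Asp/Glu = -1, His neutral (pH ~7); boundary <H> = (<R> + 1.151) / 2.785.
"""

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Average per-residue hydropathy after rescaling -4.5..4.5 to 0..1."""
    seq = seq.upper()
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue: |(K + R) - (D + E)| / length."""
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def is_predicted_disordered(seq: str) -> bool:
    """True if mean hydropathy lies below the charge-dependent boundary."""
    boundary = (mean_net_charge(seq) + 1.151) / 2.785
    return mean_hydropathy(seq) < boundary


if __name__ == "__main__":
    # A hydrophobic, uncharged toy sequence versus a highly charged one.
    print(is_predicted_disordered("IVLIVLIVLA"))  # False
    print(is_predicted_disordered("EEKEEPSEEE"))  # True
```

This sketch averages over the whole sequence and raises `KeyError` for non-standard letters such as X; practical predictors handle ambiguous residues and usually score sliding windows so that disordered regions within otherwise structured proteins can be located.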
Further studies suggest that the functional repertoires of IDPs are expanded both by post-translational modifications (PTMs) within IDP regions [16] and by alternative splicing (AS) of pre-mRNA that codes for IDP regions [17]. Here we build upon these observations to propose that an “IDP-based developmental toolkit”, comprising IDPs and IDP regions, PTMs (especially multiple PTMs within these IDP regions [18]), and AS events that alter the pre-mRNA regions coding for IDP segments, enabled the evolution of complex multicellular organisms. Furthermore, we propose that an earlier, simplified toolkit containing just IDPs and their modulation by PTMs contributed to the evolution of the simple multicellular organisms that arose among the prokaryotes.

2. Characterization of IDPs

IDPs and IDP regions have been experimentally identified by means of various biophysical and biochemical techniques, such as sensitivity to protease digestion [19–21], optical rotatory dispersion [19,22], circular dichroism [23,24], intrinsic viscosity [22], X-ray crystallography [25–27], small-angle X-ray scattering [28,29], gel exclusion chromatography [30], and nuclear magnetic resonance spectroscopy [31–33], among others [34,35]. Importantly, these experimental analyses have yielded a sizable collection of IDPs and IDP regions that provides the basis for understanding their defining characteristics as well as their biological functions.

2.1. Predictors of IDPs and IDP regions

Compared with structured proteins, IDPs and IDP regions are generally richer in polar amino acids [3] and in proline [36], and they carry a higher net charge [4]. These compositional differences enabled the development of disorder predictors [37,38]. Net charge and hydrophobicity alone yield fairly accurate disorder predictors [4], but machine learning algorithms that add further compositional features perform slightly better [39,40]. More than 50 disorder predictors have been published to date [41]. Further improvement might come from properly encoding sequence information into prediction algorithms, but to the best of our knowledge such improvements have not yet been accomplished for any general disorder predictor.

2.2. Molecular functions of IDPs

Manual curation of almost 100 IDPs and IDP regions identified 28 molecular functions [15]. A continuation of this work led to the manually curated Disordered Protein Database (DisProt) [42], which can be found at http://www.disprot.org and which now contains almost 700 chains with almost 1540 IDP regions, for which 39 functions have been identified. A few examples illustrate the types of functions carried out by IDPs: (1) providing flexible linkers between structured domains; (2) providing rubber-like entropic springs; (3) containing sites for post-translational modifications (PTMs) such as acetylation, ADP-ribosylation, glycosylation, methylation, or phosphorylation; (4) containing sites for regulatory proteolysis; (5) containing autoinhibitory domains; (6) containing sites for binding to partners such as DNA, tRNA, rRNA, mRNA, proteins, or metal ions such as zinc; (7) containing signals such as the nuclear localization signal; and (8) enabling movement through narrow pores. Some PTMs, such as phosphorylation of serines or threonines, show a strong preference for residues located in IDP regions [43], whereas others, such as acetylation, show little or no such preference [44]. Some residues can undergo different modifications; for example, the same lysine side chain can be either methylated or acetylated.
Such “shared residues” show a strong preference for being located in IDP regions, perhaps because the same region has to be recognized by two different modifying enzymes, which would be aided by an IDP region’s flexibility [18]. IDP binding to a protein, nucleic acid, or metal ion partner such as zinc often involves a disorder-to-order transition of the IDP [45,46]. Parts of the IDP can remain unstructured in the complex yet still make positive or negative contributions to the binding constant. Such complexes are called “fuzzy” [47]; a particularly good example is the complex formed by Sic1 and Cdc4 [33]. In this example, multiple phosphorylated motifs on Sic1 bind to a single target site on Cdc4. The Sic1 chain, including the unbound motifs, remains disordered, and the binding constant increases with the number of phosphorylations because the local concentration of available phosphorylated motifs increases. It is an open question whether IDPs can form complexes using interfaces that are fuzzy throughout, without any localized, fixed structure [48]. IDP flexibility enables fitting onto differently shaped surfaces; thus, one IDP can bind to different partners and thereby add complexity to protein–protein interaction (PPI) pathways and networks [49,50]. Likewise, IDP flexibility enables binding to unstructured RNA molecules via mutual folding [51]. With respect to DNA, IDP regions have been suggested to increase affinity and specificity by enlarging the overall DNA–protein interface, and to enable inter-segment transfer events by providing DNA recognition subdomains [52]. A PTM changes the shape, dynamics, and chemical properties of an IDP’s surface, so that the modified IDP can bind to a partner different from that of the unmodified IDP [50,53]. Thus, signaling complexity is increased by PTM modulation of IDP-based binding sites. Multiple PTMs occur in some protein regions, such as histone tails [54] and the termini of p53 [55].
Different patterns of PTMs occur on different copies of the same protein, leading to different downstream signals [54]. That is, a protein with 3 phosphorylation sites, each of which could be phosphorylated or unphosphorylated, has $2^3 = 8$ different PTM states. Such state differences often lead to different downstream consequences [54,55]. The dependence of outcome on PTM patterns has been called “the histone code” [54] or a “PTM code” [56]. Such multiple, localized PTMs often occur in IDP regions, are found in many different proteins, and are often associated with predicted or observed binding sites for partners [18]. Thus, IDPs and PTMs collaborate to create enormous signaling diversification.

The function of a protein can also be modified by AS of its pre-mRNA. Regions of pre-mRNA that undergo AS code for IDP segments more often than for structured segments [57]. Since IDP regions frequently contain partner binding sites, AS in an IDP region would frequently alter the set of protein–protein or protein–nucleic acid interactions, thus “rewiring” the networks [17]. AS can also change the length of an IDP linker joining two functional entities such as domains or motifs, with significant functional consequences [58,59].

2.3. The IDP-based signaling/regulation toolkit

Table 1
The 20 biological processes most strongly correlated with predicted disorder.

Rank  Keyword                     Number of families  Average sequence length  Z-score
1     Differentiation             422                 439                      18.8
2     Transcription               16,653              443                      14.6
3     Transcription regulation    189                 280                      14.3
4     Spermatogenesis             189                 280                      13.9
5     DNA condensation            130                 300                      13.4
6     Cell cycle                  612                 494                      12.2
7     mRNA processing             249                 516                      10.9
8     mRNA splicing               180                 459                      10.1
9     Mitosis                     215                 620                      9.4
10    Apoptosis                   211                 465                      9.3
11    Protein transport           579                 422                      8.8
12    Meiosis                     170                 639                      8.7
13    Cell division               385                 452                      8.5
14    Ubl conjugation pathway     244                 526                      8.1
15    Wnt signaling pathway       41                  477                      6.6
16    Neurogenesis                74                  667                      6.6
17    Chromosome partition        67                  495                      6.4
18    Ribosome biogenesis         71                  392                      5.9
19    Chondrogenesis              6                   333                      5.6
20    Growth regulation           45                  355                      5.1

Modified from Ref. [14] with permission. Copyright 2007 American Chemical Society. The keywords and data were derived from Swiss-Prot release 48, 2005.

Because of their flexibility, IDPs and IDP regions by themselves provide the basis for complex sets of interactions between signaling and regulatory proteins and many types of partners. These capabilities are especially important for eukaryotic transcription factors [60]. This innate signaling complexity is further enhanced by PTMs, especially multiple PTMs, and by AS events. These mechanisms are therefore proposed to work in concert to provide the basis for a highly responsive, extremely complex cell signaling system [53,61]. Because prokaryotes lack pre-mRNA splicing and hence AS, these organisms would be limited to a simpler toolkit containing just IDPs and their modulation by PTMs. The importance of the AS component of this toolkit is underscored by the following observations. Unicellular eukaryotes tend to lack introns, so AS is generally unavailable as a mechanism for modulating their proteins, whereas multicellular eukaryotes tend to have both introns and AS. A correlation between organism complexity and alternative splicing has therefore been suggested for some time [57,62]. Recently, AS was assessed for 47 eukaryotic species, revealing a strong correlation between AS and organism complexity as estimated by cell type number [63].
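The combinatorial diversification described above for PTMs and AS can be made concrete with a small enumeration. The sketch below uses a hypothetical protein whose site names, state sets, and isoforms are invented for illustration. Each modifiable site is given the list of states it can occupy. Binary phosphosites reproduce the 2^n count discussed in Section 2.2 (8 states for three sites). A "shared" lysine that can be either acetylated or methylated contributes three states. An AS isoform that skips an exon loses the sites encoded by that exon. The count assumes that the sites are modified independently, which real proteins need not obey.

```python
from itertools import product
from math import prod

# Hypothetical state sets; real lysines can also be mono-, di-, or tri-methylated.
BINARY = ("unmodified", "phospho")
SHARED_LYSINE = ("unmodified", "acetyl", "methyl")

# Hypothetical isoforms of one protein: each maps its retained sites to possible states.
ISOFORMS = {
    "full_length": {"S1": BINARY, "S2": BINARY, "T3": BINARY, "K4": SHARED_LYSINE},
    "exon2_skipped": {"S1": BINARY, "K4": SHARED_LYSINE},  # exon carrying S2 and T3 removed
}


def modification_patterns(sites):
    """Enumerate every combination of per-site states, one dict per pattern."""
    names = list(sites)
    return [dict(zip(names, combo)) for combo in product(*(sites[n] for n in names))]


def count_patterns(sites):
    """Closed form for len(modification_patterns(sites)): product of state counts."""
    return prod(len(states) for states in sites.values())


def count_isoform_states(isoforms):
    """Total number of distinct (isoform, modification pattern) combinations."""
    return sum(count_patterns(sites) for sites in isoforms.values())


if __name__ == "__main__":
    three_phosphosites = {"S1": BINARY, "S2": BINARY, "T3": BINARY}
    print(len(modification_patterns(three_phosphosites)))  # 8
    for name, sites in ISOFORMS.items():
        print(name, count_patterns(sites))  # full_length 24, then exon2_skipped 6
    print(count_isoform_states(ISOFORMS))  # 30
```

Explicit enumeration is practical only for a handful of sites, because the count grows multiplicatively with each added site. `count_patterns` returns the same number without building the list.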

2.4. IDPs and biological processes

Disorder predictors have been used to identify additional functions that are likely carried out by IDPs and IDP regions [11,14]. In brief, sets of proteins associated with a specific function and sets of proteins with random functions are assembled. Functions likely to be disorder-associated are then identified by higher predicted disorder in the function-specific set than in the random-function sets. For important additional details see [11,14]. This general approach was applied first to the proteins of yeast [11] and later to the proteins in the Swiss-Prot database [14]. Table 1, from [14], shows the results for the top 20 biological processes associated with IDP regions. Biological processes ranked 1, 4, 10, 12, 15, 16, 19, and 20 are all associated with cellular differentiation, whereas processes 5, 6, 9, 13, and 17 are associated with cell division. Processes 2, 3, 7, 8, and 18 are associated with protein synthesis, which underlies both cell division and differentiation. The earlier yeast study [11] identifies many of the same processes relating to cell division and protein synthesis as being associated with IDPs but, because yeast is a unicellular organism, does not identify processes associated with cellular differentiation. Overall, the data in Table 1 suggest that IDP-dependent signaling is often involved in the developmental biology of multicellular organisms. To understand this involvement more deeply, we need to understand, first, which signaling processes are distinguished by their selective use in multicellular rather than unicellular organisms and, second, the mechanisms by which IDPs support the processes that underlie the development of multicellular organisms.

3. Multicellular organisms

Here we describe current views regarding the evolution of multicellular organisms and the molecules and mechanisms needed to bring them about. These considerations set the stage for determining whether IDPs are involved in processes used specifically for developmental biology.

3.1. Simple versus complex multicellular organisms

Phylogenetic analysis suggests that simple multicellular organisms have arisen independently at least 25 times [64]. Such organisms are characterized by having all of their cells in contact with the external milieu [65]. Simple multicellular organisms show very little differentiation and have only a few different cell types [66]. Indeed, even unicellular organisms can exhibit a limited amount of differentiation into different cell types [67]. Representative examples of simple multicellular organisms include filamentous cyanobacteria, actinobacteria, and myxobacteria among the prokaryotes, and filamentous diatoms and Volvox among the eukaryotes [68]. Complex multicellular organisms have arisen independently about ten times among the eukaryotes: once for the animals, twice each for the red, brown, and green algae, and three times for the fungi [69]. The absence of complex multicellular organisms among the prokaryotes raises a fundamental question: what differences between eukaryotes and prokaryotes enabled the former but not the latter to evolve complex multicellular organisms? This question is discussed in Section 5.

3.2. Molecular features underlying complex multicellularity

Complex multicellular organisms have been suggested to have evolved from simpler unicellular ancestors by the addition of the following three components: (1) biopolymers that physically connect the cells to each other; (2) molecules for communication between cells; and (3) programs that control development into different cell types [68–70]. The developmental programs require spatial and temporal regulation. The end result is an organism that has different types of cells (or tissues), each responding to a particular set of signals and each carrying out a particular set of biochemical reactions. Thus, to these first three components we add two more: (4) regulation of the developmental programs and (5) cell-type-specific molecular biology and biochemistry. These five capabilities, acquired during the evolution of complex multicellular organisms, are discussed in the following section.

4. IDPs and complex multicellular organisms

In this section we briefly discuss the roles of IDPs and IDP regions in proteins associated with the formation of cell connections, with cell communication, with the important developmental pathways, with regulation of developmental pathways, and with the development of cell-type-specific biochemistry. We focus on the metazoans in part because more data are available for these organisms.

4.1. Connecting the cells: the adherins and the essential roles of IDP regions

Metazoan cell adhesion is based on multiple members of several protein families. Individual family members have been shown to use IDP regions to carry out crucial aspects of their functions, including a member of the immunoglobulin superfamily [71], an integrin [72], a selectin [73], and a cadherin [74]. Space limitations prevent discussion of all of these examples. The “calcium-dependent adhesion proteins,” or cadherins, have been chosen for further discussion because they are thought to play particularly important roles in cell segregation and boundary formation during development [75].
The cadherins are used throughout the metazoans, from sponges to flies to mammals, to hold cells together [76]. Their calcium-dependent structured binding domains, called cadherin repeats, stick to each other and thereby provide an adhesive interaction. These repeats are shared by all members of the cadherin superfamily, which includes not only the classical cadherins but also the protocadherins, the desmosomal proteins, and the ungrouped family members [76]. The classical cadherins are subdivided according to their tissue distribution, for example epithelial (E-cadherin), placental (P-cadherin), neuronal (N-cadherin), retinal (R-cadherin), and muscle (M-cadherin) [77]. The cadherins are all transmembrane proteins containing an extracellular domain with multiple adhesive cadherin repeats, typically 4–6 but as many as 34 [76]. Most cadherins contain a single transmembrane segment and a cytosolic domain on the inner side of the plasma membrane [77]. The cytosolic domains of E-cadherin and of desmoglein 1 have been characterized as IDP regions by NMR spectroscopy and other experimental methods, a characterization also supported by disorder prediction [74,78]. Prediction and experimental examination of several example proteins suggest that the cytosolic domains of single-pass membrane proteins are usually IDP domains [79]. Indeed, the various cell adhesion proteins listed above are all single-pass membrane proteins, and representative members of these protein families exhibit IDP cytoplasmic domains [71–73]. The structure of the complex between the binding region of the cytosolic domain of E-cadherin and β-catenin has been determined by X-ray crystallography [80], as shown in Fig. 1. The E-cadherin structure in the complex has a large interaction area with its partner and lacks globularity, and, if computationally removed from the complex without structural change, has a very large surface area. These structural characteristics have been shown to be associated with intrinsic disorder before binding [81]. Indeed, experiments show that this region of E-cadherin lacks structure [74], and the binding of E-cadherin to β-catenin involves a disorder-to-order transition [80]. Anchoring the adhesion protein to the cytoskeleton strengthens the inter-cell connection. As can be seen in the crystal structure, use of an IDP region for this anchor enables a large, secure molecular interface [80]. Another feature of IDP regions is their ability to change shape and thereby bind to multiple partners [49,50]. Using this feature, the cadherins bind to large numbers of partners [74,78,80]. Overall, we conclude that IDP regions play an essential role in cadherin-mediated cell adhesion.

Fig. 1. Intrinsic disorder in cellular adhesion proteins exemplified by the cadherin family. (A) Ribbon representation of the complex (PDB ID 1I7X) between the disordered intracellular C-terminus of murine cadherin (red) and the ARM repeat domain of β-catenin (blue). (B) Predicted and experimental intrinsic disorder and order in two cadherin family members; transmembrane helices are aligned to the vertical gray bar, with extracellular and cytosolic domains on the left and right, respectively. PONDR® VSL2B [158] predictions of intrinsic disorder are shown as heat maps (value key at lower right), where scores near 1 (red) and 0 (blue) indicate predicted disorder and order, respectively. Experimentally determined ordered and disordered regions are shown as blue and red domain boxes, respectively, below the prediction. (C) Predicted intrinsic disorder in five cadherin family members.

4.2. Inter-cell communication: the nuclear hormone receptors and the roles of IDP regions

Like the cadherins, nuclear hormone receptors are used by all the metazoans, from sponges to flies to mammals [82,83]. For mammals, many different nuclear hormone receptors (NHRs) and their

cognate hormones have been identified, including the estrogen receptor and its ligand 17β-estradiol, the androgen receptor and its ligand testosterone, the glucocorticoid receptor and its ligand cortisol, and the retinoic acid receptor and its ligand retinoic acid, a vitamin A metabolite. Mammals typically have close to 50 NHRs, and many are called orphans because their cognate ligands have not yet been determined [83]. Following binding to their cognate ligands, the various NHRs move to the nucleus and function as transcription factors that carry out downstream gene regulation [82,83].

Fig. 2. Intrinsic disorder and binding partners for the estrogen receptor. Predicted disorder (same heat map as in Fig. 1) and domain organization are shown in the upper and lower bars, respectively. Domains, from N- to C-terminus, are: N-terminal domain (NTD), DNA binding domain (DBD), hinge domain, ligand binding domain (LBD), and C-terminal domain (CTD). A cartoon representation of these domains is shown, with structures for the DBD (PDB ID 1HCQ) and LBD (PDB ID 3ERD) and ovals for the intrinsically disordered NTD, hinge, and CTD. Both structures contain two copies of the domain, distinguished by darker or lighter color. For the LBD, two functional elements are indicated: H12 (bright green) and bound cofactor (red). The number of interaction partners for each domain was obtained from Didier Picard (http://www.picard.ch/downloads/ERinteractors.pdf).

NHRs typically contain ∼600 amino acid residues that have been partitioned into five or six separate regions or domains [84], as shown in Fig. 2, which uses the estrogen receptor (ER) as a representative example. These domains are: the N-terminal domain (NTD, also called A/B), which is non-conserved and variable in length; the DNA binding domain (DBD, also called C); the hinge domain (also called D); the ligand binding domain (LBD, also called E); and the C-terminal domain (CTD, also called F), which has a variable sequence. The NTD, DBD, hinge domain, LBD, and CTD all contain highly flexible regions that are important for function. NHRs have two modes of activation: constitutive activity is controlled by activation function 1 (AF1), and ligand-dependent activation is controlled by activation function 2 (AF2) [85]. AF1 and AF2 have been attributed to the NTD and LBD, respectively (indicated in Fig. 2) [86]. Additional activation functions have been identified for some NHRs [87]. Eukaryotic transcription factors are rich in IDP regions [88,89], and the NHRs are no exception to this trend [90]. Prediction has been used to characterize the disorder and structure in almost 400 vertebrate and invertebrate members of the NHR superfamily, including all 48 known human NHRs [91].
These predictions agreed well with the structure and disorder assignments from X-ray diffraction and NMR studies of 23 NHRs. Overall, these predictions suggest that the structure/disorder pattern along the NHR sequence has been largely conserved for most family members [91]. Studies of the functional mechanisms of NHRs have revealed that intrinsic disorder permeates nearly all aspects of NHR function. The biological roles of the intrinsically disordered regions of NHRs (the NTD, the hinge domain, and the CTD) vary widely between NHRs and are not well understood. Intrinsic disorder has been confirmed for several NTDs [92–94], with the types of disorder ranging from a small amount of residual secondary structure [93] to (pre-)molten globular domains [94]. The general role of the NTD seems to be the recruitment of co-factors [95]. Binding of the NTD to co-factors is generally accompanied by some gain of structure in the NTD [96], suggestive of a disorder-to-order transition upon binding. However, the lack of any structures for bound NHR NTDs [95] suggests that this gain of structure is only partial, possibly resulting in fuzzy complexes [47]. Overall, the ER NTD binds almost 50 different protein partners via these IDP-based mechanisms (Fig. 2).

Observations made for several NHRs suggest that NTD function may be much more complex than molecular recognition. The NTD has been suggested to be one of the keys to variable hormone responses [97]. In the glucocorticoid receptor (GR), phosphorylation has been observed to induce a significant increase in secondary and tertiary structure, and this induced structure, rather than phosphoserine recognition, is implicated in increased affinity for partners [98]. A similar increase in structure is seen in the GR NTD in the presence of jun dimerization protein 2, but only in constructs that also contain the DBD, suggesting that the DBD somehow mediates the gain in structure [99]. In the progesterone receptor, interaction between the NTD and TATA-box binding protein modulates the LBD [100], presumably altering the affinity of the LBD for co-regulators. Clearly, the biological roles of the NHR NTD are complex, and it remains to be seen whether its mechanisms are dominated by common IDP-based themes or are more idiosyncratic.

The roles of the intrinsically disordered hinge and CTD domains are similarly ill defined. The hinge region can enable or prevent interaction of AF1 and AF2, through sequence differences or possibly post-translational modification, thereby modulating activity [101].
The hinge region has also been implicated in the regulation of nuclear import and is required for binding to response elements [102]. A similarly broad range of roles has been observed for the CTD: it has been observed to interfere with NHR dimerization [103] and has been implicated in reducing the affinity of the LBD for its ligands [104]. These findings suggest that the hinge and CTD play regulatory roles, but through a wide variety of mechanisms. The DNA binding domain contains two zinc fingers [98] connected by a flexible IDP linker. The flexible linker enables the two zinc fingers to rotate and translate independently, so that each can dock onto its own DNA binding site while remaining linked to the other. Zinc finger sequences are typically IDPs that undergo disorder-to-order transitions concomitant with zinc binding [7]. In the ER, the DBD binds >30 different partners (Fig. 2), likely enabled by its IDP-based flexibility. Although many ligand-bound structures of the LBD have been determined, the LBD has been described as highly dynamic in the unbound state, forming a well-defined structure only upon ligand binding [105]. In the bound form, the ER LBD contains a helix, called H12, that is highly flexible in the absence of ligand [106] and has been found to be unstructured in a crystal structure [107]. H12 docks onto different sites in response to agonist versus antagonist binding [108,109]. When an agonist binds, this IDP segment defines one edge of the docking site for the various co-activator molecules. When an antagonist binds, by contrast, the segment adopts a structure that occupies much of the co-activator binding site and thus prevents co-activator binding [108,109]. One co-activator of the ER is GRIP1, which binds to the structured LBD via a short helix containing an LXXLL motif [109].
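The co-activator motif discussed here is conventionally written LXXLL (leucine, any two residues, leucine, leucine). Locating candidate motifs in a sequence is a simple pattern scan; the sketch below uses Python's standard `re` module with a lookahead so that overlapping occurrences are all reported. The toy sequence is invented, not a real co-activator, and a match alone says nothing about whether the segment is disordered or actually binds an LBD.

```python
import re

# LXXLL: leucine, any two residues, leucine, leucine. Wrapping the pattern in a
# lookahead lets overlapping matches be reported (e.g. two motifs in "LLLLLL").
LXXLL = re.compile(r"(?=(L..LL))")


def find_lxxll(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, motif) for every LXXLL occurrence."""
    return [(m.start() + 1, m.group(1)) for m in LXXLL.finditer(sequence.upper())]


# Invented toy sequence for illustration only.
print(find_lxxll("GSAILHRLLQEGSLSPLLSTLLG"))
```

Residues flanking a motif can affect binding, so hits from a scan like this are best treated as candidates for follow-up rather than as binding sites.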
Indeed, several other X-ray crystal structures of NHR LBDs bound to their co-activators or co-repressors involve the binding of short helical segments rather than globular domains. In our studies of more than 200 crystal structures of globular proteins bound to short protein segments with various types of secondary structure, we found that the bound segments nearly always map to longer regions of predicted disorder [46], suggesting that most co-activators and co-repressors of NHRs likely use disordered regions to bind to the structured LBD.

Please cite this article in press as: Dunker AK, et al. Intrinsically disordered proteins and multicellular organisms. Semin Cell Dev Biol (2014), http://dx.doi.org/10.1016/j.semcdb.2014.09.025

In summary, IDP domains underpin NHR function through the wide variety of mechanisms described above. The data in Fig. 2 suggest that >100 protein partners bind to ER regions likely to be IDP segments and that >45 co-activators or co-repressors likely use IDP segments to bind to an ER LBD docking site. Thus, NHRs use the flexibility of IDP segments to bind to a large number of partners, and a similarly large number of partners use IDP segments to bind to the structured LBD. The complex signaling carried out by the NHRs therefore uses IDP domains in many crucial ways.

4.3. Developmental pathways: Wnt signaling and the roles of IDP regions

A number of developmental pathways have also been implicated in other biological processes. For example, Wnt, TGF-β, Notch, Hedgehog, and PAR have been implicated in both embryogenesis and cancer [110]. The first three of these pathways, together with the JAK/STAT, MAPK/ERK, PI3K/AKT and NF-κB pathways, have been associated with both embryonic stem cell development and cancer [111]. Two additional developmental pathways are RTK and NHR [112]. Studies suggest that disorder is important for several of these eleven pathways, including Wnt [113], TGF-β [114], Notch [115], NF-κB [116], and NHR [90,91]. In addition, the JAK/STAT and RTK pathways are both inhibited by specific members of a protein family whose members contain substantial IDP regions [117]. We identified key proteins and carried out disorder prediction for the remaining pathways. These predictions suggest that IDP regions are important for Hedgehog, JAK/STAT, PAR, MAPK/ERK, RTK, and PI3K/AKT.
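The conclusion about all eleven pathways combines two sources of support: prior publications (counting, as the text does, the indirect evidence for JAK/STAT and RTK through their IDP-rich inhibitors [117]) and disorder predictions on key pathway proteins. A small set-based check, with pathway names transcribed from the text, confirms that the union of the two sources covers every pathway; it tests coverage only, not the strength of each line of evidence.

```python
# Pathway names as listed in the text.
PATHWAYS = {"Wnt", "TGF-beta", "Notch", "Hedgehog", "PAR", "JAK/STAT",
            "MAPK/ERK", "PI3K/AKT", "NF-kappaB", "RTK", "NHR"}

# Support from prior publications; JAK/STAT and RTK are included via the
# IDP-rich protein family that inhibits them [117].
LITERATURE = {"Wnt", "TGF-beta", "Notch", "NF-kappaB", "NHR", "JAK/STAT", "RTK"}

# Support from disorder predictions on key pathway proteins.
PREDICTION = {"Hedgehog", "JAK/STAT", "PAR", "MAPK/ERK", "RTK", "PI3K/AKT"}

supported = LITERATURE | PREDICTION      # union of both evidence sources
missing = PATHWAYS - supported           # pathways with no support at all
print(f"{len(supported)} of {len(PATHWAYS)} pathways supported; missing: {sorted(missing)}")
print("supported by both sources:", sorted(LITERATURE & PREDICTION))
```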
Thus, prior publications or predictions suggest that all eleven of these pathways utilize disorder. We chose to discuss the Wnt pathway in some detail because of its identification in Table 1 [14] and because of our previous work on this pathway [113]. Just as for the cadherins and the NHRs, Wnt has been found across a wide variety of metazoans [118], including the sponges [119]. Wnt is a secreted glycoprotein of ∼350 amino acid residues [120]. In the canonical Wnt pathway, this molecule binds simultaneously to two cell surface receptors, Frizzled and LRP [118]. Wnt binding to Frizzled also initiates two additional pathways: the calcium regulation pathway and the planar cell polarity pathway [118]. Here the focus is on the canonical pathway. In the canonical Wnt pathway, another protein, β-catenin, is kept at a low level by proteolytic digestion. This digestion involves three steps: (1) multiple phosphorylations; (2) ubiquitination; and (3) proteasomal digestion [118]. The phosphorylation is carried out by a “destruction complex” whose components are β-catenin, two kinases, and two scaffold proteins, axin and the adenomatous polyposis coli (APC) protein [121]. β-Catenin was mentioned above for its other role, its association with the cytoplasmic domain of the cadherins [74]. After Wnt binds to its pair of cell surface receptors, the destruction complex dissociates or otherwise becomes inactive; unphosphorylated β-catenin is then no longer ubiquitinated and digested by the proteasome, so β-catenin accumulates. The accumulated, unphosphorylated β-catenin translocates to the nucleus and alters gene regulation within the cell, thus changing the cell’s behavior [118]. IDP regions play multiple roles in the destruction of β-catenin [113]. First, the multiple phosphorylations of β-catenin, like other multiple localized PTMs [18], all occur in an IDP region.
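The effect of the destruction complex on β-catenin levels can be pictured with the simplest synthesis-degradation model, dB/dt = k_syn − k_deg·B, which lumps phosphorylation, ubiquitination, and proteasomal digestion into one first-order degradation constant. This is a textbook simplification, not a model taken from [113] or [118], and the rate values below are hypothetical and dimensionless. Lowering k_deg, as when Wnt inactivates the destruction complex, raises the steady state k_syn/k_deg, so β-catenin accumulates.

```python
import math


def beta_catenin_level(t: float, b0: float, k_syn: float, k_deg: float) -> float:
    """Analytic solution of dB/dt = k_syn - k_deg * B with B(0) = b0."""
    b_ss = k_syn / k_deg                        # steady-state level
    return b_ss + (b0 - b_ss) * math.exp(-k_deg * t)


# Hypothetical, dimensionless rates chosen only to show the trend.
K_SYN = 1.0
K_DEG_NO_WNT = 1.0   # destruction complex active: fast degradation
K_DEG_WNT = 0.1      # destruction complex inactive: slow degradation

b_before = K_SYN / K_DEG_NO_WNT                 # low resting level
b_after = K_SYN / K_DEG_WNT                     # level approached after Wnt
for t in (0, 10, 20, 40):
    level = beta_catenin_level(t, b_before, K_SYN, K_DEG_WNT)
    print(f"t = {t:>2}: B = {level:.2f} (approaching {b_after:.1f})")
```

In this picture the fold increase in steady-state β-catenin equals the fold decrease in k_deg, and the approach to the new level is governed by the post-Wnt k_deg, so slower degradation also means slower accumulation.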

Fig. 3. Axin-facilitated phosphorylation of β-catenin by GSK3β and CK1α. The N-terminal RGS domain and C-terminal DIX domain of axin are shown connected by a ∼500 residue intrinsically disordered region, represented by a line varying in color from blue at the N-terminus to green at the C-terminus. This disordered region binds the other components of the complex: β-catenin, GSK3β, and CK1α. CK1α binds the intrinsically disordered region of axin in two regions flanking the GSK3β and CK1α binding sites, bringing all axin binding partners into close proximity. An experimentally characterized, intrinsically disordered GSK3β and β-catenin binding portion of the intrinsically disordered region [125] is shown as a dotted line. Reproduced from [124] with permission.
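Fig. 3 shows axin's disordered region bringing its binding partners into close proximity. How much proximity helps can be estimated with a back-of-the-envelope effective concentration: a single partner treated as uniformly spread over a sphere of radius r around its tether point. The uniform sphere is a crude assumption (a real disordered tether samples a non-uniform cloud), and the radii below are illustrative, not measured values for axin.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def effective_concentration_molar(radius_nm: float) -> float:
    """Concentration of one molecule spread uniformly over a sphere of radius_nm.

    A crude stand-in for the volume explored by a partner on a flexible,
    disordered tether.
    """
    radius_dm = radius_nm * 1e-8                              # 1 nm = 1e-8 dm
    volume_litres = (4.0 / 3.0) * math.pi * radius_dm ** 3    # 1 dm^3 = 1 L
    return 1.0 / (AVOGADRO * volume_litres)


for r in (5, 10, 20):                                         # illustrative radii
    c = effective_concentration_molar(r)
    print(f"r = {r:>2} nm -> about {c * 1e6:,.0f} uM")
```

Because concentration scales as 1/r³, halving the radius raises the effective concentration eightfold; even at 20 nm a single tethered partner sits at tens of micromolar, which illustrates why tethered searches can speed up phosphorylation.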

The ubiquitination of β-catenin, like most other ubiquitination events [122], is localized within an IDP region. Initiation of proteasomal digestion has been shown to be facilitated by long disordered termini extending beyond the polyubiquitination site [123], suggesting that the long IDP tail of β-catenin could be important for this step as well [113]. The axin scaffold protein binds β-catenin and the two kinases using a ∼500 residue IDP region [124]. Through this non-covalent linkage, the β-catenin substrate and the two kinase enzymes are brought together into a small volume, thus increasing their local concentrations. The flexibility of the β-catenin disordered tail and of the disordered region in axin allows random motions so that, given the high local concentrations, the kinases and their substrates can rapidly find each other by tethered searches, thus speeding up phosphorylation. This proposed mechanism is supported by crystal structures showing β-catenin and one of the kinases bound to different short segments from axin’s long disordered region [112] and by the acceleration of phosphorylation when β-catenin and one of the kinases are mixed with a short segment from axin that includes both binding sites [125]. We described this overall complex as a “stochastic machine” [124], which is illustrated in Fig. 3. The other scaffold protein of the destruction complex, APC, is also highly disordered and uses this disorder to contribute to the overall canonical Wnt pathway. For example, APC contains multiple β-catenin and axin binding sites, all located in very long IDP regions, and some of the β-catenin binding sites can be tuned by phosphorylation [126]. These multiple binding sites help to concentrate axin and β-catenin, thereby recruiting these proteins into destruction complexes. Finally, the disordered regions of β-catenin very likely play yet-to-be-determined roles in its gene regulation function.

4.4. Regulation of developmental programs

Transcription factors that contain the highly conserved DNA-binding homeodomain, which is encoded by the homeobox DNA sequence [127], are truly ancient and are observed in both unicellular and multicellular eukaryotes [128]. The homeodomain contains 60 amino acids, except for a variant called TALE, which has a three amino acid loop extension [129]. In multicellular organisms these proteins regulate various developmental programs by modulating the expression patterns of target genes in a temporal, spatial and tissue-specific manner [130]. The HOX transcription factors, a metazoan-specific clade in the homeodomain protein superfamily, regulate bilaterian body plan development along the anterior-posterior axis [131] throughout the animal kingdom [132]. Homeodomains are composed of an N-terminal flexible arm followed by three α-helices. When bound to DNA, the C-terminal α-helix lies in the major groove, while the flexible IDP N-terminus lies in the minor groove and forms many of the sequence-specific contacts with the DNA [133,134]. These minor groove contacts are thus very important for distinguishing DNA binding sites among the otherwise highly conserved homeodomains of the Hox family. Electrostatic interactions anchor the N-terminal arm in the minor groove to accelerate complex formation and facilitate contacts along the protein–DNA interface [135]. The bipartite nature of the homeodomain, and the flexible linker connecting its two halves, allows homeodomains to use an efficient “brachiation” mechanism to search DNA for binding sites [136]. Thus the disordered N-terminal arm of the homeodomain contributes both kinetically and thermodynamically to DNA binding. HOX proteins contain intrinsically disordered regions that have multiple molecular functions. For example, in the Drosophila melanogaster Hox protein Ultrabithorax (Ubx), the majority of the transcription activation domain is intrinsically disordered [137]. In addition, all of the disordered sequences modulate the DNA binding affinity of the homeodomain [138]. In the process of gene regulation, HOX proteins often collaborate with other transcription factors, most of which are bound by the intrinsically disordered regions [139].

Fig. 4. Regulation of development exemplified by two HOX proteins. The ribbon structure of the complex of Ubx (light blue) and Exd (green) bound to DNA (gray) is shown, including the YPWM motif of Ubx (dark blue) and the disordered linker (red dotted line). The prediction of intrinsic disorder for the entire Ubx sequence is shown below the structure (same heat map as in Fig. 1), and the sequence regions corresponding to the structure are indicated by the highlights. This 3D structure was obtained from PDB ID 1B8I.

Table 2
PTM, AS, IDP and developmental biology.

PubMed search                  Number of hits
Cadherin                               23,469
Cadherin + PTM                          2,899
Cadherin + AS                             152
Nuclear hormone receptor (NHR)         76,665
NHR + PTM                               5,514
NHR + AS                                  420
Wnt signaling                          15,689
Wnt signaling + PTM                     1,856
Wnt signaling + AS                         89
HOX                                     4,431
HOX + PTM                                 253
HOX + AS                                   46

PubMed searches carried out on August 3, 2014. PTM = (phosphorylation OR acetylation OR methylation); AS = alternative splicing; NHR = (nuclear hormone OR estrogen) AND receptor.
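Raw hit counts in Table 2 differ by more than an order of magnitude between topics, so comparing them directly can mislead. A minimal sketch that divides each PTM or AS count by the topic's total: the counts are transcribed from Table 2, while the normalization is an illustrative addition, not part of the original analysis. The resulting fractions measure co-mention in PubMed records, not confirmed modification or splicing.

```python
# Hit counts transcribed from Table 2 (PubMed searches of August 3, 2014).
HITS = {
    "Cadherin":      {"all": 23469, "PTM": 2899, "AS": 152},
    "NHR":           {"all": 76665, "PTM": 5514, "AS": 420},
    "Wnt signaling": {"all": 15689, "PTM": 1856, "AS": 89},
    "HOX":           {"all": 4431,  "PTM": 253,  "AS": 46},
}


def co_mention_fractions(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Share of each topic's hits that also match the PTM or the AS query."""
    return {
        topic: {query: c[query] / c["all"] for query in ("PTM", "AS")}
        for topic, c in counts.items()
    }


for topic, frac in co_mention_fractions(HITS).items():
    print(f"{topic:<14} PTM {frac['PTM']:6.2%}   AS {frac['AS']:6.2%}")
```

On these numbers, roughly 6 to 12% of each topic's records also match the PTM query, and about 0.5 to 1% match the AS query.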
In one example important for Drosophila development, Ubx binds another homeodomain protein, Extradenticle (Exd), to regulate a subset of Ubx target genes. Ubx generally binds Exd via a short “YPWM” motif, which is joined to the homeodomain by an IDP-based flexible tether [140], as shown in Fig. 4. This motif also inhibits the binding of Ubx monomers to Ubx–Exd heterodimer binding sites, thus preventing mis-regulation [58]. Inhibition of DNA binding by the YPWM motif can be enhanced by shortening the flexible linker. Thus the intrinsically disordered sequences in Ubx play multiple roles in transcription regulation and coordinate different molecular functions. The intrinsically disordered regions in Ubx are also alternatively spliced and phosphorylated, allowing tissue-specific cellular processes to regulate multiple Ubx activities [141,142]. The importance of the disordered regions of Ubx for gene regulation is underscored by the observation that their disordered character has been conserved through evolution [138].

4.5. Cell-type-specific molecular biology and biochemistry

Different types of cells are thought to carry out different biochemical processes because specific sets of genes are turned on and off in each cell type. In addition, IDP-associated tissue-specific AS events and IDP-associated tissue-specific PTMs provide supplemental mechanisms for generating cell-specific molecular biology and biochemistry. Studies of tissue-specific AS show that these events typically map to IDP regions containing binding sites for protein partners [17,143]. Thus, tissue-specific AS “re-wires” protein–protein interaction (PPI) networks and pathways, thereby altering the biochemistry of the cell. To illustrate this idea, we show how alternative splicing of the pre-mRNA coding for BRCA1 [57] changes both the PPI network and the network of DNA interactions (Fig. 5). This example shows that AS can modulate not only PPI networks but also gene regulatory networks. The progesterone receptors, which are members of the NHR superfamily, undergo multiple PTMs that are regulated in a species-, tissue-, and cell-specific manner. These PTMs include phosphorylation, acetylation, ubiquitination, and SUMOylation. Just as for other proteins that undergo multiple PTMs [18], the interplay among these PTMs leads to complex downstream signaling.
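The re-wiring shown for BRCA1 in Fig. 5 can be sketched as interval bookkeeping: each partner is assigned the residue range sufficient for binding, each splice variant is described by the residue ranges it excludes, and a partner is kept only if its range does not overlap any excluded range. All coordinates, partner names, and variant names below are hypothetical placeholders, not the BRCA1 values from [57].

```python
# Hypothetical inclusive, 1-based residue ranges for illustration only;
# they are NOT the BRCA1 partner regions or splice boundaries from [57].
PARTNER_REGIONS = {
    "partner_A": (1, 100),
    "partner_B": (300, 500),
    "partner_C": (700, 900),
    "partner_D": (1650, 1850),
}

# Each hypothetical splice variant is described by the ranges it excludes.
SPLICE_VARIANTS = {
    "full_length": [],
    "variant_1": [(250, 600)],
    "variant_2": [(650, 1600)],
}


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if inclusive ranges a and b share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def retained_partners(excluded: list[tuple[int, int]]) -> set[str]:
    """Partners whose binding range is untouched by every excluded range."""
    return {
        name
        for name, region in PARTNER_REGIONS.items()
        if not any(overlaps(region, cut) for cut in excluded)
    }


for variant, excluded in SPLICE_VARIANTS.items():
    print(variant, sorted(retained_partners(excluded)))
```

The two hypothetical variants lose different partners, mirroring the orthogonal interaction sets described for S8 and S12 in Fig. 5. Treating any overlap as a loss is a conservative choice; a partial deletion might leave a shortened but still functional binding segment.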
Such complexities can then help to explain how tissue- and gene-specific differences in regulation are achieved in the same organism [144].

4.6. Importance of PTMs and AS events

To test whether PTMs and AS events modulate the cadherins, the NHRs, the Wnt pathway, and the HOX proteins, we carried out simple keyword text mining. The results, given in Table 2, suggest that PTMs and AS events are very likely involved in modulating the various developmental functions carried out by these proteins. Following are a few observations gleaned from these publications. Phosphorylation of certain sites within the IDP cytoplasmic domain of E-cadherin leads to an almost three orders of magnitude increase in β-catenin binding affinity. This tighter binding is accompanied by increased formation of structure in the IDP region of cadherin [145]. Phosphorylation of the estrogen receptor has been shown to alter ligand, DNA, and co-activator binding, and these phosphorylation-dependent binding alterations are suggested to be significant in the endocrine therapy of breast cancer [146]. Diabetes leads to modified distributions of axin splice isoforms, which seem to contribute to the development of diabetes-associated retinopathy [147].

Fig. 5. Modulation of interaction networks by intrinsic disorder and alternative splicing exemplified by BRCA1. BRCA1 contains a 1543 residue long disordered region flanked by ordered domains at the N- and C-termini. Predictions of intrinsic disorder (same heat map as in Fig. 1) show good agreement with experimental characterization. Interactions of BRCA1 with several partners are indicated by labeled boxes corresponding to the regions sufficient for mediating each interaction. Many alternatively spliced variants of BRCA1 have been observed, three of which (S1, S8, and S12) are shown, indicating the included regions (thick green bars) and excluded regions (dotted line). Removal of regions in splice variants corresponds to loss of interactions, which can result in orthogonal interactions for splice variants, e.g. S8 and S12. Modified from [57].

5. Comparisons with non-metazoan multicellular organisms

With regard to their protein-encoding genes, prokaryotes have both IDPs and PTMs, but lack introns and the intron-dependent AS mechanisms. Prokaryotes have evolved simple multicellular organisms multiple times but have never given rise to complex multicellular organisms. Thus, we speculate that the signaling enabled by IDPs and PTMs is sufficient for the development of simple multicellular organisms among the prokaryotes but not for the development of complex multicellular organisms. Of course, simple multicellular eukaryotes such as slime mold have IDPs that are modulated by both PTMs and AS [12]. One possibility is that the amount of IDP is reduced in such organisms compared with the more complex multicellular eukaryotes. This seems to be the case for slime mold [62].

The speculation that IDPs plus PTMs enabled the development of simple multicellular organisms among the prokaryotes makes testable predictions. For example, contrary to what was believed for a number of years, prokaryotes contain tyrosine kinases as well as serine, threonine, aspartate and histidine kinases [161]. The phosphorylation events carried out by tyrosine kinases are involved in developmental processes such as spore formation in Myxococcus xanthus [161]. However, it is unclear whether these development-associated PTM sites are located in IDP regions, as our proposal requires. To estimate the structural environments of the development-associated PTMs in prokaryotes, we will collect the sequence locations of these PTMs, use text mining to search for relevant structural studies, and use disorder prediction to determine whether these sites are located mostly in regions likely to be disordered.

Among the complex multicellular organisms, we have focused on metazoans mainly because the needed data are readily available. As for plants, the proteome of the model organism Arabidopsis thaliana, as well as the proteomes of several other plants, contain substantial fractions of IDP residues [148] and large numbers of PTMs [149], and their pre-mRNAs undergo AS, thereby modulating their IDPs [63]. Thus, plants contain the key components of our proposed developmental toolkit. Additional parallels between plants and metazoans suggest that the same toolkit may underlie development in both groups. For example, both plants and metazoans use small molecules to send intercellular signals that are transduced by binding to disorder-containing receptor proteins, and both use homeodomain-containing transcription factors to regulate development.

With respect to intercellular communication, the plant phytohormone gibberellic acid (GA) helps to regulate seed germination, elongation growth, flowering time, and floral development [150]. As with various mammalian hormones, the function of GA is mediated by a receptor protein, which was originally identified on the basis of the rice GA-insensitive dwarf1 (gid1) mutant [151]. The gid1 gene codes for the GID1 receptor. The GID1–GA complex interacts with DELLA repressor proteins [152,153], which are so named because of a conserved sequence motif that begins with the residues D-E-L-L-A [154]. This interaction leads to degradation of the DELLA repressor proteins via the ubiquitination/proteasome digestion pathway. Overall, GA phytohormone responses generally correlate with DELLA repressor abundance [150]. Thus, the interaction between the GID1–GA complex and the DELLA repressor proteins is a key interaction in plant development. This interaction involves two steps. First, GID1 contains a disordered NTD that becomes structured upon GA binding; the structured NTD covers the GA molecule, thus forming a platform for DELLA protein binding. Second, the DELLA domains, which are typically disordered [152,153], become structured upon binding to the platform provided by the GID1 NTD [155]. This two-step sequence by which DELLA proteins bind to the GID1–GA complex mimics the binding of estradiol to the LBD of the estrogen receptor followed by the binding of the IDP segment of GRIP1. Although GID1 is completely unrelated to the LBD of the NHRs, and the DELLA repressors are plant-specific and unrelated to the NHR co-activators or co-repressors, both sets of interactions have interfaces formed from IDPs as the result of disorder-to-order transitions upon binding. This convergent evolutionary use of disorder for one of the key steps leading to multicellular organisms argues that IDPs indeed provide important signaling diversification capabilities that enabled the evolution of complex multicellular organisms.

With respect to the regulation of developmental programs by transcription factors, both animals and plants mainly use homeodomain-containing proteins for this purpose. Human homeobox-containing genes code for more than 230 different homeodomain-containing transcription factors. Molecular phylogenetic analysis partitions these proteins into distinct classes called Antp, LIM, PRD, POU, HNF, SINE, TALE, CUT, ZF and CERS [156].
A similar analysis of 10 plant genomes identified almost 300 homeodomain-containing plant transcription factors, with 148 being the largest number in any one genome. These plant transcription factors partition into classes called HD-ZIP I, II, III, and IV, PLINCF, WOX, KNOX, BEL, DDT, PHD, NDX, LD, PINTOX, and SAWADEE. The last eight (i.e., KNOX to SAWADEE) all have the three amino acid loop extension of the TALE class, suggesting that this group is considerably expanded in plants compared with animals [157]. PONDR® VSL2b [158] disorder predictions suggest that the various homeodomain-containing transcription factors from plants contain significant amounts of disorder. Likewise, literature searches show that these factors are both phosphorylated and alternatively spliced. For example, in rice, KNOX undergoes tissue-specific alternative splicing [159]. The DELLA proteins discussed above are also phosphorylated, which is a key step leading to their degradation [160]. These data support the use of the IDP-PTM-AS-based developmental toolkit in plants as well as metazoans. If the IDP-PTM-AS toolkit for signaling diversification has indeed underpinned the development of every complex multicellular organism, then we would expect to see the same functionalities used by the key development-related proteins of the 8 other groups of organisms that evolved into complex multicellular entities. Alternatively, some complex multicellular organisms have only a few different cell types, and these may need only an IDP-PTM toolkit, as suggested for the simple multicellular organisms.
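Predictors such as PONDR® VSL2b output a score between 0 and 1 for each residue, and residues scoring at or above 0.5 are conventionally called disordered. Turning such scores into reportable disordered regions can be sketched as a run-length scan. The 0.5 threshold and the 30-residue minimum length are assumptions (commonly used conventions for long disordered regions), the function name is hypothetical, and the sketch only post-processes scores; it does not reproduce VSL2b itself.

```python
def disordered_regions(scores: list[float], threshold: float = 0.5,
                       min_length: int = 30) -> list[tuple[int, int]]:
    """Return inclusive 1-based (start, end) runs with score >= threshold.

    Runs shorter than min_length residues are dropped.
    """
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i                       # a disordered run begins
        elif start is not None:
            if i - start >= min_length:         # run covers start..i-1
                regions.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))    # run reaches the C-terminus
    return regions


# Invented scores for a 120-residue toy protein.
toy = [0.2] * 10 + [0.8] * 40 + [0.3] * 5 + [0.6] * 10 + [0.1] * 20 + [0.9] * 35
print(disordered_regions(toy))  # [(11, 50), (86, 120)]
```

The 10-residue run at positions 56 to 65 scores above threshold but is dropped by the length filter, which is how short flexible loops are kept separate from long disordered regions in this kind of summary.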

6. Summary and future experiments

IDPs have been found to be common among the eukaryotes [10–12] and to be heavily involved in cell signaling [15,42]. In studies of signaling by IDPs, AS [57] and PTMs [18,43] have been found to be important mechanisms for increasing signaling diversity. At a higher level, the signaling capacities of IDPs have been found to be especially important for differentiation, transcription, transcriptional regulation, and cell division, among other processes [11,14]. These observations suggested that IDPs might have played an important role in the original evolution of multicellular organisms. Multicellular organisms require cell adhesion, intercellular communication, and developmental programs [68–70]. In addition, multicellular organisms must have means to regulate their developmental programs, and, in the end, multicellular organisms display cell-type-specific biochemistry. Key proteins for all five of these functions associated specifically with multicellular organisms were shown here to utilize IDPs, and the modulation of these IDPs by AS and PTMs, to carry out their respective functions. Thus, we propose that an IDP-AS-PTM-based toolkit provided signaling mechanisms with sufficient capacity for diversification that natural selection could lead to the development of complex multicellular organisms. These proposals suggest that future studies of developmental proteins and pathways should pay much greater attention to whether and how disorder is used. An important approach for confirming the biological role of a given protein is to knock it out or mutate it and then determine whether the deletion or mutation yields the expected biological perturbation. An analogous strategy would be to delete or mutate an IDP or IDP region thought to be crucial for a given developmental step and then see whether the results are as expected. Similar strategies could be developed to determine the importance of specific PTM or AS events. For the complex multicellular organisms for which data are currently scarce, it will be important, in our view, for future developmental studies to determine whether the entire IDP-PTM-AS toolkit is used for development, whether only the IDP-PTM subset is used, or whether entirely different mechanisms are used.

References

[1] Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm. J Mol Biol 1999;293:321–31, http://dx.doi.org/10.1006/jmbi.1999.3110.
[2] Oldfield CJ, Dunker AK. Intrinsically disordered proteins and intrinsically disordered protein regions. Annu Rev Biochem 2014;83:553–84, http://dx.doi.org/10.1146/annurev-biochem-072711-164947.
[3] Xie Q, Arnold G, Romero P, Obradovic Z, Garner E, Dunker A. The sequence attribute method for determining relationships between sequence and protein disorder. Workshop Genome Inform 1998;9:193–200.
[4] Uversky VN, Gillespie JR, Fink AL. Why are natively unfolded proteins unstructured under physiologic conditions? Proteins 2000;41:415–27.
[5] Crick SL, Ruff KM, Garai K, Frieden C, Pappu RV. Unmasking the roles of N- and C-terminal flanking sequences from exon 1 of huntingtin as modulators of polyglutamine aggregation. Proc Natl Acad Sci 2013;110:20075–80, http://dx.doi.org/10.1073/pnas.1320626110.
[6] Kihara D. The effect of long-range interactions on the secondary structure formation of proteins. Protein Sci Publ Protein Soc 2005;14:1955–63, http://dx.doi.org/10.1110/ps.051479505.
[7] Frankel AD, Berg JM, Pabo CO. Metal-dependent folding of a single zinc finger from transcription factor IIIA. Proc Natl Acad Sci 1987;84:4841–5.
[8] Bracken C. NMR spin relaxation methods for characterization of disorder and folding in proteins. J Mol Graph Model 2001;19:3–12.
[9] Das RK, Crick SL, Pappu RV. N-terminal segments modulate the α-helical propensities of the intrinsically disordered basic regions of bZIP proteins. J Mol Biol 2012;416:287–99, http://dx.doi.org/10.1016/j.jmb.2011.12.043.
[10] Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ. Intrinsic protein disorder in complete genomes. Workshop Genome Inform 2000;11:161–71.
[11] Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 2004;337:635–45, http://dx.doi.org/10.1016/j.jmb.2004.02.002.
[12] Oates ME, Romero P, Ishida T, Ghalwash M, Mizianty MJ, Xue B, et al. D2P2: database of disordered protein predictions. Nucleic Acids Res 2013;41:D508–16, http://dx.doi.org/10.1093/nar/gks1226.


[13] Peng Z, Yan J, Fan X, Mizianty MJ, Xue B, Wang K, et al. Exceptionally abundant exceptions: comprehensive characterization of intrinsic disorder in all domains of life. Cell Mol Life Sci 2014, http://dx.doi.org/10.1007/s00018-014-1661-9.
[14] Xie H, Vucetic S, Iakoucheva LM, Oldfield CJ, Dunker AK, Uversky VN, et al. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J Proteome Res 2007;6:1882–98, http://dx.doi.org/10.1021/pr060392u.
[15] Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradović Z. Intrinsic disorder and protein function. Biochemistry 2002;41:6573–82.
[16] Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, et al. The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res 2004;32:1037–49, http://dx.doi.org/10.1093/nar/gkh253.
[17] Buljan M, Chalancon G, Eustermann S, Wagner GP, Fuxreiter M, Bateman A, et al. Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell 2012;46:871–83, http://dx.doi.org/10.1016/j.molcel.2012.05.039.
[18] Pejaver V, Hsu W-L, Xin F, Dunker AK, Uversky VN, Radivojac P. The structural and functional signatures of proteins that undergo multiple events of post-translational modification. Protein Sci 2014;23:1077–93, http://dx.doi.org/10.1002/pro.2494.
[19] McMeekin TL. Milk proteins. J Food Prot 1952;15:57–63.
[20] Fontana A, de Laureto PP, Spolaore B, Frare E. Identifying disordered regions in proteins by limited proteolysis. Methods Mol Biol 2012;896:297–318, http://dx.doi.org/10.1007/978-1-4614-3704-8_20.
[21] Johnson DE, Xue B, Sickmeier MD, Meng J, Cortese MS, Oldfield CJ, et al. High-throughput characterization of intrinsic disorder in proteins from the Protein Structure Initiative. J Struct Biol 2012;180:201–15, http://dx.doi.org/10.1016/j.jsb.2012.05.013.
[22] Jirgensons B. Optical rotation and viscosity of native and denatured proteins. X. Further studies on optical rotatory dispersion. Arch Biochem Biophys 1958;74:57–69.
[23] Weinreb PH, Zhen W, Poon AW, Conway KA, Lansbury Jr PT. NACP, a protein implicated in Alzheimer’s disease and learning, is natively unfolded. Biochemistry 1996;35:13709–15, http://dx.doi.org/10.1021/bi96.
[24] Uversky VN, Gillespie JR, Millett IS, Khodyakova AV, Vasiliev AM, Chernovskaya TV, et al. Natively unfolded human prothymosin α adopts partially folded collapsed conformation at acidic pH. Biochemistry 1999;38:15009–16, http://dx.doi.org/10.1021/bi990752+.
[25] Arnone A, Bier CJ, Cotton FA, Day VW, Hazen Jr EE, Richardson DC, et al. A high resolution structure of an inhibitor complex of the extracellular nuclease of Staphylococcus aureus. I. Experimental procedures and chain tracing. J Biol Chem 1971;246:2302–16.
[26] Kollman JM, Pandi L, Sawaya MR, Riley M, Doolittle RF. Crystal structure of human fibrinogen. Biochemistry 2009;48:3877–86, http://dx.doi.org/10.1021/bi802205g.
[27] Oldfield CJ, Xue B, Van Y-Y, Ulrich EL, Markley JL, Dunker AK, et al. Utilization of protein intrinsic disorder knowledge in structural proteomics. Biochim Biophys Acta 2013;1834:487–98, http://dx.doi.org/10.1016/j.bbapap.2012.12.003.
[28] Bernadó P, Svergun DI. Structural analysis of intrinsically disordered proteins by small-angle X-ray scattering. Mol Biosyst 2011;8:151–67, http://dx.doi.org/10.1039/C1MB05275F.
[29] Hegde ML, Tsutakawa SE, Hegde PM, Holthauzen LMF, Li J, Oezguen N, et al. The disordered C-terminal domain of human DNA glycosylase NEIL1 contributes to its stability via intramolecular interactions. J Mol Biol 2013;425:2359–71, http://dx.doi.org/10.1016/j.jmb.2013.03.030.
[30] Uversky VN, Li J, Souillac P, Millett IS, Doniach S, Jakes R, et al. Biophysical properties of the synucleins and their propensities to fibrillate: inhibition of alpha-synuclein assembly by beta- and gamma-synucleins. J Biol Chem 2002;277:11970–8, http://dx.doi.org/10.1074/jbc.M109541200.
[31] Kriwacki RW, Hengst L, Tennant L, Reed SI, Wright PE. Structural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state: conformational disorder mediates binding diversity. Proc Natl Acad Sci 1996;93:11504–9.
[32] Jensen MR, Ruigrok RWH, Blackledge M. Describing intrinsically disordered proteins at atomic resolution by NMR. Curr Opin Struct Biol 2013;23:426–35, http://dx.doi.org/10.1016/j.sbi.2013.02.007.
[33] Mittag T, Orlicky S, Choy W-Y, Tang X, Lin H, Sicheri F, et al. Dynamic equilibrium engagement of a polyvalent ligand with a single-site receptor. Proc Natl Acad Sci 2008;105:17772–7, http://dx.doi.org/10.1073/pnas.0809222105.
[34] Uversky VN, Dunker AK, editors. Intrinsically disordered protein analysis – volume 1, Methods and experimental tools; in Methods in Molecular Biology, vol. 895. New York: Humana Press; 2012.
[35] Uversky VN, Dunker AK, editors. Intrinsically disordered protein analysis – volume 2, Methods and experimental tools; in Methods in Molecular Biology, vol. 896. New York: Humana Press; 2012.
[36] Theillet FX, Kalmar L, Tompa P, Han KY, Selenko P, Dunker AK, et al. The alphabet of intrinsic disorder: 1. Act like a Pro: on the abundance and roles of proline residues in intrinsically disordered regions. Intrinsically Disord Proteins 2013;1:e24360.
[37] Dosztányi Z, Csizmók V, Tompa P, Simon I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 2005;347:827–39, http://dx.doi.org/10.1016/j.jmb.2005.01.071.
[38] Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK. Identifying disordered regions in proteins from amino acid sequence. Int Conf Neural Netw 1997;9:0–5, http://dx.doi.org/10.1109/ICNN.1997.611643.
[39] Monastyrskyy B, Kryshtafovych A, Moult J, Tramontano A, Fidelis K. Assessment of protein disorder region predictions in CASP10. Proteins 2013, http://dx.doi.org/10.1002/prot.24391.
[40] Peng Z-L, Kurgan L. Comprehensive comparative assessment of in-silico predictors of disordered regions. Curr Protein Pept Sci 2012;13:6–18.
[41] He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK. Predicting intrinsic disorder in proteins: an overview. Cell Res 2009;19:929–49, http://dx.doi.org/10.1038/cr.2009.87.
[42] Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al. DisProt: the Database of Disordered Proteins. Nucleic Acids Res 2007;35:D786–93, http://dx.doi.org/10.1093/nar/gkl893.
[43] Gao J, Thelen JJ, Dunker AK, Xu D. Musite, a tool for global prediction of general and kinase-specific phosphorylation sites. Mol Cell Proteomics 2010;9:2586–600, http://dx.doi.org/10.1074/mcp.M110.001388.
[44] Gao J, Xu D. Correlation between posttranslational modification and intrinsic disorder in protein. Pac Symp Biocomput 2012:94–102.
[45] Spolar RS, Record Jr MT. Coupling of local folding to site-specific binding of proteins to DNA. Science 1994;263:777–84.
[46] Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, et al. Analysis of molecular recognition features (MoRFs). J Mol Biol 2006;362:1043–59, http://dx.doi.org/10.1016/j.jmb.2006.07.087.
[47] Tompa P, Fuxreiter M. Fuzzy complexes: polymorphism and structural disorder in protein–protein interactions. Trends Biochem Sci 2008;33:2–8, http://dx.doi.org/10.1016/j.tibs.2007.10.003.
[48] Uversky VN, Dunker AK. The case for intrinsically disordered proteins playing contributory roles in molecular recognition without a stable 3D structure. F1000 Biol Rep 2013;5:1, http://dx.doi.org/10.3410/B5-1.
[49] Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J 2005;272:5129–48, http://dx.doi.org/10.1111/j.1742-4658.2005.04948.x.
[50] Oldfield CJ, Meng J, Yang JY, Yang MQ, Uversky VN, Dunker AK. Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics 2008;9(Suppl 1):S1, http://dx.doi.org/10.1186/1471-2164-9-S1-S1.
[51] DiNitto JP, Huber PW. Mutual induced fit binding of Xenopus ribosomal protein L5 to 5S rRNA. J Mol Biol 2003;330:979–92.
[52] Vuzman D, Levy Y. Intrinsically disordered regions as affinity tuners in protein–DNA interactions. Mol Biosyst 2012;8:47–57, http://dx.doi.org/10.1039/c1mb05273j.
[53] Hsu W-L, Oldfield CJ, Xue B, Meng J, Huang F, Romero P, et al. Exploring the binding diversity of intrinsically disordered proteins involved in one-to-many binding. Protein Sci 2013;22:258–73, http://dx.doi.org/10.1002/pro.2207.
[54] Strahl BD, Allis CD. The language of covalent histone modifications. Nature 2000;403:41–5, http://dx.doi.org/10.1038/47412.
[55] Meek DW, Anderson CW. Posttranslational modification of p53: cooperative integrators of function. Cold Spring Harb Perspect Biol 2009;1, http://dx.doi.org/10.1101/cshperspect.a000950.
[56] Lothrop AP, Torres MP, Fuchs SM. Deciphering posttranslational modification codes. FEBS Lett 2013;587:1247–57, http://dx.doi.org/10.1016/j.febslet.2013.01.047.
[57] Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, et al. Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci 2006;103:8390–5, http://dx.doi.org/10.1073/pnas.0507916103.
[58] Liu Y, Matthews KS, Bondos SE. Internal regulatory interactions determine DNA binding specificity by a HOX transcription factor. J Mol Biol 2009;390:760–74, http://dx.doi.org/10.1016/j.jmb.2009.05.059.
[59] Morgan JL, Song Y, Barbar E. Structural dynamics and multiregion interactions in dynein–dynactin recognition. J Biol Chem 2011;286:39349–59, http://dx.doi.org/10.1074/jbc.M111.296277.
[60] Fukuchi S, Homma K, Minezaki Y, Gojobori T, Nishikawa K. Development of an accurate classification system of proteins into structured and unstructured regions that uncovers novel structural domains: its application to human transcription factors. BMC Struct Biol 2009;9:26, http://dx.doi.org/10.1186/1472-6807-9-26.
[61] Dunker AK, Silman I, Uversky VN, Sussman JL. Function and structure of inherently disordered proteins. Curr Opin Struct Biol 2008;18:756–64, http://dx.doi.org/10.1016/j.sbi.2008.10.002.
[62] Schad E, Tompa P, Hegyi H. The relationship between proteome size, structural disorder and organism complexity. Genome Biol 2011;12:R120, http://dx.doi.org/10.1186/gb-2011-12-12-r120.
[63] Chen L, Bush SJ, Tovar-Corona JM, Castillo-Morales A, Urrutia AO. Correcting for differential transcript coverage reveals a strong relationship between alternative splicing and organism complexity. Mol Biol Evol 2014;31:1402–13, http://dx.doi.org/10.1093/molbev/msu083.


[64] Grosberg RK, Strathmann RR. The evolution of multicellularity: a minor major transition? Annu Rev Ecol Evol Syst 2007;38:621–54, http://dx.doi.org/10.1146/annurev.ecolsys.36.102403.114735.
[65] Schlichting CD. Origins of differentiation via phenotypic plasticity. Evol Dev 2003;5:98–105.
[66] Bell G, Mooers AO. Size and complexity among multicellular organisms. Biol J Linn Soc 1997;60:345–63, http://dx.doi.org/10.1006/bijl.1996.0108.
[67] Herskowitz I. A regulatory hierarchy for cell specialization in yeast. Nature 1989;342:749–57, http://dx.doi.org/10.1038/342749a0.
[68] Niklas KJ. The evolutionary-developmental origins of multicellularity. Am J Bot 2014;101:6–25, http://dx.doi.org/10.3732/ajb.1300314.
[69] Niklas KJ, Newman SA. The origins of multicellular organisms. Evol Dev 2013;15:41–52, http://dx.doi.org/10.1111/ede.12013.
[70] Knoll AH. The multiple origins of complex multicellularity. Annu Rev Earth Planet Sci 2011;39:217–39, http://dx.doi.org/10.1146/annurev.earth.031208.100209.
[71] Tyukhtenko S, Deshmukh L, Kumar V, Lary J, Cole J, Lemmon V, et al. Characterization of the neuron-specific L1-CAM cytoplasmic tail: naturally disordered in solution it exercises different binding modes for different adaptor proteins. Biochemistry 2008;47:4160–8, http://dx.doi.org/10.1021/bi702433q.
[72] Campbell ID, Humphries MJ. Integrin structure, activation, and interactions. Cold Spring Harb Perspect Biol 2011;3, http://dx.doi.org/10.1101/cshperspect.a004994.
[73] Wedepohl S, Beceren-Braun F, Riese S, Buscher K, Enders S, Bernhard G, et al. L-selectin – a dynamic regulator of leukocyte migration. Eur J Cell Biol 2012;91:257–64, http://dx.doi.org/10.1016/j.ejcb.2011.02.007.
[74] Huber AH, Stewart DB, Laurents DV, Nelson WJ, Weis WI. The cadherin cytoplasmic domain is unstructured in the absence of beta-catenin. A possible mechanism for regulating cadherin turnover. J Biol Chem 2001;276:12301–9, http://dx.doi.org/10.1074/jbc.M010377200.
[75] Batlle E, Wilkinson DG. Molecular mechanisms of cell segregation and boundary formation in development and tumorigenesis. Cold Spring Harb Perspect Biol 2012;4:a008227, http://dx.doi.org/10.1101/cshperspect.a008227.
[76] Hulpiau P, van Roy F. Molecular evolution of the cadherin superfamily. Int J Biochem Cell Biol 2009;41:349–69, http://dx.doi.org/10.1016/j.biocel.2008.09.027.
[77] Buxton RS, Magee AI. Structure and interactions of desmosomal and other cadherins. Semin Cell Biol 1992;3:157–67, http://dx.doi.org/10.1016/S1043-4682(10)80012-1.
[78] Kami K, Chidgey M, Dafforn T, Overduin M. The desmoglein-specific cytoplasmic region is intrinsically disordered in solution and interacts with multiple desmosomal protein partners. J Mol Biol 2009;386:531–43, http://dx.doi.org/10.1016/j.jmb.2008.12.054.
[79] De Biasio A, Guarnaccia C, Popovic M, Uversky VN, Pintar A, Pongor S. Prevalence of intrinsic disorder in the intracellular region of human single-pass type I proteins: the case of the notch ligand Delta-4. J Proteome Res 2008;7:2496–506, http://dx.doi.org/10.1021/pr800063u.
[80] Huber AH, Weis WI. The structure of the beta-catenin/E-cadherin complex and the molecular basis of diverse ligand recognition by beta-catenin. Cell 2001;105:391–402.
[81] Gunasekaran K, Tsai C-J, Nussinov R. Analysis of ordered and disordered protein complexes reveals structural features discriminating between stable and unstable monomers. J Mol Biol 2004;341:1327–41, http://dx.doi.org/10.1016/j.jmb.2004.07.002.
[82] Evans RM. The steroid and thyroid hormone receptor superfamily. Science 1988;240:889–95.
[83] Bertrand S, Brunet FG, Escriva H, Parmentier G, Laudet V, Robinson-Rechavi M. Evolutionary genomics of nuclear receptors: from twenty-five ancestral genes to derived endocrine systems. Mol Biol Evol 2004;21:1923–37, http://dx.doi.org/10.1093/molbev/msh200.
[84] Kumar R, Thompson EB. The structure of the nuclear hormone receptors. Steroids 1999;64:310–9.
[85] Tora L, White J, Brou C, Tasset D, Webster N, Scheer E, et al. The human estrogen receptor has two independent nonacidic transcriptional activation functions. Cell 1989;59:477–87, http://dx.doi.org/10.1016/0092-8674(89)90031-7.
[86] McKenna NJ, Lanz RB, O’Malley BW. Nuclear receptor coregulators: cellular and molecular biology. Endocr Rev 1999;20:321–44, http://dx.doi.org/10.1210/edrv.20.3.0366.
[87] Norris JD, Fan D, Kerner SA, McDonnell DP. Identification of a third autonomous activation domain within the human estrogen receptor. Mol Endocrinol 1997;11:747–54, http://dx.doi.org/10.1210/mend.11.6.0008.
[88] Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK. Intrinsic disorder in transcription factors. Biochemistry 2006;45:6873–88, http://dx.doi.org/10.1021/bi0602718.
[89] Minezaki Y, Homma K, Kinjo AR, Nishikawa K. Human transcription factors contain a high fraction of intrinsically disordered regions essential for transcriptional regulation. J Mol Biol 2006;359:1137–49, http://dx.doi.org/10.1016/j.jmb.2006.04.016.
[90] Baskakov IV, Kumar R, Srinivasan G, Ji YS, Bolen DW, Thompson EB. Trimethylamine N-oxide-induced cooperative folding of an intrinsically unfolded transcription-activating fragment of human glucocorticoid receptor. J Biol Chem 1999;274:10693–6.
[91] Krasowski MD, Reschly EJ, Ekins S. Intrinsic disorder in nuclear hormone receptors. J Proteome Res 2008;7:4359–72, http://dx.doi.org/10.1021/pr8003024.
[92] Wärnmark A, Wikström A, Wright AP, Gustafsson JA, Härd T. The N-terminal regions of estrogen receptor alpha and beta are unstructured in vitro and show different TBP binding properties. J Biol Chem 2001;276:45939–44, http://dx.doi.org/10.1074/jbc.M107875200.
[93] Garza AMS, Khan SH, Kumar R. Site-specific phosphorylation induces functionally active conformation in the intrinsically disordered N-terminal activation function (AF1) domain of the glucocorticoid receptor. Mol Cell Biol 2010;30:220–30, http://dx.doi.org/10.1128/MCB.00552-09.
[94] Lavery DN, McEwan IJ. Structural characterization of the native NH2-terminal transactivation domain of the human androgen receptor: a collapsed disordered conformation underlies structural plasticity and protein-induced folding. Biochemistry 2008;47:3360–9, http://dx.doi.org/10.1021/bi702221e.
[95] Lavery DN, McEwan IJ. Structure and function of steroid receptor AF1 transactivation domains: induction of active conformations. Biochem J 2005;391:449–64, http://dx.doi.org/10.1042/BJ20050872.
[96] Kumar R, Thompson EB. Transactivation functions of the N-terminal domains of nuclear hormone receptors: protein folding and coactivator interactions. Mol Endocrinol 2003;17:1–10, http://dx.doi.org/10.1210/me.2002-0258.
[97] Simons SS, Kumar R. Variable steroid receptor responses: intrinsically disordered AF1 is the key. Mol Cell Endocrinol 2013;376:81–4, http://dx.doi.org/10.1016/j.mce.2013.06.007.
[98] Freedman LP, Luisi BF, Korszun ZR, Basavappa R, Sigler PB, Yamamoto KR. The function and structure of the metal coordination sites within the glucocorticoid receptor DNA binding domain. Nature 1988;334:543–6, http://dx.doi.org/10.1038/334543a0.
[99] Garza AS, Khan SH, Moure CM, Edwards DP, Kumar R. Binding-folding induced regulation of AF1 transactivation domain of the glucocorticoid receptor by a cofactor that binds to its DNA binding domain. PLoS ONE 2011;6:e25875, http://dx.doi.org/10.1371/journal.pone.0025875.
[100] Goswami D, Callaway C, Pascal BD, Kumar R, Edwards DP, Griffin PR. Influence of domain interactions on conformational mobility of the progesterone receptor detected by hydrogen/deuterium exchange mass spectrometry. Structure 2014;22:961–73, http://dx.doi.org/10.1016/j.str.2014.04.013.
[101] Zwart W, de Leeuw R, Rondaij M, Neefjes J, Mancini MA, Michalides R. The hinge region of the human estrogen receptor determines functional synergy between AF-1 and AF-2 in the quantitative response to estradiol and tamoxifen. J Cell Sci 2010;123:1253–61, http://dx.doi.org/10.1242/jcs.061135.
[102] Haelens A, Tanner T, Denayer S, Callewaert L, Claessens F. The hinge region regulates DNA binding, nuclear translocation, and transactivation of the androgen receptor. Cancer Res 2007;67:4514–23, http://dx.doi.org/10.1158/0008-5472.CAN-06-1701.
[103] Yang J, Singleton DW, Shaughnessy EA, Khan SA. The F-domain of estrogen receptor-alpha inhibits ligand induced receptor dimerization. Mol Cell Endocrinol 2008;295:94–100, http://dx.doi.org/10.1016/j.mce.2008.08.001.
[104] Koide A, Abbatiello S, Rothgery L, Koide S. Probing protein conformational changes in living cells by using designer binding proteins: application to the estrogen receptor. Proc Natl Acad Sci 2002;99:1253–8, http://dx.doi.org/10.1073/pnas.032665299.
[105] Hilser VJ, Thompson EB. Structural dynamics, intrinsic disorder, and allostery in nuclear receptors as transcription factors. J Biol Chem 2011;286:39675–82, http://dx.doi.org/10.1074/jbc.R111.278929.
[106] Nettles KW, Bruning JB, Gil G, Nowak J, Sharma SK, Hahm JB, et al. NFkappaB selectivity of estrogen receptor ligands revealed by comparative crystallographic analyses. Nat Chem Biol 2008;4:241–7, http://dx.doi.org/10.1038/nchembio.76.
[107] Pike AC, Brzozowski AM, Walton J, Hubbard RE, Thorsell AG, Li YL, et al. Structural insights into the mode of action of a pure antiestrogen. Structure 2001;9:145–53.
[108] Brzozowski AM, Pike AC, Dauter Z, Hubbard RE, Bonn T, Engström O, et al. Molecular basis of agonism and antagonism in the oestrogen receptor. Nature 1997;389:753–8, http://dx.doi.org/10.1038/39645.
[109] Shiau AK, Barstad D, Loria PM, Cheng L, Kushner PJ, Agard DA, et al. The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen. Cell 1998;95:927–37.
[110] Kelleher FC, Fennelly D, Rafferty M. Common critical pathways in embryogenesis and cancer. Acta Oncol 2006;45:375–88, http://dx.doi.org/10.1080/02841860600602946.
[111] Dreesen O, Brivanlou AH. Signaling pathways in cancer and embryonic stem cells. Stem Cell Rev 2007;3:7–17.
[112] Pires-daSilva A, Sommer RJ. The evolution of signalling pathways in animal development. Nat Rev Genet 2003;4:39–49, http://dx.doi.org/10.1038/nrg977.
[113] Xue B, Dunker AK, Uversky VN. The roles of intrinsic disorder in orchestrating the Wnt-pathway. J Biomol Struct Dyn 2012;29:843–61, http://dx.doi.org/10.1080/073911012010525024.


[114] Hariharan R, Pillai MR. Structure–function relationship of inhibitory Smads: structural flexibility contributes to functional divergence. Proteins 2008;71:1853–62, http://dx.doi.org/10.1002/prot.21869.
[115] Popovic M, Coglievina M, Guarnaccia C, Verdone G, Esposito G, Pintar A, et al. Gene synthesis, expression, purification, and characterization of human Jagged-1 intracellular region. Protein Expr Purif 2006;47:398–404, http://dx.doi.org/10.1016/j.pep.2005.11.027.
[116] Dyson HJ, Komives EA. Role of disorder in IκB–NFκB interaction. IUBMB Life 2012;64:499–505, http://dx.doi.org/10.1002/iub.1044.
[117] Feng Z-P, Chandrashekaran IR, Low A, Speed TP, Nicholson SE, Norton RS. The N-terminal domains of SOCS proteins: a conserved region in the disordered N-termini of SOCS4 and 5. Proteins 2012;80:946–57.
[118] Nusse R, Varmus H. Three decades of Wnts: a personal perspective on how a scientific field developed. EMBO J 2012;31:2670–84, http://dx.doi.org/10.1038/emboj.2012.146.
[119] Nichols SA, Dirks W, Pearse JS, King N. Early evolution of animal cell signaling and adhesion genes. Proc Natl Acad Sci 2006;103:12451–6, http://dx.doi.org/10.1073/pnas.0604065103.
[120] Cadigan KM, Nusse R. Wnt signaling: a common theme in animal development. Genes Dev 1997;11:3286–305.
[121] Rubinfeld B, Souza B, Albert I, Müller O, Chamberlain SH, Masiarz FR, et al. Association of the APC gene product with beta-catenin. Science 1993;262:1731–4.
[122] Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, et al. Identification, analysis, and prediction of protein ubiquitination sites. Proteins 2010;78:365–80, http://dx.doi.org/10.1002/prot.22555.
[123] Inobe T, Fishbain S, Prakash S, Matouschek A. Defining the geometry of the two-component proteasome degron. Nat Chem Biol 2011;7:161–7, http://dx.doi.org/10.1038/nchembio.521.
[124] Xue B, Romero PR, Noutsou M, Maurice MM, Rüdiger SGD, William Jr AM, et al. Stochastic machines as a colocalization mechanism for scaffold protein function. FEBS Lett 2013;587:1587–91, http://dx.doi.org/10.1016/j.febslet.2013.04.006.
[125] Noutsou M, Duarte AMS, Anvarian Z, Didenko T, Minde DP, Kuper I, et al. Critical scaffolding regions of the tumor suppressor Axin1 are natively unfolded. J Mol Biol 2011;405:773–86, http://dx.doi.org/10.1016/j.jmb.2010.11.013.
[126] Minde DP, Radli M, Forneris F, Maurice MM, Rüdiger SGD. Large extent of disorder in Adenomatous Polyposis Coli offers a strategy to guard Wnt signalling against point mutations. PLoS ONE 2013;8:e77257, http://dx.doi.org/10.1371/journal.pone.0077257.
[127] McGinnis W, Levine MS, Hafen E, Kuroiwa A, Gehring WJ. A conserved DNA sequence in homoeotic genes of the Drosophila Antennapedia and bithorax complexes. Nature 1984;308:428–33.
[128] Banerjee-Basu S, Baxevanis AD. Molecular evolution of the homeodomain family of transcription factors. Nucleic Acids Res 2001;29:3258–69.
[129] Bertolino E, Reimund B, Wildt-Perinic D, Clerc RG. A novel homeobox protein which recognizes a TGT core and functionally interferes with a retinoid-responsive motif. J Biol Chem 1995;270:31178–88.
[130] Gehring WJ, Affolter M, Bürglin T. Homeodomain proteins. Annu Rev Biochem 1994;63:487–526, http://dx.doi.org/10.1146/annurev.bi.63.070194.002415.
[131] Akam M. Hox genes and the evolution of diverse body plans. Philos Trans R Soc Lond B Biol Sci 1995;349:313–9, http://dx.doi.org/10.1098/rstb.1995.0119.
[132] Ryan JF, Mazza ME, Pang K, Matus DQ, Baxevanis AD, Martindale MQ, et al. Pre-bilaterian origins of the Hox cluster and the Hox code: evidence from the sea anemone, Nematostella vectensis. PLoS ONE 2007;2:e153, http://dx.doi.org/10.1371/journal.pone.0000153.
[133] Joshi R, Passner JM, Rohs R, Jain R, Sosinsky A, Crickmore MA, et al. Functional specificity of a Hox protein mediated by the recognition of minor groove structure. Cell 2007;131:530–43, http://dx.doi.org/10.1016/j.cell.2007.09.024.
[134] Joshi R, Sun L, Mann R. Dissecting the functional specificities of two Hox proteins. Genes Dev 2010;24:1533–45, http://dx.doi.org/10.1101/gad.1936910.
[135] Tóth-Petróczy A, Simon I, Fuxreiter M, Levy Y. Disordered tails of homeodomains facilitate DNA recognition by providing a trade-off between folding and specific binding. J Am Chem Soc 2009;131:15084–5, http://dx.doi.org/10.1021/ja9052784.
[136] Vuzman D, Levy Y. DNA search efficiency is modulated by charge composition and distribution in the intrinsically disordered tail. Proc Natl Acad Sci 2010;107:21004–9, http://dx.doi.org/10.1073/pnas.1011775107.
[137] Tan X-X, Bondos S, Li L, Matthews KS. Transcription activation by Ultrabithorax Ib protein requires a predicted alpha-helical region. Biochemistry 2002;41:2774–85.
[138] Liu Y, Matthews KS, Bondos SE. Multiple intrinsically disordered sequences alter DNA binding by the homeodomain of the Drosophila hox protein Ultrabithorax. J Biol Chem 2008;283:20874–87, http://dx.doi.org/10.1074/jbc.M800375200.
[139] Bondos SE, Tan X-X, Matthews KS. Physical and genetic interactions link hox function with diverse transcription factors and cell signaling proteins. Mol Cell Proteomics 2006;5:824–34, http://dx.doi.org/10.1074/mcp.M500256-MCP200.
[140] Passner JM, Ryoo HD, Shen L, Mann RS, Aggarwal AK. Structure of a DNA-bound Ultrabithorax-extradenticle homeodomain complex. Nature 1999;397:714–9, http://dx.doi.org/10.1038/17833.
[141] Gavis ER, Hogness DS. Phosphorylation, expression and function of the Ultrabithorax protein family in Drosophila melanogaster. Development 1991;112:1077–93.
[142] Lopez AJ, Hogness DS. Immunochemical dissection of the Ultrabithorax homeoprotein family in Drosophila melanogaster. Proc Natl Acad Sci 1991;88:9924–8.
[143] Buljan M, Chalancon G, Dunker AK, Bateman A, Balaji S, Fuxreiter M, et al. Alternative splicing of intrinsically disordered regions and rewiring of protein interactions. Curr Opin Struct Biol 2013;23:443–50, http://dx.doi.org/10.1016/j.sbi.2013.03.006.
[144] Abdel-Hafiz HA, Horwitz KB. Post-translational modifications of the progesterone receptors. J Steroid Biochem Mol Biol 2014;140:80–9, http://dx.doi.org/10.1016/j.jsbmb.2013.12.008.
[145] Choi H-J, Huber AH, Weis WI. Thermodynamics of beta-catenin–ligand interactions: the roles of the N- and C-terminal tails in modulating binding affinity. J Biol Chem 2006;281:1027–38, http://dx.doi.org/10.1074/jbc.M511338200.
[146] Likhite VS, Stossi F, Kim K, Katzenellenbogen BS, Katzenellenbogen JA. Kinase-specific phosphorylation of the estrogen receptor changes receptor interactions with ligand, deoxyribonucleic acid, and coregulators associated with alterations in estrogen and tamoxifen activity. Mol Endocrinol 2006;20:3120–32, http://dx.doi.org/10.1210/me.2006-0068.
[147] Kandpal RP, Rajasimha HK, Brooks MJ, Nellissery J, Wan J, Qian J, et al. Transcriptome analysis using next generation sequencing reveals molecular signatures of diabetic retinopathy and efficacy of candidate drugs. Mol Vis 2012;18:1123–46.
[148] Yruela I, Contreras-Moreira B. Protein disorder in plants: a view from the chloroplast. BMC Plant Biol 2012;12:165, http://dx.doi.org/10.1186/1471-2229-12-165.
[149] Yao Q, Gao J, Bollinger C, Thelen JJ, Xu D. Predicting and analyzing protein phosphorylation sites in plants using musite. Front Plant Sci 2012;3:186, http://dx.doi.org/10.3389/fpls.2012.00186.
[150] Schwechheimer C. Understanding gibberellic acid signaling – are we there yet? Curr Opin Plant Biol 2008;11:9–15, http://dx.doi.org/10.1016/j.pbi.2007.10.011.
[151] Ueguchi-Tanaka M, Nakajima M, Katoh E, Ohmiya H, Asano K, Saji S, et al. Molecular interactions of a soluble gibberellin receptor, GID1, with a rice DELLA protein, SLR1, and gibberellin. Plant Cell 2007;19:2140–55, http://dx.doi.org/10.1105/tpc.106.043729.
[152] Sun X, Jones WT, Harvey D, Edwards PJB, Pascal SM, Kirk C, et al. N-terminal domains of DELLA proteins are intrinsically unstructured in the absence of interaction with GID1/gibberellic acid receptors. J Biol Chem 2010;285:11557–71, http://dx.doi.org/10.1074/jbc.M109.027011.
[153] Sun X, Xue B, Jones WT, Rikkerink E, Dunker AK, Uversky VN. A functionally required unfoldome from the plant kingdom: intrinsically disordered N-terminal domains of GRAS proteins are involved in molecular recognition during plant development. Plant Mol Biol 2011;77:205–23, http://dx.doi.org/10.1007/s11103-011-9803-z.
[154] Dill A, Jung HS, Sun TP. The DELLA motif is essential for gibberellin-induced degradation of RGA. Proc Natl Acad Sci 2001;98:14162–7, http://dx.doi.org/10.1073/pnas.251534098.
[155] Murase K, Hirano Y, Sun T, Hakoshima T. Gibberellin-induced DELLA recognition by the gibberellin receptor GID1. Nature 2008;456:459–63, http://dx.doi.org/10.1038/nature07519.
[156] Holland PWH, Booth HAF, Bruford EA. Classification and nomenclature of all human homeobox genes. BMC Biol 2007;5:47, http://dx.doi.org/10.1186/1741-7007-5-47.
[157] Mukherjee K, Brocchieri L, Bürglin TR. A comprehensive classification and evolutionary analysis of plant homeobox genes. Mol Biol Evol 2009;26:2775–94, http://dx.doi.org/10.1093/molbev/msp201.
[158] Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z. Length-dependent prediction of protein intrinsic disorder. BMC Bioinform 2006;7:208, http://dx.doi.org/10.1186/1471-2105-7-208.
[159] Ito Y, Hirochika H, Kurata N. Organ-specific alternative transcripts of KNOX family class 2 homeobox genes of rice. Gene 2002;288:41–7.
[160] Qin Q, Wang W, Guo X, Yue J, Huang Y, Xu X, et al. Arabidopsis DELLA protein degradation is controlled by a type-one protein phosphatase, TOPP4. PLoS Genet 2014;10:e1004464, http://dx.doi.org/10.1371/journal.pgen.1004464.
[161] Chao JD, Wong D, Av-Gay Y. Microbial protein-tyrosine kinases. J Biol Chem 2014;289:9463–72.
