Mol Genet Genomics (2014) 289:1217–1223 DOI 10.1007/s00438-014-0881-x
ORIGINAL PAPER
Investigating co‑evolution of functionally associated phosphosites in human
Zhi Liu · Guangyong Zheng · Xiao Dong · Zhen Wang · Beili Ying · Yang Zhong · Yixue Li
Received: 27 December 2013 / Accepted: 19 June 2014 / Published online: 9 July 2014 © Springer-Verlag Berlin Heidelberg 2014
Abstract Phosphorylation is essential for protein function and signal transduction in eukaryotic cells. With the rapid development of mass spectrometry technology, a large number of phosphosites have been identified. However, high-throughput methods for the functional characterization of phosphosites are still scarce. In this study, we examined whether the co-evolution property can be used as an indicator of phosphosite function by investigating the co-evolutionary relationship between functionally associated phosphosites in human. In practice, the evolutionary attributes of phosphosites were represented with phylogenetic profiles, and the co-evolutionary correlations of functionally associated phosphosites were then examined on three levels: (1) phosphosites within one protein; (2) phosphosites in different proteins participating in the same signal transduction pathways; and (3) general phosphosites. The results show that co-evolution is a general property of functionally associated phosphosites. This finding suggests that it is, to some degree, feasible to use the co-evolution property to explore the function of phosphosites and to investigate the functional associations between them.
Keywords Co-evolution · Functional association · Phosphorylation site · Phylogenetic profile · Posttranslational modification
Communicated by S. Hohmann. Z. Liu and G. Zheng contributed equally to this work.
Electronic supplementary material The online version of this article (doi:10.1007/s00438-014-0881-x) contains supplementary material, which is available to authorized users.
Z. Liu · X. Dong · Z. Wang · Y. Li (*) Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Rd., Shanghai 200031, People’s Republic of China e-mail: [email protected]
Z. Liu University of Chinese Academy of Sciences, 19 Yuquan Rd., Beijing 100049, People’s Republic of China
G. Zheng · B. Ying · Y. Li Shanghai Centre for Bioinformation Technology, 2078 Keyuan Rd., Shanghai 201203, People’s Republic of China
B. Ying · Y. Zhong School of Life Sciences, Fudan University, 220 Handan Rd., Shanghai 200433, People’s Republic of China
G. Zheng CAS‑MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Rd., Shanghai 200031, People’s Republic of China
Introduction
Phosphorylation is reported to be involved in most cellular events, and defects in phosphorylation have been linked to numerous developmental disorders and human diseases (Wang et al. 2014). Characterizing phosphorylation sites can help to understand their critical functions in cellular events and can provide potential drug targets for disease treatment (Lopez and Cho 2012). With tens of thousands of sites identified by mass spectrometry (MS)-based phosphoproteomics, researchers face the challenge of distinguishing the biologically important sites within these large-scale data and annotating their functions. In previous studies, evolutionary conservation has been used to predict individual functional phosphosites (Niu et al. 2012), based on the observation that phosphosites of known function are markedly more conserved than those with no characterized function (Landry et al. 2009). However, the functional associations among phosphosites have not been investigated before, because phosphorylation is often involved in signal transduction cascades and the relationships between phosphosites are complex and hard to define explicitly.
Over the last two decades, the co-evolution property has been used to survey functional associations between biological molecules. In 1996, Fryxell reported that the phylogenetic trees of insulin and insulin receptors were more similar than would be expected from divergence across species under the standard molecular clock hypothesis (Fryxell 1996). In 1999, Pellegrini and colleagues carried out the first genome-scale prediction of functional associations from phylogenetic profiles, in E. coli (Pellegrini et al. 1999). They computed phylogenetic profiles for 4,290 E. coli proteins by aligning each protein sequence with the proteins from 16 other fully sequenced genomes. Their results showed that proteins with matching or similar profiles strongly tended to be functionally linked. More recently, many efforts have been made to investigate the co-evolution of various biological molecules. For instance, co-evolution-based methods have been used to explore functional associations and predict interactions in ligand and receptor systems (Goh et al. 2000) and between transcription factors and their DNA targets (Zheng et al. 2012).
Moreover, co-evolution studies at the amino acid level have successfully predicted the interacting surfaces (protein interfaces) of protein complexes as well as the interacting partners of a protein (Schug et al. 2009; Tress et al. 2005; Yeang and Haussler 2007). Since the co-evolution property has been applied successfully to explore the function of various molecules, we hypothesized that it may also be used to explore the function of phosphosites. In this study, we first represented the evolutionary attributes of phosphosites with phylogenetic profiles. In parallel, functionally associated phosphosites were collected manually from public databases and the literature. Then, we tested the correlation of the phylogenetic profiles of functionally associated phosphosites on three levels: (1) phosphosites within the same protein; (2) phosphosites in different proteins participating in the same signal transduction pathways; and (3) general phosphosites. The test results showed that functionally associated phosphosites have highly correlated phylogenetic profiles.
Materials and methods
Phosphosites collection and pre‑processing
Human phosphoproteins and phosphosites were retrieved from the SysPTM database version 2.0 (Li et al. 2014) and restricted to the proteins recorded in the NCBI RefSeq repository (Pruitt et al. 2012). These phosphoproteins were then mapped to homology groups retrieved from the NCBI HomoloGene database (build 66) (Sayers et al. 2012). Initially, we considered all 21 species recorded in the HomoloGene database, which cover a broad evolutionary time scale. While examining each homology group, we found that around 60 % of human phosphoproteins did not have orthologs in non-vertebrate species. Since we intended to construct phylogenetic profiles at the phosphosite level, the impact of missing orthologs had to be minimized as far as possible. Therefore, we chose seven reference species whose numbers of homologous proteins in the HomoloGene database are comparable to that of human (Table S1). These species were P. troglodytes, M. musculus, R. norvegicus, C. lupus, B. taurus, G. gallus, and D. rerio, which were also adopted in previous studies concerning the function and evolution of phosphorylation (Wang et al. 2011). We then kept only one homologous sequence per species in a given homology group. If several homologous sequences existed for a species in a group, we kept the most likely orthologous sequence according to the results of a bidirectional best hit test carried out with the BLAST software (Altschul et al. 1990). In addition, groups having fewer than two homologous sequences were discarded, resulting in 9,037 homology groups with non-redundant phosphoproteins.
Phosphosites phylogenetic profile and profile correlation
The ClustalW program (Chenna et al. 2003) was first used to align the sequences of each homology group, and then the human phosphosites and the corresponding sites in the reference species were extracted from the alignment results (Fig. 1).
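As a minimal sketch of the site-extraction step, assuming the alignment of one homology group is already available as gapped strings keyed by species, the alignment column of a human phosphosite can be located by counting non-gap residues in the human sequence. The function name `aligned_sites`, the species labels, and the toy sequences are hypothetical illustrations and are not taken from the paper or its pipeline.

```python
def aligned_sites(alignment, human_id, site_pos):
    """Return the residue each species carries in the column of a human site.

    alignment: dict mapping a species label to its gapped, aligned sequence
               (all sequences are assumed to have the same aligned length)
    human_id:  key of the human sequence in `alignment`
    site_pos:  1-based position of the phosphosite in the ungapped human sequence
    """
    human = alignment[human_id]
    seen = 0  # number of non-gap human residues passed so far
    for col, residue in enumerate(human):
        if residue != "-":
            seen += 1
            if seen == site_pos:
                # Read the same alignment column in every sequence.
                return {species: seq[col] for species, seq in alignment.items()}
    raise ValueError("site position exceeds the human sequence length")


# Toy alignment (made-up sequences, not data from the paper).
toy = {
    "human": "MK-SPT",
    "mouse": "MKASPT",
    "chick": "MR-APT",
}
print(aligned_sites(toy, "human", 3))
# {'human': 'S', 'mouse': 'S', 'chick': 'A'}
```

Here the human serine at position 3 falls in the fourth alignment column because of the gap, so the mouse residue in that column is S and the chick residue is A.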
Fig. 1 Diagram of phosphosite phylogenetic profile generation and profile correlation calculation. First, multiple sequence alignment is carried out for each homology group. Then the phylogenetic profile of each human phosphosite is generated from the alignment results. The correlation of a profile pair is defined as the number of identical entries divided by the total number of entries in the two profiles
Next, each human phosphosite was depicted with a phylogenetic profile, which was a binary vector with eight elements representing the evolutionary status of the phosphosite in eight species. An element was set to 1 when the site in a given species was identical to the human phosphosite, and to 0 otherwise (Fig. 1). The profile correlation of two phosphosites was defined as:
\[ \mathrm{Cor} = \frac{n - \sum_{i=1}^{n} x_i \oplus y_i}{n} \qquad (1) \]
where x and y were the two profile vectors being compared, x_i and y_i were their ith elements, and n was the element
number of a vector. The symbol ⊕ denoted the exclusive-OR logical operation, whose value was 1 if and only if its two input elements were different (0 ⊕ 0 = 0, 1 ⊕ 0 = 1, 0 ⊕ 1 = 1, 1 ⊕ 1 = 0).
Annotation of phosphosites in KEGG pathway
First, we downloaded human signal transduction pathways involving phosphorylation events from the KEGG database (Release 65.0) (Kanehisa et al. 2004). Then, we manually collected information on the phosphosites participating in these pathways from published papers (Table S2). Pathways having fewer than two phosphosites supported by published papers were excluded from our analysis.
Datasets of functionally associated sites
The functionally associated sites (FAS) were defined in three scenarios: (1) FAS within a protein (F1 dataset, Table S3). In the annotated pathways mentioned above, if a protein was phosphorylated on multiple sites that were annotated with the same or a related function in a given pathway, these sites were regarded as FAS; (2) FAS within signal transduction pathways (F2 dataset, Table S4). When phosphorylation of site a in protein A could directly activate or inhibit phosphorylation of site b in protein B, sites a and b were considered FAS; (3) general FAS: phosphosite pairs within interacting proteins and catalyzed by kinases of the same family. First, experimentally verified interacting proteins and a kinase-substrate dataset were collected from the STRING database (Franceschini et al. 2013) and the PhosphoSitePlus database (Hornbeck et al. 2012), respectively. Then, we extracted all the phosphosite pairs within interacting proteins and kept the pairs in which both sites were catalyzed by kinases of the same family, resulting in a dataset of around 900 phosphosite pairs (F3 dataset, Table S5).
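The phylogenetic profile and the profile correlation of Eq. 1 can be illustrated with a short sketch. It assumes that the residue aligned to a human phosphosite is already known for each species; the function names, the species labels, and the example vectors are hypothetical. Scoring a missing ortholog or an alignment gap as 0 is an assumption made here for illustration, since the text does not state how such cases were handled.

```python
def phylo_profile(residues, human_id, species_order):
    """Binary profile: 1 where a species has the same residue as human, else 0.

    residues:      dict mapping species label to the residue at the aligned site
    species_order: fixed list of species labels that defines the vector layout
    Assumption: a species missing from `residues` (or holding a gap) scores 0.
    """
    human_res = residues[human_id]
    return [1 if residues.get(sp) == human_res else 0 for sp in species_order]


def profile_correlation(x, y):
    """Eq. 1: (n - number of positions where x and y differ) / n."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mismatches = sum(xi ^ yi for xi, yi in zip(x, y))  # XOR is 1 when entries differ
    return (n - mismatches) / n


# Two made-up eight-element profiles (human first, then seven reference species).
x = [1, 1, 1, 1, 0, 1, 0, 0]
y = [1, 1, 1, 0, 0, 1, 1, 0]
print(profile_correlation(x, y))  # 0.75
```

The two example profiles differ at two of eight positions, so Cor = (8 − 2) / 8 = 0.75. Identical profiles give 1, and profiles that differ at every position give 0.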
Results
High phylogenetic profile correlation of functionally associated sites within a protein
It is well known that many intracellular pathways involve protein phosphorylation/dephosphorylation reactions (Krebs and Krebs 1999). It was therefore important to investigate the co-evolution of phosphosites in the context of signal pathways. Within a given pathway, multi-site phosphorylation of a protein can determine signal extension and duration (Cohen 2000b). For example, in the MAPK pathway, the protein MAPKAP-K2 can be phosphorylated at positions Thr222, Ser272 and Thr334. The pathway is activated only when two of the three sites are phosphorylated (Cohen 2000a); these sites are therefore defined as functionally associated. In the F1 dataset, 41 proteins were annotated with more than one phosphosite in a given pathway (Table S3). We noticed that, for most multi-phosphosite proteins, the phylogenetic profiles of functionally associated sites within a protein were correlated. To test whether this correlation was significant in
[Fig. 2 plot: bar chart; x-axis: Multiple phosphorylated protein; y-axis: Profile correlation (0.0 to 1.2); legend: Observed, Average]
Fig. 2 Phylogenetic profile correlation of sites within multi-phosphorylated proteins. Comparison of correlation values between functionally associated phosphosite pairs and all phosphosite pairs within a given protein. The red bar represents the correlation values of functionally associated site pairs within multiple phosphorylated proteins, and the blue bar represents the average correlation values of site pairs within the donor proteins. (p value