Hindawi Publishing Corporation BioMed Research International Volume 2016, Article ID 5313050, 7 pages http://dx.doi.org/10.1155/2016/5313050
Research Article Statistical Approaches for the Construction and Interpretation of Human Protein-Protein Interaction Network Yang Hu,1 Ying Zhang,2 Jun Ren,1 Yadong Wang,3 Zhenzhen Wang,4 and Jun Zhang2 1
School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China 3 School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China 4 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China 2
Correspondence should be addressed to Yadong Wang;
[email protected], Zhenzhen Wang;
[email protected], and Jun Zhang;
[email protected] Received 3 June 2016; Revised 23 July 2016; Accepted 1 August 2016 Academic Editor: Qin Ma Copyright Β© 2016 Yang Hu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The overall goal is to establish a reliable human protein-protein interaction network and develop computational tools to characterize a protein-protein interaction (PPI) network and the role of individual proteins in the context of the network topology and their expression status. A novel and unique feature of our approach is that we assigned confidence measure to each derived interacting pair and account for the confidence in our network analysis. We integrated experimental data to infer human PPI network. Our model treated the true interacting status (yes versus no) for any given pair of human proteins as a latent variable whose value was not observed. The experimental data were the manifestation of interacting status, which provided evidence as to the likelihood of the interaction. The confidence of interactions would depend on the strength and consistency of the evidence.
1. Introduction Individual proteins cannot perform their biological functions by themselves, and actually they need to perform their functions in the biological process through interacting with other proteins [1]. Usually the interaction between two proteins means either they perform a biological function corporately or there is physical direct contact between them [2]. Most of the important molecular processes in cell, such as DNA replication, need to be performed by a large number of protein complexes. And these complexes are made up by the interactions between proteins. The study of PPIs is also considered to be a central problem in proteomics for living cells. Due to the dynamic interaction between proteins, the impact of surrounding environment should also be taken into account. The study of human PPI network can help to enhance the understanding of the disease but also provide a theoretical foundation for finding new treatment. With the continuous progress and development of highthroughput experimental technology, more and more large quantities of interactions between human proteins had been
confirmed by a variety of experimental methods. And many kinds of biological interaction networks have been investigated [3β7]. However, current high-throughput experimental techniques also indicated the shortcomings of high error; not only might the different experimental methods induce different experimental results, but also even different research groups using the same experimental method could not guarantee the exact same result. Therefore, it was urgent to integrate the data from different biological experiments, and even different species, to construct a highly credible network of PPIs. So in this paper, a Bayesian hierarchical model of human PPI network was constructed with a variety of sources of protein interaction data. Meanwhile, a Monte Carlo expectation maximization algorithm was used to estimate the parameters of the model. Then the confidence of protein interaction relationship was calculated based on Bayesian model, and human PPI network with high-confidence level could be obtained. Thereafter, the role of intrinsic disordered proteins (IDPs) was investigated in the high-confidence PPI network. First of all, different functional modules were obtained through
2
BioMed Research International
Experimental data of human (Y2H, MPC, literature)
Interaction status of human protein pairs (yes versus no)
Interaction status of the corresponding homolog pairs in other organisms (yes versus no)
Experimental data of other organisms (Y2H, MPC, literature)
Genomic features (GO, co expression, . . . )
Figure 1: Overall scheme to construct the human protein-protein interaction network. The interaction status of a given pair of human proteins and their homolog in other organisms are unobserved (dashed box) and the experimental data and genomic features are observed evidence (solid boxes). Solid arrows represent model hierarchy and dashed arrows represent inference steps.
Table 1: Data sets or databases used to construct the human proteinprotein interaction network. Method Organism Y2H Human Y2H Human MPC Human Literature Human Y2H Yeast Y2H Yeast MPC Yeast MPC Yeast MPC Yeast MPC Yeast Literature Multiple Literature Multiple Multiple Multiple
Reference Stelzl et al. [8] Rual et al. [9] Ewing et al. [10] HPRD [11], http://www.hprd.org/ Ito et al. [12] Uetz et al. [13] Gavin et al. [14] Ho et al. [15] Gavin et al. [16] Krogan et al. [17] IntAct [18], http://www.ebi.ac.uk/intact/ MIPS [19], http://mips.gsf.de/proj/ppi/ DIP [20], http://dip.doe-mbi.ucla.edu
clustering of high-confidence PPI network based on the network topology structure. Then we found the functional modules which were significantly correlated with intrinsically disordered proteins and analysed the effect of IDPs in these functional modules, while searching for the associations between these functional modules and diseases.
2. Materials and Methods 2.1. Data Collection. In Table 1, we show the experimental data that will be used for the construction of the human PPI network [8β20]. Note that the literature or text mining approach represents most of the low-throughput experimental studies of individual protein-protein interaction. It is possible that the result from the same experiment will be recorded in multiple databases. We will eliminate this type of redundancy. It should be emphasized that the MPC experiments provide result in the format of protein complexes instead of pair-wise protein-protein interactions. Since proteins located in the same complex might not interact with one another directly, we will account for this factor in our model.
2.2. Statistical Modeling of Various Data Sources. The overall scheme of our approach is illustrated in Figure 1. We consider an empirical Bayes approach to integrate various sources of evidence. Let πππ be the binary indicator such that πππ = 1 means that human proteins π and π have a direct physical interaction and it is 0 otherwise. Hence, πππ is the true interacting status that is not observed. To infer πππ , we consider individual model for each type of observed data and integrate the evidence to compute the probability of πππ = 1. 2.2.1. Human Y2H Data. It has been found that there are a number of mechanisms that can lead to the expression of the reporter gene in a Y2H experiment, which means that an observed interaction might not necessarily mean a true interaction. In our model, we consider the following mechanisms: (a) true interaction; (b) self-activation; and (c) unknown process. Let πππ be the binary indicator such that πππ = 1 if proteins π and π are observed to interact in a Y2H experiment and it is 0 otherwise. Then πππ = 1 only if at least one of the three above mechanisms is functional. Let ππ = 1 if protein π is a self-activation protein and let it be 0 otherwise. We define πΌπΌ = Pr [π is functional | πππ = 1] ,
(1)
πΌπ = Pr [π is functional | ππ + ππ > 0] ,
(2)
πΌπ = Pr [π is functional] .
(3)
Then we have Pr [πππ = 1 | π, π] πππ
= 1 β (1 β πΌπΌ )
ππ +ππ
(1 β πΌπ )
(4) (1 β πΌπ) .
2.2.2. Human MPC Data. MPC experiment reveals protein complexes instead of individual pairwise PPI. We say protein B is an π-step neighbour of protein A if the shortest path between A and B in the PPI network is of length π. We conjecture that the bait will mostly fish out its 1-step neighbours, and 2-step neighbours and distant proteins (at least three
BioMed Research International
3 2.2.3. Literature Data on Human PPI. Let πΏ ππ be the interaction status of proteins π and π reported. We will account for the false positive rate (πΎ0,π ) and false negative rate (πΎ0,π ):
1.0
Normalized score
0.8
π
1βπππ
Pr [π»ππ = 1 | πππ ] = πΎ1 ππ πΎ0 0.6
.
(7)
2.2.4. Data from Other Organisms. We will also collect (πβ , πΆβ ) from other organisms with corresponding unobserved variables denoted by (πβ , πβ ). Similar models can be used to model (π, πΆ, πΏ) for inference of (πβ , πβ ). To connect (πβ , πβ ) to (π, π), we consider the following models:
0.4
0.2
Pr [ππβσΈ πσΈ = 1 | πππ ] πππ
0.0
= [Ξ 1 (π½ππσΈ ,ππσΈ ; π1 )] 0.0
0.2
0.4
0.6
0.8
1.0
π
Figure 2: The optimization of ππ and ππ for different π. Red line and green line correspond to ππ and ππ separately.
step-away) are occasionally observed. Hence, we define the following parameters for the bait proteins: Pr [1-step neighbour is observed] = π1 ,
(5)
Pr [2-step neighbour is observed] = π2 .
Let πΆπ be the set of proteins in a complex corresponding to bait protein k. Denote by ππ(1) , ππ(2) the set of 1-step and 2step neighbours of the bait protein π under a given value of π. Then the probability of observing πΆπ can be written as follows: Pr [πΆπ | π] (1)
= π1 |ππ
β©πΆπ |
|ππ(1) \πΆπ |
(1 β π1 )
(2)
π2 |ππ
β©πΆπ |
(1 β π2 )
|ππ(2) \πΆπ |
(6) ,
where | β
| is the function that maps a set to its size.
1βπππ
[Ξ 0 (π½ππσΈ ,ππσΈ ; π0 )] π
,
(8) 1βπ
Pr [ππβσΈ = 1 | ππ ] = [Ξ©1 (πΌππσΈ ; π 1 )] π [Ξ©0 (πΌππσΈ ; π 0 )] π , where π½ππσΈ ,ππσΈ is the joint sequence identity between π and πσΈ and between π and πσΈ and πΌππσΈ is sequence identity between π and πσΈ ; Ξ 1 , Ξ 0 , Ξ©1 , and Ξ©0 are functions of the joint or individual sequence identities with parameters π1 , π0 , π 1 , and π 0 , which can be modeled by parametric structure. 2.3. Construction of Hierarchical Bayesian Model. So far we have introduced the distribution models for the experimental data and genomic features that are conditional on the values of Z and X. To finish the model, we also need to specify the distributions of Z and X, which can be modeled with independent Bernoulli distributions: Pr (πππ = 1) = π, Pr (ππ = 1) = π.
With the observed data and the unobserved variables, we can infer the posterior probability of π using the EM algorithm. Note that there are multiple organisms and multiple data sets for some of the organisms. Different parameters will be used to account for difference in the data. As illustrated in (10), the complete log likelihood function of our model can be expanded below, and the factor of (10) can be substituted by (3)βΌ(9):
πΏ πΆ (Ξ¨) = π (π», π, π, π, π, πΏ, πβ , πβ , πβ , πβ | π) = π (π» | π, π) π (π | π, π, π) π (π | π, π) π (πΏ | π, π) β
π (πβ | πβ , πβ , π) π (πβ | πβ , π) π (πβ , πβ | π, π, π) π (π, π | π) = β π (π»ππ | πππ , π) (π,π)βππ»
πππ
πππ
πππ
π‘=1
π‘=1
π‘=1
π(π‘)
β
β [βπ (πππ(π‘) | πππ , ππ , ππ , π)] β [βπ (ππππ(π‘) | πππ , π) βπ (πππ (π,π)βππ
(π,π)βππ
πππβ
β
β (π,π)βππβ
[βπ (π(π‘)β ππ π‘=1 [
|
πππβ , ππβ , ππβ , π)] ]
πππβ
β [βπ (ππππ(π‘)β β π‘=1 (π,π)βππ [
|
πβππβ
(π,π)βπ
πβππ
| πππ , π)]
β πππ π(π‘)β β πππ , π) βπ (πππ π‘=1
β
β π (πππβ | πππ , π) β π (ππβ | ππ , π) β π (πππ | π) β π (ππ | π) (π,π)βπβ
(9)
| πππ , π)] β π (πΏ ππ | πππ , π) ] (π,π)βππΏ
4
BioMed Research International πππ
= β [1 β (1 β πΌπΌ ) (π,π)βππ
π
1βπππ π»ππ
β
β (πΎ1 ππ πΎ0
)
(π,π)βππ»
(π,π)βππβ
πβ πππ+β
β
β [π1 ππ β (π,π)βππ
π
1βπππ πΏ ππ
π
πππβ
)
(π,π)βππΏ
1βπππ
(π,π)βπβ
)
1βπππ 1βπ»ππ
)
πππ+
[(1 β πΌπΌ ) πππ πππ+
πππ+β
(1 β πΌπ)] β
πππ
(1 β πΌπ )
(1 β π1 ) πππβ
[(1 β πΌπΌ )
(1βπππβ )πππ#β
+β
Γ π2 (1βπππ )πππ (1 β π2 )
π
1βπππ 1βπΏ ππ
π
1βπππβ
(1 β π½1 ππ π½0
1βπππ
(1 β π1 ππ π0
)
(1 β πΌπ)]
1βπππ
β ππππ (1 β π)
πππ#
(1βπππ )πππ#
+
Γ π2 (1βπππ )πππ (1 β π2 ) ππβ +ππβ
(1 β πΌπ )
π
]
πππ#β
(1 β πΌπ)]
1βπππ π»ππ
] β (πΎ1 ππ πΎ0
)
(π,π)βππ»
π
1βπππ 1βπ»ππ
(1 β πΎ1 ππ πΎ0
)
β πππ (1 β π)1βππ πβππ
(π,π)βπ
)
ππ +ππ
πππ πππ#
β [π1
(π,π)βππ
ππβ +ππβ
πππβ πππ#β
π
(1 β πΌπ)]
(1 β πΌπ )
(1 β π1 )
β
β (π½1 ππ π½0
β
β (π1 ππ π0
(1 β πΌπ )
(1 β πΎ1 ππ πΎ0
πππβ
β
β [1 β (1 β πΌπΌ )
ππ +ππ
π
β 1βππ ππ
β (π 1 π π 0
πβππβ
)
π
β 1βππ 1βππ
(1 β π 1 π π 0
)
, (10)
where the parameter vector π = {π, π, πΌπΌ , πΌπ , πΌπ, π1 , π2 , πΎ1 , πΎ0 , π½1 , π½0 , π1 , π0 , π 1 , π 0 }. 2.4. Monte Carlo Expectation Maximization for Parameter Estimation. In the model, it was not possible to estimate the true value of potential variables and model parameters directly. In order to effectively estimate the potential variables and model parameters, this paper used the Monte Carlo expectation maximization algorithm based on incomplete parameter estimation, as illustrated in Algorithm 1. In the πΈ-step of Algorithm 1, we use Gibbs sampling to sample (π, π, πβ , πβ ) from π(π, π, πβ , πβ | π», π, π, πΏ, πβ , πβ , πΜ0 ) in turn. Repeat the sampling process until the estimations of missing data are obtained. Then in the π-step of Algorithm 1, the parameter vector π = {πΎ1 , πΎ0 , πΌπΌ , πΌπ , πΌπ, π½1 , π½0 , π1 , π0 , π 1 , π 0 } is estimated by Greedy Hill Climbing. Finally the iteration is stopped when diff > 0.01.
them as reliable confidence interaction if Pr[πππ = 1 | π», π, π, πΏ, πβ , πβ ] > 0.8. Then we got 48361 PPIs with reliable confidence measure among 23286 proteins. 3.2. Characterization of Network and Roles of IDPs Based on Network Topology. We analysed the role of IDPs in the human PPI networks with reliable confidence measure. A IDP was defined as a protein with continuous intrinsically disorder region whose length was larger than 40 amino acids. And 8735 IDPs were identified from 23286 proteins after predictions. Firstly, the human PPI network was cut into subnetworks or modules by SCAN. SCAN obtained modules based on the similarity between common neighbors. Then we used modularity and similarity-based modularity as metrics. Modularity is a statistical measure of the quality of network clustering, which is defined as follows: ππΆ
ππ = β [
3. Results All the protein names were mapped to the Entrez IDs. Finally we got 32540 proteins, and there were 144603 interactions between these proteins. 3.1. Construction of the Human PPI Network with Reliable Confidence Measure. Four models were established separately using high-throughput Y2H experimental data, highthroughput MPC experimental data, human PPI data, and all the PPI data. The comparisons among these four models were listed in Table 2. After the estimation of parameter vector π by Monte Carlo EM, we recalculated the posterior probability of π, which is Pr[π | π», π, π, πΏ, πβ , πβ ], with π and the observed values π», π, π, πΏ, πβ , πβ . And for each pair of PPI, we considered
π =1
ππ π 2 β ( π ) ], πΏ 2πΏ
(11)
where ππΆ is the number of clusterings, L is the number of edges, ππ is the number of edges for π th module, and ππ is the degree of all the nodes in π th module. We could obtain the best clustering by optimizing ππ. And similarity-based modularity is the supplementary for the modularity, which is defined as follows: ππΆ
ππ = β [ π =1
πΌππ π·π 2 β ( π ) ]. ππ 2ππ
(12)
As shown in Figure 2, on one hand, the modularity monotonically decreased from the position nearby zero, and it could not be maximized. On the other hand, the similarity-based modularity could be maximized while the threshold π equals 0.61. Conditional on the π = 0.61, the reliable human PPI
BioMed Research International
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11)
(12) (13)
5
π = 0, initialize the parameters while (diff > 0.01) { π = 0 // πΈ-Step while (π β€ π) { Sample π(π+1) from π(π(π) , π, πβ(π) , πβ(π) | π», π, π, πΏ, πβ , πβ , πΜπ ) Sample π(π+1) from π(π, π(π+1) , πβ(π) , πβ(π) | π», π, π, πΏ, πβ , πβ , πΜπ ) Sample π(π+1) from π(π(π+1) , π(π+1) , πβ(π) , πβ | π», π, π, πΏ, πβ , πβ , πΜπ ) Sample π(π+1) from π(π(π+1) , π(π+1) , πβ , πβ(π+1) | π», π, π, πΏ, πβ , πβ , πΜπ ) π=π+1 } calculate π function π Μ (π | π(π) , π, π, πΏ, πβ , πβ ) = 1 β log πΏπ (π | π, π, πΏ, πβ , πβ , π(π) , π(π) , π(π)β , π(π)β ) π π π=1 // π-Step (π) 1 π β(π,π)βπ πππ Μ (π+1) = β ( ) π π π=1 |π| (π) 1 π βπβπ ππ Μπ(π+1) = β ( σ΅© π σ΅© ) π π=1 σ΅©σ΅©σ΅©ππ σ΅©σ΅©σ΅© β β πππ πππ πππ πππ ππ ππβ πππππ + βπ=1 πππ ) + β(π,π)βπβ πππβ(π) (βπ=1 πππππβ + βπ=1 πππ ) β(π,π)βππ πππ(π) (βπ=1 1 π π (π+1) Μ π1 = β ( ) π π=1 β(π,π)βππ πππ(π) (πππ + πππ ) + β(π,π)βπβ πππβ(π) (πππβ + πππβ ) π
Μ (π+1) π 2 (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24) (25) (26) (27) (28)
}
π
1 = β( π π=1
π
π
πβ
ππ
πβ
ππβ
ππ ππ ππ ππ πππππ + βπ=1 πππ ) + β(π,π)βπβ (1 β πππβ(π) ) (βπ=1 πππππβ + βπ=1 πππ ) β(π,π)βππ (1 β πππ(π) ) (βπ=1 π
β(π,π)βππ (1 β πππ(π) ) (πππ + πππ ) + β(π,π)βπβ (1 β πππβ(π) ) (πππβ + πππβ )
)
π
π = 0; change0 = 0.01 (π) ππ = π(π) = {πΌπΌ(π) , πΌπ(π) , πΌπ(π) , πΎ1(π) , πΎ0(π) , π½1(π) , π½0(π) , π1(π) , π0(π) , π(π) 1 , π0 } while (1) { πΌπΌπ+1 = arg max π (πΌπΌ , πΌππ , πΌππ ) πΌππ+1 = arg max π (πΌπΌπ+1 , πΌπ , πΌππ ) πΌππ+1 = arg max π (πΌπΌπ+1 , πΌππ+1 , πΌπ ) Μ (ππ+1 ) β π Μ (ππ ) changeπ+1 = π if (abs(changeπ+1 ) < abs(changeπ /20)) break π=π+1 } σ΅¨σ΅¨ Μ (π+1) Μ (π(π) )σ΅¨σ΅¨σ΅¨σ΅¨ σ΅¨σ΅¨π (π ) β π σ΅¨ diff = σ΅¨ Μ (π(π) ) π π=π+1 π = π β 1.1
Algorithm 1: Monte Carlo expectation maximization for parameter estimation.
network was cut into 241 modules. Under the significant level πΌ = 0.05, the π value of each module was calculated by the formula below: π
( ππ ) ( πβπ πβπ ) , π) ( π π=π
π-value = β
(13)
where π is the number of all the proteins and π is the number of all the IDPs. 33 modules among 241 modules were significantly associated with IDPs.
However, due to the fact that acquisition of functional modules is only dependent on the network topology, we analysed the modules with known diseases. And the overlap of PPI in hela cell and a functional module which was highly related with IDPs was shown in Figure 3. The weight of each side is the posterior probability of the real value Z. If a node with more than 5 neighbours was defined as a hub node in this subnetwork, a total of 69% of the hub nodes were IDPs. It is verified that IDPs were easy to become hub nodes of the protein interaction network due to the flexibility
6
BioMed Research International MAP3K11
0.988281943951733
LYK5 CAB39
CDC42
0.992211195221171 0.928797709895298
0.916706934804097 STK11 PRKAA2
0.976683685462922
TAOK2
0.93252996976953 MAP3K4
0.961012795846909 0.931957258866169 RPS6KB1 0.942964086215943 RICTOR
ETS1
TNF
NOS3
0.94532719089184 MAP2K3 0.953462355793454 0.975196305336431
MAX NFATC4
0.981849851994775
MKNK2 0.984706089040264 0.973410787642933 0.926640554866754 PPM1A
0.981691981828771 MAPK13
0.940581873967312
MAP2K6
0.945371649204753 0.977618924784474 0.915498713194393
0.921881220955402
GADD45A 0.975216674013063
MAPK12 EIF4EBP10.956537746964022
AKT1
0.939944128529169 NFKB1
TNFRSF1A 0.961103916191496 IKBKB
0.918918188964017
MAPK14
0.929881895054132 0.996390584134497 0.956659654597752
0.950761633040383 0.920713611249812 0.923483328218572 MAPKAPK3 0.993070949031971 0.946814049035311 0.985305433603935 MAPKAPK2 0.992017667135224 0.983116342336871 APPL1 0.905622362112626
RELB 0.919653447926976
DUSP1 SRC
0.970759057998657
RELA
0.958500777697191
RPS6KA5
0.996512538474053 0.958548365836032 MAP3K7IP1 0.921444909600541 MAPK11 0.961261739302427 0.976575488573872 0.945294330199249 NR4A1 HSPA1L 0.996898632822558 0.934782098606229 0.914294418483041 0.965479361545295 0.968648963863961 ATF2 0.974183189147152 RPS6KA2 PTPN7 ACVR1B 0.967031562980264 0.994482649955899 0.9805953731527550.994280808861367 HSPB1 PTK2 0.932734908000566 PIK3R1 0.965239932364784 0.93360921673011 0.930646225716919 MAP3K7 0.95543672691565 0.995021546422504 0.907413092372008 PPM1B 0.975433830474503 MAP3K1 0.942683638213202 MAPK3 0.951040239352733 DUSP9 0.920654378598556 0.956093352544121 0.918770066648722 BIRC4 0.900253127841279 PIAS1 ARRB2 0.905868032528088 0.950504301209003 0.989411127241328 0.925117675447837 0.909018674050458 0.901917463727295 0.954404675262049 MAP2K7 0.924848372861743 0.924461668543518 0.956331543298438 FLNA ELK1 TP53 PXN 0.921318202326074 0.931317168916576 0.978415978094563 0.941751045174897 0.986539391428232 0.913279780140147 TRAF6 0.911073893494904 0.94671230611857 CHUK NLK 0.983137573464774 FGFR2 MAPK8 CYCS 0.986690857866779 0.914776860992424 0.975804627151229 RET 0.975841080630198 CRK 0.956825831206515 MAPK1 0.9997719044331460.913919422426261 MAP3K3 0.9952761003281920.985154237272218 0.971650722622871 PLCG1 0.907407541829161 0.91643746108748 0.93231390232686 RPS6KA3 0.925122427009046 0.988122801296413 FOXO3 CDKN1B 0.945910584321246
CSF1R
MKNK1 0.923692180938087 MDM2
MAP3K14
0.999690869147889 0.934390111477114
0.968578722397797 0.947340827737935
0.98290718451608 0.939194727083668 0.982442429615185
0.950850107567385
0.943548620864749
KIT 0.902885242179036 STAT1 HSPA1B
DAXX
PML PTPRR
PRKACA
TRAF2
0.971134889591485
DFFA
PAK2
GSTP1
LAMA4 0.981363289873116 0.967327374499291 0.957270197686739
MAP2K5 PDPK1
STK30.942446070653386 CASP3
MAPKAPK5
0.915790811181068
0.951265787566081 MAP3K5
EGFR TGFB1
0.926799199474044 PECAM1
RXRA
0.995621985103935 0.933443687297404
SNCA
0.995423619728535
0.963376750890166
0.912790355179459
TRAF1 0.934508309792727
0.931238074460998 0.961266692611389 APP
STAT3 0.997650526557118HDAC2 0.913925072038546
FAS
0.955746185895987 CASP6
0.918777518323623
ACTN4
EGF 0.910632426384836
0.907892302935943
0.965811743983068
MMP9 MAPK9
CTNNB1
0.947107569966465
PDGFRB
0.946906847506762 0.938711688760668 PARD3 0.908502965210937 0.944146110233851
RAP1A
DUSP4 CDK6
0.978788903797977 FYN
0.940609277109616
AR
RASGRP4
0.998856807569973 0.934440714376979
PRKCD
FASLG
0.981066464190371
FN1
Figure 3: A reliable subnetwork for hela cell. Circles correspond to IDPs. And the degree of grey corresponds to the length of intrinsically disordered region for IDP.
Table 2: Comparison of parameters based on different data. HighParameters throughput Y2H π 6.8 Γ 10β3 π 7.7 Γ 10β5 πΌπΌ 0.658 πΌπ 0.426 πΌπ 4.5 Γ 10β3 π1 β π2 β
Highthroughput MPC
Human PPI data
All PPI data
1.9 Γ 10β3 β β β β 0.738 0.623
6.1 Γ 10β3 5.3 Γ 10β5 0.543 0.496 9.7 Γ 10β4 0.755 0.764
1.4 Γ 10β2 8.9 Γ 10β5 0.933 0.852 0.007 0.809 0.788
of the structure, revealing an important role of IDPs in the regulation of cervical cancer hela cell.
4. Discussion Our model is unique and novel in the following perspectives. First, it integrates Y2H and MPC data in a cohesive and unified model that connect the two types of data through the unobserved true status of direct physical interaction π. Second, the model allows a natural calculation of the confidence of each interacting pair via the posterior probability. This is a critical measurement in downstream analysis and will be accounted for. To our knowledge, no previous study has considered uncertainty in the PPI network analysis. The inference of the interacting probability involves a large number of latent variables. The combinatorial effects make it impractical to compute the expectation of the missing variables analytically during the πΈ-step. It is likely that various data sets carry different amount of information regarding the true interaction status. Hence, the inference can be
BioMed Research International made by appropriately weighing data of various types instead of treating them equally. This can be achieved by setting parameter constrain.
Competing Interests The authors confirm that there is no conflict of interests related to the content of this article.
Authorsβ Contributions Yang Hu and Ying Zhang contributed equally to this work.
References [1] C. Wu, F. Zhang, X. Li et al., βComposite functional module inference: detecting cooperation between transcriptional regulation and protein interaction by mantel test,β BMC Systems Biology, vol. 4, article 82, 2010. [2] Y.-A. Huang, Z.-H. You, X. Chen, K. Chan, and X. Luo, βSequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding,β BMC Bioinformatics, vol. 17, article 184, 2016. [3] X. Chen, βPredicting lncRNA-disease associations and constructing lncRNA functional similarity network based on the information of miRNA,β Scientific Reports, vol. 5, Article ID 13186, 2015. [4] X. Zeng, X. Zhang, and Q. Zou, βIntegrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks,β Briefings in Bioinformatics, vol. 17, no. 2, pp. 193β203, 2016. [5] Q. Zou, J. Li, Q. Hong et al., βPrediction of microRNA-disease associations based on social network analysis methods,β BioMed Research International, vol. 2015, Article ID 810514, 9 pages, 2015. [6] Q. Zou, J. Li, L. Song, X. Zeng, and G. Wang, βSimilarity computation strategies in the microRNA-disease network: a survey,β Briefings in Functional Genomics, vol. 15, no. 1, pp. 55β 64, 2016. [7] F. Zhang, B. Gao, L. Xu et al., βAllele-specific behavior of molecular networks: understanding small-molecule drug response in yeast,β PLoS ONE, vol. 8, no. 1, Article ID e53581, 2013. [8] U. Stelzl, U. Worm, M. Lalowski et al., βA human protein-protein interaction network: a resource for annotating the proteome,β Cell, vol. 122, no. 6, pp. 957β968, 2005. [9] J.-F. Rual, K. Venkatesan, T. Hao et al., βTowards a proteomescale map of the human protein-protein interaction network,β Nature, vol. 437, no. 7062, pp. 1173β1178, 2005. [10] R. M. Ewing, P. Chu, F. Elisma et al., βLarge-scale mapping of human protein-protein interactions by mass spectrometry,β Molecular Systems Biology, vol. 3, article 89, 2007. [11] T. S. Keshava Prasad, R. Goel, K. Kandasamy et al., βHuman Protein Reference Databaseβ2009 update,β Nucleic Acids Research, vol. 37, supplement 1, pp. D767βD772, 2009. [12] T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki, βA comprehensive two-hybrid analysis to explore the yeast protein interactome,β Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 8, pp. 4569β4574, 2001.
7 [13] P. Uetz, L. Glot, G. Cagney et al., βA comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae,β Nature, vol. 403, no. 6770, pp. 623β627, 2000. [14] A.-C. Gavin, P. Aloy, P. Grandi et al., βProteome survey reveals modularity of the yeast cell machinery,β Nature, vol. 440, no. 7084, pp. 631β636, 2006. [15] Y. Ho, A. Gruhler, A. Heilbut et al., βSystematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry,β Nature, vol. 415, no. 6868, pp. 180β183, 2002. [16] A.-C. Gavin, M. BΒ¨osche, R. Krause et al., βFunctional organization of the yeast proteome by systematic analysis of protein complexes,β Nature, vol. 415, no. 6868, pp. 141β147, 2002. [17] N. J. Krogan, G. Cagney, H. Yu et al., βGlobal landscape of protein complexes in the yeast Saccharomyces cerevisiae,β Nature, vol. 440, no. 7084, pp. 637β643, 2006. [18] S. Kerrien, Y. Alam-Faruque, B. Aranda et al., βIntActβopen source resource for molecular interaction data,β Nucleic Acids Research, vol. 35, no. 1, pp. D561βD565, 2007. [19] H. W. Mewes, D. Frishman, U. GΒ¨uldener et al., βMIPS: a database for genomes and protein sequences,β Nucleic Acids Research, vol. 30, no. 1, pp. 31β34, 2002. [20] I. Xenarios, Ε. SalwΒ΄Δ±nski, X. J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, βDIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions,β Nucleic Acids Research, vol. 30, no. 1, pp. 303β305, 2002.