Interdiscip Sci Comput Life Sci (2013) 5: 167–174 DOI: 10.1007/s12539-013-0171-z
Detecting Protein Complexes Based on Relevancy from Protein Interaction Networks
Hua-Xiong YAO1∗, Yan YANG1, Xiao-Long LI2
1 (Department of Computer Science, Central China Normal University, Wuhan 430079, China)
2 (Department of Electronics and Computer Engineering Technology, Indiana State University, Terre Haute, IN 47809, USA)
Received 7 January 2013 / Revised 30 March 2013 / Accepted 12 June 2013
Abstract: In protein-protein interaction networks, proteins combine into macromolecular complexes to execute essential functions in the cell, such as replication, transcription, and protein transport. To detect protein complexes from protein interaction networks, we used a relevant graph and an irrelevant graph to represent how a node is connected to a core graph. We defined a variable, Relevancy, to indicate whether a node has a dense or a loose connection to a core graph. We then proposed the Relevancy Judgment algorithm to detect protein complexes from protein interaction networks. The algorithm decides whether a node belongs to a protein complex by judging the relevancy between the core graph and the nodes outside it. Experimental results show that our algorithm performs well in both accuracy and hit rate.
Key words: protein complexes, protein interaction networks, relevancy judgement, PPI.
1 Introduction
High-throughput technologies have made abundant protein-protein interaction (PPI) data available. The protein-protein interactions of a given organism are usually modeled as a network, called a protein-protein interaction network, that highlights the mutual interactions between pairs of proteins (Morcos et al., 2010). Proteins combine into macromolecular complexes to execute essential functions in the cell, such as replication, transcription, and protein transport (Li et al., 2007). A protein complex is a group of two or more proteins associated by stable protein-protein interactions. Protein complexes play important roles in the formation of complicated biological functions. Detecting protein complexes in a protein interaction network helps to find the building modules of the network and to predict the function of uncharacterized proteins. Analyzing the massive data effectively enough to detect biologically meaningful protein complexes remains a great challenge.
There are many studies on detecting protein complexes from protein interaction networks. The Roth laboratory proposed the Complexpander approach, which uses a confidence-weighted graph of protein interactions to predict new members of protein complexes (Asthana et al., 2004). Huang et al. (2007) then improved the methodology. Wu and Hu (2007) presented the ModuleBuilder algorithm to discover a protein sub-network; it decides whether a node belongs to the sub-network by comparing the node's in-module degree and out-module degree. Pei and Zhang (2007) introduced a seed-refine algorithm that uses a subgraph quality measure, a two-layer heuristic to find seeds, and a subgraph refinement method. Feng et al. (2011) applied a max-flow-based approach to identify protein complexes in protein interaction networks.
In this paper, we first introduce the motivation of our algorithm in Section 2. We then use variables and formulas to state the problem in Section 3. In Section 4, we propose the Relevancy Judgment algorithm to detect protein complexes from PPI networks. Experimental results and performance analysis follow in Section 5. Finally, we draw our conclusions in Section 6.
∗ Corresponding author. E-mail: [email protected]
2 Motivation
Proteins are likely to form closely coupled protein complexes as functional units to participate in biological processes (Pei and Zhang, 2007). Protein complexes can therefore be roughly considered as dense subgraphs of the protein interaction network, i.e. coherent sets of proteins that are densely connected among themselves but loosely connected with other proteins. We can consider a protein interaction network as an undirected graph G(V, E), where V represents the set of proteins and E represents the set of interactions. A protein complex is then regarded as a subgraph Gc(Vc, Ec), which is a dense subgraph of G(V, E). The problem of detecting protein complexes from protein interaction networks is thus converted into the problem of finding a dense subgraph within the whole graph.
Fig. 1 shows the procedure of finding a dense graph from a graph G based on a seed node. First, among the neighbor nodes of the seed, the core graph Go(Vo, Eo) is found, whose nodes have the most connections among themselves. Then, whether other nodes are members of the complex graph Gc is decided based on their connections to the core graph.
Fig. 1 Procedure of finding a dense graph based on seed.
Hu and Wu (2007) proposed the ModuleBuilder (MB) algorithm and Pei and Zhang (2007) presented the Seed-Refine (SR) algorithm; both use graph theory to detect protein complexes. SR decides whether a node belongs to a complex graph by adding the node to, or deleting it from, the core graph and checking whether the change improves the density of the graph. MB makes the decision by comparing the in-module degree (Kin) with the out-module degree (Kout). As shown in Fig. 2, among all edges connected to node N1, solid lines represent edges between N1 and nodes in the core graph Go, and dashed lines represent edges between N1 and nodes outside Go. Kin is the number of solid lines, and Kout is the number of dashed lines. N1 belongs to the complex graph if Kin is bigger than Kout.
Fig. 2 Kin and Kout in ModuleBuilder.
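The Kin versus Kout comparison used by MB can be sketched as follows. This is only an illustration of the rule as stated above, not the ModuleBuilder implementation: the adjacency dictionary `adj`, the function name `mb_accepts`, and the toy graph are assumptions made for the example.

```python
def mb_accepts(adj, core, node):
    """Return True if `node` passes the Kin > Kout test against `core`.

    adj  : dict mapping each node to the set of its neighbors (hypothetical layout)
    core : set of nodes currently in the core graph Go
    """
    neighbors = adj[node]
    k_in = len(neighbors & core)    # edges from node into the core graph
    k_out = len(neighbors - core)   # edges from node to nodes outside the core
    return k_in > k_out


# Toy network (hypothetical): S, A, B form the core; N1 touches A, B and X.
toy_adj = {
    "S": {"A", "B"},
    "A": {"S", "B", "N1"},
    "B": {"S", "A", "N1"},
    "N1": {"A", "B", "X"},
    "X": {"N1"},
}
toy_core = {"S", "A", "B"}
n1_accepted = mb_accepts(toy_adj, toy_core, "N1")   # Kin = 2, Kout = 1
x_accepted = mb_accepts(toy_adj, toy_core, "X")     # Kin = 0, Kout = 1
```

In this toy case N1 is accepted because two of its three edges lead into the core, while X is rejected.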
Since protein complexes are densely connected internally and loosely connected externally, as mentioned in the first paragraph of this section, we should decide whether a node in graph G has a dense or a loose connection to the core graph Go. A detailed analysis of protein interaction network data shows that some nodes in PPI networks have many connections while others have very few, sometimes only one (we call these rare connections). SR considers only connections into the core graph and ignores connections out of the core graph, which may be a vital part of the picture. MB directly compares Kin with Kout, even though Kout may contain rare connections, which play an insignificant role in the network and yet lower the performance of the detection.
3 Problem statement
To detect protein complexes accurately from protein interaction networks, we propose two concepts in this section, the Relevant Graph and the Irrelevant Graph, and use the variable Relevancy to decide whether a node has a dense or a loose connection to the core graph.
In a protein interaction network G(V, E), let Go(Vo, Eo) be the core graph based on the assigned seed node S, where Vo is the set of core nodes chosen from the adjacent nodes of S, and Eo is the set of edges among these core nodes. For a node N1 that is not in Go, the adjacent graph GADJ(VADJ, EADJ) is the graph consisting of N1 and all its neighbor nodes, as shown in formula (1).
$$V_{ADJ} = \{u \mid (u, N_1) \in E\} \cup \{N_1\}, \qquad E_{ADJ} = \{(u, v) \in E \mid u, v \in V_{ADJ}\} \tag{1}$$
The Relevant Graph GR(VR, ER) is the graph consisting of N1 and its neighbors that belong to the core graph Go, as shown in formula (2).
$$V_R = (V_{ADJ} \cap V_o) \cup \{N_1\}, \qquad E_R = \{(u, v) \in E \mid u, v \in V_R\} \tag{2}$$
The Irrelevant Graph GIR(VIR, EIR) is the graph consisting of N1 and its neighbors that do not belong to the core graph Go, as shown in formula (3).
$$V_{IR} = V_{ADJ} - (V_{ADJ} \cap V_o), \qquad E_{IR} = \{(u, v) \in E \mid u, v \in V_{IR}\} \tag{3}$$
To illustrate these graphs, we draw them in Fig. 3. In the figure, the left horizontal oval is the core graph Go(Vo, Eo), the right horizontal oval is the adjacent graph GADJ(VADJ, EADJ), the left vertical oval is the Relevant Graph GR(VR, ER), and the right vertical oval is the Irrelevant Graph GIR(VIR, EIR). We then define a variable Relevancy to represent the relation of node N1 to the core graph Go, as shown in formula (4):
$$\mathrm{Relevancy} = |E_R| - |E_{IR}| \tag{4}$$
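Formulas (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes the network is stored as a dictionary mapping each node to the set of its neighbors, and that each edge set contains only the edges of G whose two endpoints lie in the corresponding node set. The names `adj`, `edges_within`, `relevancy`, and the toy graph are hypothetical.

```python
from itertools import combinations


def edges_within(adj, nodes):
    """Count the edges of the network whose two endpoints both lie in `nodes`."""
    return sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])


def relevancy(adj, core, node):
    """Relevancy = |E_R| - |E_IR| for a node outside the core graph."""
    v_adj = adj[node] | {node}        # formula (1): the node and its neighbors
    v_r = (v_adj & core) | {node}     # formula (2): neighbors in the core, plus the node
    v_ir = v_adj - (v_adj & core)     # formula (3): the node and its non-core neighbors
    return edges_within(adj, v_r) - edges_within(adj, v_ir)   # formula (4)


# Toy network (assumed for illustration): the core is the triangle S, A, B.
example_adj = {
    "S": {"A", "B"},
    "A": {"S", "B", "N1"},
    "B": {"S", "A", "N1"},
    "N1": {"A", "B", "X"},
    "X": {"N1"},
}
example_core = {"S", "A", "B"}
rel_n1 = relevancy(example_adj, example_core, "N1")   # E_R has 3 edges, E_IR has 1
rel_x = relevancy(example_adj, example_core, "X")     # E_R has 0 edges, E_IR has 1
```

For N1 the relevant graph {A, B, N1} holds three edges and the irrelevant graph {N1, X} holds one, so the relevancy is 2. For X, which has no neighbor in the core, the relevancy is -1.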
Fig. 3 Relevant Graph and Irrelevant Graph.
If Relevancy > 0, node N1 is densely connected to the core graph Go(Vo, Eo), and N1 can be included in the complex of seed S. If Relevancy ≤ 0, node N1 is loosely connected to Go(Vo, Eo), and N1 cannot be included in the complex of seed S.
4 Algorithm
Based on the judgment of relevancy, we propose the Relevancy Judgment (RJ) algorithm to detect protein complexes from PPI networks. The RJ algorithm consists of two parts: identifying the core graph for a given seed node, and obtaining the protein complex by judging the relevancy between the core graph and the nodes outside it.
We identify the core graph with the same method as Hu and Wu (2007). As the data flow in Fig. 4 shows, we first find all the adjacent nodes of the seed node and compose these nodes and the seed into the adjacent graph GADJ(VADJ, EADJ). We then compute the in-module degree (Kin) of every node in the adjacent graph except the seed, and obtain the minimal (Kmin) and maximal (Kmax) values of Kin. Afterwards, we delete the nodes with minimal Kin from GADJ, and repeat this step until Kmin equals Kmax. Finally, the remaining nodes compose the graph Go(Vo, Eo), which is the identified core graph.
After obtaining the core graph Go, we judge the relevancy of every node outside Go. If the relevancy of node Ni is positive, we add Ni to Go, refresh the core graph, and judge the relevancy again. If the relevancy of node Ni is not positive, we judge a different node, until no node outside Go has a positive relevancy, as the data flow in Fig. 5 shows.
Fig. 4 Data flow of identifying core graph.
Fig. 5 Data flow of Relevancy Judgment algorithm.
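The two parts of the RJ algorithm described in this section can be sketched as below. This is a hedged reading of the prose, not the authors' code. It assumes that the network is a dictionary of neighbor sets, that Kin is counted inside the current adjacent graph (seed included), that edge sets contain only edges of G, and that candidate nodes are visited in sorted order. The function names and the toy network are hypothetical.

```python
from itertools import combinations


def edges_within(adj, nodes):
    """Count the edges of the network whose two endpoints both lie in `nodes`."""
    return sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])


def relevancy(adj, core, node):
    """|E_R| - |E_IR| for `node` with respect to the current core graph."""
    v_adj = adj[node] | {node}
    v_r = (v_adj & core) | {node}
    v_ir = v_adj - core
    return edges_within(adj, v_r) - edges_within(adj, v_ir)


def identify_core(adj, seed):
    """Prune the seed's neighbors with minimal Kin until Kmin equals Kmax."""
    members = set(adj[seed])                      # adjacent nodes, seed excluded
    while members:
        graph = members | {seed}
        k_in = {n: len(adj[n] & graph) for n in members}
        k_min, k_max = min(k_in.values()), max(k_in.values())
        if k_min == k_max:
            break
        members = {n for n in members if k_in[n] > k_min}
    return members | {seed}


def relevancy_judgment(adj, seed):
    """Grow the core graph by adding outside nodes with positive relevancy."""
    core = identify_core(adj, seed)
    changed = True
    while changed:
        changed = False
        for node in sorted(set(adj) - core):
            if relevancy(adj, core, node) > 0:
                core.add(node)                    # refresh the core, then judge again
                changed = True
                break
    return core


# Toy network (assumed): S, A, B, C form a clique; D hangs off S; N touches A and B.
rj_adj = {
    "S": {"A", "B", "C", "D"},
    "A": {"S", "B", "C", "N"},
    "B": {"S", "A", "C", "N"},
    "C": {"S", "A", "B"},
    "D": {"S", "E"},
    "E": {"D"},
    "N": {"A", "B"},
}
rj_core = identify_core(rj_adj, "S")
rj_complex = relevancy_judgment(rj_adj, "S")
```

In this toy network D is pruned from the core because its Kin is minimal, N is later added because its relevancy is 3, and D and E stay outside because their relevancy is 0 and -1.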
5 Experiments
In this section, we apply the Relevancy Judgment (RJ) algorithm to the yeast interaction network. We downloaded the Saccharomyces cerevisiae data set from DIP (Xenarios et al., 2002), which has 5078 proteins and 22418 interactions. For convenience of comparison, we chose the same four seed proteins as the ModuleBuilder (MB) algorithm (Hu and Wu, 2007): TAF6, NOT3, RFC2, and ARP3. Using the four proteins as seeds, we find their corresponding sub-networks with the RJ algorithm. The sub-networks are visualized with Cytoscape (Smoot et al., 2010), an open source platform for complex network analysis and visualization. We search for the complexes related to each seed protein in the complex database CYC2008 (Pu et al., 2008), a comprehensive catalogue (like the MIPS database) of 408 manually curated heteromeric protein complexes in Saccharomyces cerevisiae, reliably backed by small-scale experiments from the literature. In the following, we list the experimental results for the four seed proteins and then compare the performance of our algorithm with that of the MB algorithm.
5.1 TAF6
TAF6 is a subunit of the TFIID and SAGA complexes, involved in transcription initiation of RNA polymerase II and in chromatin modification. Searching CYC2008 for TAF6-related complexes, we find 3 results: the SAGA complex, the TFIID complex, and the SLIK complex.
The SAGA complex is a Spt-Ada-Gcn5-acetyltransferase complex that includes 20 nodes: ADA2, CHD1, GCN5, HFI1, NGG1, SGF11, SGF29, SGF73, SPT20, SPT3, SPT7, SPT8, SUS1, TAF10, TAF12, TAF5, TAF6, TAF9, TRA1, and UBP8 (Brown et al., 2000).
The TFIID complex is composed of TATA binding protein (TBP) and TBP associated factors (TAFs). Most of the TAFs are conserved across species. In TATA-containing promoters for RNA polymerase II (Pol II), TFIID is believed to recognize at least two distinct elements, the TATA element and a downstream promoter element. TFIID is also involved in recognition of TATA-less Pol II promoters. Binding of TFIID to DNA is necessary but not sufficient for transcription initiation from most RNA polymerase II promoters. The complex includes 15 nodes: SPT15, TAF1, TAF10, TAF11, TAF12, TAF13, TAF14, TAF2, TAF3, TAF4, TAF5, TAF6, TAF7, TAF8, and TAF9 (Locker, 1996).
The SLIK complex is a SAGA-type histone acetyltransferase complex involved in the yeast retrograde response pathway, which is important for gene expression changes during mitochondrial dysfunction. It includes 16 nodes: ADA2, CHD1, GCN5, HFI1, NGG1, RTG2, SGF73, SPT20, SPT3, SPT7, TAF10, TAF12, TAF5, TAF6, TAF9, and TRA1 (Wu and Winston, 2002).
Using TAF6 as the seed, the TAF6 sub-network is detected by the Relevancy Judgment algorithm, as shown in Fig. 6; it has 28 proteins and 108 interactions. As shown in Table 1, 25 of the 28 proteins in the detected sub-network belong to TAF6-related complexes. Specifically, the sub-network has 16 of the 20 SAGA complex proteins, missing only CHD1, SGF11, SUS1 and TRA1; 14 of the 15 TFIID complex proteins, missing only TAF14; and 13 of the 16 SLIK complex proteins, missing only CHD1, RTG2 and TRA1.
Table 1 Proteins of TAF6 sub-network
UniProtKB  Gene    SAGA  TFIID  SLIK
Q02336     ADA2    √            √
P07248     ADR1
P03069     GCN4
Q03330     GCN5    √            √
Q12060     HFI1    √            √
P32494     NGG1    √            √
P25554     SGF29   √
P53165     SGF73   √            √
P13393     SPT15         √
P50875     SPT20   √            √
P06844     SPT3    √            √
P35177     SPT7    √            √
P38915     SPT8    √
P46677     TAF1          √
Q12030     TAF10   √     √      √
Q04226     TAF11         √
Q03761     TAF12   √     √      √
P11747     TAF13         √
P23255     TAF2          √
Q12297     TAF3          √
P50105     TAF4          √
P38129     TAF5    √     √      √
P53040     TAF6    √     √      √
Q05021     TAF7          √
Q03750     TAF8          √
Q05027     TAF9    √     √      √
P50102     UBP8    √
DIP-524N   VP16
5.2 NOT3
NOT3 is a subunit of the CCR4-NOT complex, which is a global transcriptional regulator with roles in transcription initiation and elongation and in mRNA degradation. Searching CYC2008 for NOT3-related complexes, we find only 1 result: the CCR4-NOT core complex. In Saccharomyces cerevisiae, the CCR4-NOT core complex comprises 9 nodes: Ccr4p, Caf1p, Caf40p, Caf130p, Not1p, Not2p, Not3p, Not4p, and Not5p (Liu et al., 2001).
Using NOT3 as the seed, the NOT3 sub-network is detected by the Relevancy Judgment algorithm, as shown in Fig. 7; it has 6 proteins and 15 interactions. As shown in Table 2, all 6 proteins of the detected sub-network belong to the CCR4-NOT complex. Of the 9 CCR4-NOT complex proteins, our algorithm misses 3: CCR4, CDC36 and NOT5.
Table 2 Proteins of NOT3 sub-network
UniProtKB  Gene    CCR4-NOT
P06102     NOT3    √
P34909     MOT2    √
P25655     CDC39   √
P39008     POP2    √
P53829     CAF40   √
P53280     CAF130  √
5.3 RFC2
Fig. 6 TAF6 sub-network.
Fig. 7 NOT3 sub-network.
RFC2 is a subunit of the heteropentameric replication factor C (RF-C), which is a DNA binding protein and ATPase that acts as a clamp loader of the proliferating cell nuclear antigen (PCNA) processivity factor for DNA polymerases delta and epsilon. Searching CYC2008 for RFC2-related complexes, we find 4 results: the DNA replication factor C (RFC) complex, the Rad17 RFC-like complex, the Ctf18 RFC-like complex, and the Elg1 RFC-like complex.
The DNA replication factor C (RFC) complex is a complex of five polypeptides in eukaryotes, and two in prokaryotes, that loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA, thereby permitting processive DNA synthesis catalyzed by DNA polymerase. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RFC2, RFC3, RFC4, and RFC1 (Kim and MacNeill, 2003).
The Rad17 RFC-like complex is a pentameric protein complex related to replication factor C, which loads a trimeric complex of checkpoint proteins (known as the checkpoint clamp or 9-1-1 complex) onto DNA at damage sites and functions in DNA damage cell cycle checkpoints. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RAD24, RFC2, RFC3, and RFC4 (Kim and MacNeill, 2003).
The Ctf18 RFC-like complex is a heptameric complex related to replication factor C, which loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA and plays a vital role in chromosome cohesion. In Saccharomyces cerevisiae, it has 7 nodes: RFC5, DCC1, CTF8, RFC2, CTF18, RFC3, and RFC4 (Kim and MacNeill, 2003).
The Elg1 RFC-like complex is a pentameric complex related to replication factor C, which loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA and has roles in telomere length regulation and other aspects of genome stability. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RFC2, RFC3, RFC4, and ELG1 (Kim and MacNeill, 2003).
Using RFC2 as the seed, the RFC2 sub-network is detected by the Relevancy Judgment algorithm, as shown in Fig. 8; it has 8 proteins and 21 interactions. As shown in Table 3, all 8 proteins of the detected sub-network belong to RFC2-related complexes. Specifically, the sub-network has all 5 RFC complex proteins; 4 of the 5 Rad17 RFC-like complex proteins, missing only RAD24; 6 of the 7 Ctf18 RFC-like complex proteins, missing only DCC1; and all 5 Elg1 RFC-like complex proteins.
5.4 ARP3
ARP3 is an essential component of the Arp2/3 complex, which is a highly conserved actin nucleation center required for the motility and integrity of actin patches; it is involved in endocytosis and in membrane growth and polarity. Searching CYC2008 for ARP3-related complexes, we find only 1 result: the Arp2/3 protein complex. The Arp2/3 protein complex is a stable protein complex that functions in the nucleation of branched actin filaments. It contains 7 nodes: ARP3, ARP2, ARC40, ARC15, ARC18, ARC35, and ARC19 (Kovacs and Yap, 2002).
Using ARP3 as the seed, the ARP3 sub-network is detected by the Relevancy Judgment algorithm, as shown in Fig. 9; it has 7 proteins and 21 interactions. As shown in Table 4, the detected sub-network contains 7 proteins, which are exactly the 7 proteins of the Arp2/3 protein complex, with none missing.
Table 3 Proteins of RFC2 sub-network
UniProtKB  Gene    RFC   Rad17  Ctf18  Elg1
P40348     RFC2    √     √      √      √
P38251     RFC5    √     √      √      √
P38877     CTF8                 √
P38629     RFC3    √     √      √      √
P40339     RFC4    √     √      √      √
P49956     CTF18                √
Q12050     ELG1                        √
P38630     RFC1    √
Fig. 8 RFC2 sub-network.
Table 4 Proteins of ARP3 sub-network
UniProtKB  Gene    Arp2/3
P47117     ARP3    √
P32381     ARP2    √
P38328     ARC40   √
P40518     ARC15   √
Q05933     ARC18   √
P53731     ARC35   √
P33204     ARC19   √
Fig. 9 ARP3 sub-network.
5.5 Performance Analysis
We compare the accurate percent and the hit rate of our Relevancy Judgment algorithm with those of the ModuleBuilder algorithm. The accurate percent is the ratio of the number of detected proteins that belong to known complexes to the number of all detected proteins. The hit rate is the ratio of the number of nodes detected for a known protein complex to the number of all nodes of that complex.
We compare our results with those reported by Wu and Hu (2007). We use only the raw detected sub-networks for every seed, and not their statistics, precision and recall, which correspond to accurate percent and hit rate. For seed TAF6, Wu and Hu (2007) considered nodes in the SAGA complex and the SRB complex as correct results, which is unreasonable because the SRB complex (Gustafsson and Samuelsson, 2001) does not contain TAF6 at all. The correct nodes should be those in the SAGA, TFIID and SLIK complexes, according to the search for seed TAF6 in CYC2008. Furthermore, for seed RFC2, Wu and Hu (2007) counted only the nodes in the RFC complex, whereas CYC2008 lists four complexes related to RFC2: the RFC complex, the Rad17 RFC-like complex, the Ctf18 RFC-like complex, and the Elg1 RFC-like complex. We therefore re-count the hit number and accurate number based on the raw detected sub-networks.
Table 5 gives the accurate numbers for our RJ algorithm and the MB algorithm. The sub-networks detected by RJ always have far fewer nodes than those detected by MB: for TAF6, RJ has 28 nodes while MB has 39; for NOT3, 6 vs. 40; for RFC2, 8 vs. 17; for ARP3, 7 vs. 20. Meanwhile, RJ has at least as many accurate nodes as MB for every seed except NOT3. That is because the CCR4-NOT complex has low connectivity and the 3 nodes missed by RJ have too many neighbors; both factors lower the performance of the RJ algorithm. The comparison of accurate percent between RJ and MB is shown in Fig. 10. The figure shows that RJ significantly improves on MB in accurate percent, and that RJ identifies the small complexes with complete accuracy, while MB reaches only around 30% there because it includes too many irrelevant nodes.
Table 5 Statistics of accurate number for RJ and MB
Sub-network  Algorithm  Total  Accurate  Percent (%)
TAF6         RJ         28     25        89.3
TAF6         MB         39     22        56.4
NOT3         RJ         6      6         100
NOT3         MB         40     9         22.5
RFC2         RJ         8      8         100
RFC2         MB         17     5         29.4
ARP3         RJ         7      7         100
ARP3         MB         20     7         35
Fig. 10 Comparison of accurate percent between RJ and MB.
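The two measures defined in this section can be sketched as set operations. The snippet below is an illustrative helper, not part of the original study: the function names are hypothetical, and the protein sets are the RFC2 sub-network and its four related complexes as listed in Section 5.3.

```python
def accurate_percent(detected, complexes):
    """Percentage of detected proteins that belong to at least one known complex."""
    known = set().union(*complexes.values())
    return 100.0 * len(detected & known) / len(detected)


def hit_rate(detected, complex_members):
    """Percentage of a known complex's proteins that appear in the detected set."""
    return 100.0 * len(detected & complex_members) / len(complex_members)


# RFC2 sub-network and related complexes, copied from the lists in Section 5.3.
rfc2_detected = {"RFC2", "RFC5", "CTF8", "RFC3", "RFC4", "CTF18", "ELG1", "RFC1"}
rfc2_complexes = {
    "RFC": {"RFC5", "RFC2", "RFC3", "RFC4", "RFC1"},
    "Rad17": {"RFC5", "RAD24", "RFC2", "RFC3", "RFC4"},
    "Ctf18": {"RFC5", "DCC1", "CTF8", "RFC2", "CTF18", "RFC3", "RFC4"},
    "Elg1": {"RFC5", "RFC2", "RFC3", "RFC4", "ELG1"},
}
rfc2_accuracy = accurate_percent(rfc2_detected, rfc2_complexes)
rfc2_hits = {name: round(hit_rate(rfc2_detected, members), 1)
             for name, members in rfc2_complexes.items()}
```

With these inputs the accurate percent is 100, and the hit rates are 100 for RFC, 80 for Rad17, 85.7 for Ctf18, and 100 for Elg1, matching the RJ figures reported for seed RFC2.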
Table 6 gives the hit numbers of RJ and MB. The table shows that both RJ and MB identify most of the complex nodes. For some small complexes, such as the RFC complex, the Elg1 RFC-like complex and the ARP2/3 complex, they detect all nodes. Fig. 11 compares their hit rates for different complexes. For the big TAF6-related complexes, RJ performs better than MB on SAGA and TFIID and equally well on SLIK, while it performs worse for some small complexes such as the CCR4-NOT and Rad17 RFC-like complexes. This is because MB, with its much bigger detected sub-network, hits an additional node more easily than RJ.
Table 6 Comparison of hit number for different complexes
Seed  Complex   Total  Hit (RJ)  Rate (RJ)  Hit (MB)  Rate (MB)
TAF6  SAGA      20     16        80%        14        70%
TAF6  TFIID     15     14        93.3%      13        86.7%
TAF6  SLIK      16     13        81.3%      13        81.3%
NOT3  CCR4-NOT  9      6         66.7%      9         100%
RFC2  RFC       5      5         100%       5         100%
RFC2  RAD17     5      4         80%        5         100%
RFC2  CTF18     7      6         85.7%      6         85.7%
RFC2  ELG1      5      5         100%       5         100%
ARP3  ARP2/3    7      7         100%       7         100%
Fig. 11 Comparison of hit rate between RJ and MB.
6 Conclusion
In this paper we discussed the characteristics of protein complexes and the problems of some detection algorithms. Proteins are densely connected within their complexes and loosely connected with other proteins. We used a relevant graph and an irrelevant graph to represent how a node is connected to a core graph. We defined a variable, Relevancy, to represent whether a node has a dense or a loose connection with a core graph. A node is deemed to have a dense connection with a core graph if its relevancy is positive, which means that the relevant graph has more edges than the irrelevant graph. We then proposed the Relevancy Judgment (RJ) algorithm to detect protein complexes from protein interaction networks. The RJ algorithm decides whether a node belongs to a protein complex by judging the relevancy between the core graph and the nodes outside it.
Applying the RJ algorithm to the yeast interaction network, we assigned four seed proteins to detect related complexes. Comparing the sub-networks found by RJ with the related complexes retrieved from the complex database, we found that RJ has an excellent accuracy on the acquired proteins: 89.3% for the big TAF6 sub-network and 100% for the other three, smaller sub-networks. This is a significant improvement over the MB algorithm, whose sub-networks contain many more nodes than ours. In terms of hit rate, our algorithm detected more proteins of a whole complex than MB for big complexes such as the TAF6-related complexes, but for small complexes it performed worse than MB, because MB hits an additional node more easily in its bigger sub-networks. In the future, we will further analyze the characteristics of PPI networks and improve the hit rate of our algorithm.
Acknowledgements The work of this paper was supported by the National Natural Science Foundation of China under the grant 61072051 and 61202470, the Natural Science Foundation of Hubei Province, China under the grant 2013CKB024, and partly supported by the Central China Normal University under the grant CCNU11A01015 and CCNU11A01014.
References
[1] Asthana, S., King, O.D., Gibbons, F.D., Roth, F.P. 2004. Predicting protein complex membership using probabilistic network reliability. Genome Res 14, 1170-1175.
[2] Brown, C.E., Lechner, T., Howe, L., Workman, J.L. 2000. The many HATs of transcription coactivators. Trends Biochem Sci 25, 15-19.
[3] Feng, J., Jiang, R., Jiang, T. 2011. A Max-flow-based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM Trans Comput Bio Bi 8, 621-634.
[4] Gustafsson, C.M., Samuelsson, T. 2001. Mediator - a universal complex in transcriptional regulation. Mol Microbiol 41, 1-8.
[5] Hu, X., Wu, D.D. 2007. Data mining and predictive modeling of biomolecular network from biomedical literature databases. IEEE/ACM Trans Comput Bio Bi 4, 251-263.
[6] Huang, H., Zhang, L.V., Roth, F.P., Bader, J.S. 2007. Probabilistic paths in protein interaction networks. In: Proceedings of the RECOMB Conferences on Systems Biology and Computational Proteomics, Oakland, USA, 14-28.
[7] Kim, J., MacNeill, S.A. 2003. Genome stability: A new member of the RFC family. Curr Biol 13, R873-R875.
[8] Kovacs, E.M., Yap, A.S. 2002. The web and the rock: Cell adhesion and the ARP2/3 complex. Dev Cell 3, 760-761.
[9] Li, W., Liu, Y., Huang, H.-C., Peng, Y., Lin, Y., Ng, W.-K., Ong, K.-L. 2007. Dynamical systems for discovering protein complexes and functional modules from biological networks. IEEE/ACM Trans Comput Bio Bi 4, 233-250.
[10] Liu, H.Y., Chiang, Y.C., Pan, J., Chen, J., Salvadore, C., Audino, D.C., Badarinarayana, V., Palaniswamy, V., Anderson, B., Denis, C.L. 2001. Characterization of CAF4 and CAF16 reveals a functional connection between the CCR4-NOT complex and a subset of SRB proteins of the RNA polymerase II holoenzyme. J Biol Chem 276, 7541-7548.
[11] Locker, J. 1996. Transcription Factors: Essential Data. John Wiley & Sons, Chichester, UK.
[12] Morcos, F., Sikora, M., Alber, M.S., Kaiser, D., Izaguirre, J.A. 2010. Belief propagation estimation of protein and domain interactions using the sum-product algorithm. IEEE Trans Inform Theory 56, 742-755.
[13] Pei, P., Zhang, A. 2007. A “seed-refine” algorithm for detecting protein complexes from protein interaction data. IEEE Trans Nanobiosci 6, 43-50.
[14] Pu, S., Wong, J., Turner, B., Cho, E., Wodak, S.J. 2008. Up-to-date catalogues of yeast protein complexes. Nucl Acid Res 37, 825-831.
[15] Smoot, M., Ono, K., Ruscheinski, J., Wang, P.-L., Ideker, T. 2011. Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics 27, 431-432.
[16] Wu, D., Hu, X. 2007. Topological analysis and subnetwork mining of protein-protein interactions. In: Taniar, D. (Ed.) Research and Trends in Data Mining Technology and Application. Idea Group Publisher, Hershey, USA, 209-240.
[17] Wu, P.Y., Winston, F. 2002. Analysis of Spt7 function in the Saccharomyces cerevisiae SAGA coactivator complex. Mol Cell Biol 22, 5367-5379.
[18] Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D. 2002. Dip, the database of interacting proteins: A research tool for studying cellular networks of protein interactions. Nucl Acid Res 30, 303-305.