Detecting protein complexes based on relevancy from protein interaction networks.

Interdiscip Sci Comput Life Sci (2013) 5: 167–174 DOI: 10.1007/s12539-013-0171-z

Hua-Xiong YAO1∗, Yan YANG1, Xiao-Long LI2

1 (Department of Computer Science, Central China Normal University, Wuhan 430079, China)
2 (Department of Electronics and Computer Engineering Technology, Indiana State University, Terre Haute, IN 47809, USA)

Received 7 January 2013 / Revised 30 March 2013 / Accepted 12 June 2013

Abstract: In protein-protein interaction networks, proteins combine into macromolecular complexes to execute essential functions in the cell, such as replication, transcription, and protein transport. To detect protein complexes from protein interaction networks, we use a relevant graph and an irrelevant graph to represent how a node is connected to a core graph, and we define a variable, Relevancy, that indicates whether a node has a dense or a loose connection to the core graph. We then propose the Relevancy Judgment algorithm, which decides whether a node belongs to a protein complex by judging the relevancy between the core graph and the nodes outside it. Experimental results show that our algorithm performs well in both accuracy and hit rate.

Key words: protein complexes, protein interaction networks, relevancy judgement, PPI.

1 Introduction

High-throughput technologies have made abundant protein-protein interaction (PPI) data available. The protein-protein interactions of a given organism are usually modeled as a network, called a protein-protein interaction network, which highlights the mutual interactions between pairs of proteins (Morcos et al., 2010). Proteins combine into macromolecular complexes to execute essential functions in the cell, such as replication, transcription, and protein transport (Li et al., 2007). A protein complex is a group of two or more proteins associated by stable protein-protein interactions. Protein complexes play important roles in the formation of complicated biological functions. Detecting protein complexes in a protein interaction network helps to find the building modules of the network and to predict the function of uncharacterized proteins. Analyzing such massive data effectively to detect biologically meaningful protein complexes remains a great challenge.

There are many studies on detecting protein complexes from protein interaction networks. The Roth laboratory proposed the Complexpander approach, which uses a confidence-weighted graph of protein interactions to predict new members of protein complexes (Asthana et al., 2004). Huang et al. (2007) then improved the methodology. Wu and Hu (2007) presented the ModuleBuilder algorithm to discover a protein sub-network; it decides whether a node belongs to the sub-network by comparing the node's in-module degree and out-module degree. Pei and Zhang (2007) introduced a seed-refine algorithm that uses a subgraph quality measure, a two-layer heuristic to find seeds, and a subgraph refinement method. Feng et al. (2011) applied a max-flow-based approach to identify protein complexes in protein interaction networks.

In this paper, we first introduce the motivation of our algorithm in Section 2. We then state the problem formally with variables and formulas in Section 3. In Section 4, we propose the Relevancy Judgment algorithm to detect protein complexes from PPI networks. Experimental results and performance analysis follow in Section 5. Finally, we draw our conclusions in Section 6.

∗ Corresponding author. E-mail: [email protected]

2 Motivation

Proteins are likely to form closely coupled protein complexes as functional units to participate in biological processes (Pei and Zhang, 2007). Protein complexes can therefore be roughly considered as dense subgraphs of the protein interaction network, i.e. coherent sets of proteins that are densely connected among themselves but loosely connected with other proteins. We consider a protein interaction network as an undirected graph G(V, E), where V represents the set of proteins and E represents the set of interactions. A protein complex is then regarded as a subgraph Gc(Vc, Ec), which is a dense subgraph of G(V, E). The problem of detecting protein complexes from protein interaction networks is thus converted into the problem of finding a dense subgraph in the whole graph.

Fig. 1 shows the procedure of finding a dense graph in a graph G from a seed node. First, among the neighbor nodes of the seed, the core graph Go(Vo, Eo) is found, whose nodes have the most connections among themselves. Then whether other nodes are members of the complex graph Gc is decided based on their connections to the core graph.

Fig. 1 Procedure of finding a dense graph based on seed. (Diagram labels: G, Go, Gc, seed.)
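As a minimal sketch of the first step, a PPI network can be held as a dictionary of neighbor sets, and a core graph can be pruned from the seed's neighborhood by repeatedly dropping the least-connected nodes, following the pruning procedure the paper details in Section 4. The names `build_graph` and `identify_core`, the toy edges, and the choice to remove all nodes tied at the minimum in one pass are illustrative assumptions, not the authors' implementation.

```python
def build_graph(edges):
    """Undirected graph G(V, E) as a dict mapping each node to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def identify_core(adj, seed):
    """Prune the seed's neighborhood until all remaining nodes are equally connected."""
    nodes = adj[seed] | {seed}
    while True:
        others = nodes - {seed}
        if not others:
            return nodes
        # Degree of each non-seed node, counted inside the current candidate set.
        k_in = {n: len(adj[n] & nodes) for n in others}
        k_min, k_max = min(k_in.values()), max(k_in.values())
        if k_min == k_max:
            return nodes
        # Assumption: every node tied at the minimum is removed together.
        nodes = nodes - {n for n, k in k_in.items() if k == k_min}


# Hypothetical toy network: S, A, B form a triangle; C touches only the seed.
toy = build_graph([("S", "A"), ("S", "B"), ("A", "B"), ("S", "C")])
core = identify_core(toy, "S")
```

In this toy network the node C has a single connection inside the seed's neighborhood, so it is pruned and the triangle S, A, B remains as the core.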

Hu and Wu (2007) proposed the ModuleBuilder (MB) algorithm and Pei and Zhang (2007) presented the Seed-Refine (SR) algorithm; both use graph theory to detect protein complexes. SR decides whether a node belongs to a complex graph by adding the node to, or deleting it from, the core graph and checking whether this improves the density-based quality of the graph. MB makes the decision by comparing the in-module degree (Kin) with the out-module degree (Kout). As shown in Fig. 2, among all edges connected to node N1, solid lines represent edges between N1 and nodes in the core graph Go, and dashed lines represent edges between N1 and nodes outside Go. Kin is the number of solid lines, and Kout is the number of dashed lines. N1 belongs to the complex graph if Kin is greater than Kout.

Fig. 2 Kin and Kout in ModuleBuilder. (Diagram labels: Go(Vo, Eo), seed, Kin, Kout, N1.)

Since protein complexes are densely connected internally and loosely connected externally, as mentioned in the first paragraph of this section, we should decide whether a node in graph G has a dense or a loose connection to the core graph Go. Through detailed analysis of a large amount of protein interaction data, we find that in PPI networks some nodes have many connections while others have very few, even only one (we call these rare connections). SR only considers a node's connections into the core graph and ignores its connections outside the core graph, which might be a vital part of the picture. MB directly compares Kin with Kout, even though Kout may count rare connections, which play an insignificant role in the network yet lower detection performance.
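The comparison rule, and the way rare connections can tip it, can be illustrated with a minimal sketch. The function names, the dictionary-of-sets graph, and the toy nodes (N3 with three single-connection neighbors L1 to L3) are hypothetical and serve only to show how Kout can outweigh Kin.

```python
def module_degrees(adj, core, node):
    """Return (k_in, k_out): neighbors of `node` inside and outside the core set."""
    k_in = len(adj[node] & core)
    k_out = len(adj[node] - core)
    return k_in, k_out


def mb_accepts(adj, core, node):
    """ModuleBuilder-style rule as described above: accept when k_in > k_out."""
    k_in, k_out = module_degrees(adj, core, node)
    return k_in > k_out


# Hypothetical toy graph: N3 touches two core nodes and three leaf nodes.
adj = {
    "S": {"A", "B"},
    "A": {"S", "B", "N3"},
    "B": {"S", "A", "N3"},
    "N3": {"A", "B", "L1", "L2", "L3"},
    "L1": {"N3"},
    "L2": {"N3"},
    "L3": {"N3"},
}
core = {"S", "A", "B"}
degrees = module_degrees(adj, core, "N3")
accepted = mb_accepts(adj, core, "N3")
```

Here N3 is rejected because its three single-edge neighbors push Kout to 3 against a Kin of 2.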

3 Problem statement

To detect protein complexes accurately from protein interaction networks, we propose two concepts in this section, Relevant Graph and Irrelevant Graph, and use the variable Relevancy to decide whether a node has a dense or a loose connection to the core graph.

In a protein interaction network G(V, E), let Go(Vo, Eo) be the core graph based on the assigned seed node S, where Vo is the set of core nodes chosen from the adjacent nodes of S, and Eo is the set of edges among these core nodes. For a node N1 that is not in Go, the adjacent graph GADJ(VADJ, EADJ) consists of N1 and all its neighbor nodes, as shown in formula (1).

$$V_{ADJ} = \{u \mid (u, N_1) \in E\} \cup \{N_1\}, \qquad E_{ADJ} = \{(u, v) \in E \mid u, v \in V_{ADJ}\} \tag{1}$$

The Relevant Graph GR(VR, ER) consists of N1 and those of its neighbor nodes that belong to the core graph Go, as shown in formula (2).

$$V_R = (V_{ADJ} \cap V_o) \cup \{N_1\}, \qquad E_R = \{(u, v) \in E \mid u, v \in V_R\} \tag{2}$$

The Irrelevant Graph GIR(VIR, EIR) consists of N1 and those of its neighbor nodes that do not belong to the core graph Go, as shown in formula (3).

$$V_{IR} = V_{ADJ} - (V_{ADJ} \cap V_o), \qquad E_{IR} = \{(u, v) \in E \mid u, v \in V_{IR}\} \tag{3}$$

To make the meaning of these graphs clearer, we draw them in Fig. 3. In the figure, the left horizontal oval is the core graph Go(Vo, Eo), the right horizontal oval is the adjacent graph GADJ(VADJ, EADJ), the left vertical oval is the Relevant Graph GR(VR, ER), and the right vertical oval is the Irrelevant Graph GIR(VIR, EIR).

We then define a variable Relevancy to represent the relation of node N1 to the core graph Go, as shown in formula (4):

$$\mathrm{Relevancy} = |E_R| - |E_{IR}| \tag{4}$$
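Formulas (1) to (4) can be sketched directly with set operations, together with a loop that keeps admitting outside nodes whose Relevancy is positive, which is one reading of the procedure the paper gives in Section 4. The function names, the toy graph, and the restriction of candidates to nodes adjacent to the core are assumptions made for illustration, not the authors' code.

```python
from itertools import combinations


def induced_edge_count(adj, nodes):
    """Count edges of G whose two endpoints both lie in `nodes`."""
    return sum(1 for u, v in combinations(nodes, 2) if v in adj[u])


def relevancy(adj, core, n1):
    """Relevancy of an outside node n1 with respect to the core node set."""
    v_adj = adj[n1] | {n1}        # formula (1): n1 and its neighbors
    v_r = (v_adj & core) | {n1}   # formula (2): neighbors inside the core, plus n1
    v_ir = v_adj - core           # formula (3): neighbors outside the core, plus n1
    return induced_edge_count(adj, v_r) - induced_edge_count(adj, v_ir)  # formula (4)


def grow_complex(adj, core):
    """Add outside nodes with positive Relevancy until none remains."""
    core = set(core)
    changed = True
    while changed:
        changed = False
        # Assumption: only nodes adjacent to the core are judged;
        # a node with no core neighbor has |E_R| = 0 and cannot score above 0.
        frontier = {n for c in core for n in adj[c]} - core
        for n in sorted(frontier):
            if relevancy(adj, core, n) > 0:
                core.add(n)       # refresh the core graph, then judge again
                changed = True
                break
    return core


# Hypothetical toy graph with core {S, A, B}.
adj = {
    "S": {"A", "B"},
    "A": {"S", "B", "N1", "N2"},
    "B": {"S", "A", "N1"},
    "N1": {"A", "B", "X"},
    "X": {"N1"},
    "N2": {"A", "Y", "Z"},
    "Y": {"N2", "Z"},
    "Z": {"N2", "Y"},
}
core = {"S", "A", "B"}
r_n1 = relevancy(adj, core, "N1")
r_n2 = relevancy(adj, core, "N2")
complex_nodes = grow_complex(adj, core)
```

In this toy graph N1 scores 3 − 1 = 2 and N2 scores 1 − 3 = −2. Once N1 has been admitted, its single-connection neighbor X scores 1 − 0 = 1 and is admitted as well, while N2 stays outside.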

Fig. 3 Relevant Graph and Irrelevant Graph. (Diagram labels: Go, seed, N1, GADJ, GR, GIR.)

If Relevancy > 0, node N1 is densely connected to the core graph Go(Vo, Eo), and N1 can be included in the complex of seed S. If Relevancy ≤ 0, node N1 is loosely connected to Go(Vo, Eo), and N1 cannot be included in the complex of seed S.

4 Algorithm

Based on the judgment of relevancy, we propose the Relevancy Judgment (RJ) algorithm to detect protein complexes from PPI networks. The RJ algorithm has two main parts: identifying the core graph from a given seed node, and obtaining the protein complex by judging the relevancy between the core graph and the nodes outside it.

We identify the core graph with the same method that Hu and Wu (2007) proposed. As the data flow in Fig. 4 shows, we first find all the adjacent nodes of the seed node and compose these nodes and the seed into the adjacent graph GADJ(VADJ, EADJ). Then we compute the in-module degree (Kin) of every node in the adjacent graph except the seed, and obtain the minimal (Kmin) and maximal (Kmax) values of Kin. Afterwards, we delete the nodes with minimal Kin from GADJ, and repeat this step until Kmin equals Kmax. Finally, the remaining nodes compose the graph Go(Vo, Eo), which is the identified core graph.

Fig. 4 Data flow of identifying core graph. (Flowchart steps: find adjacent graph GADJ(VADJ, EADJ) of seed node; compute Kin for all nodes in VADJ except seed; Kmin = Kmax? If no, delete the node with Kmin from VADJ and recompute; if yes, return Go(Vo, Eo).)

After we get the core graph Go, we judge the relevancy of every node outside Go. If the relevancy of node Ni is positive, we add Ni to Go, refresh the core graph, and judge the relevancy again. If the relevancy of node Ni is not positive, a different node is judged, until every node outside Go has a non-positive relevancy, as the data flow in Fig. 5 shows.

Fig. 5 Data flow of Relevancy Judgment algorithm. (Flowchart labels: Refresh Go(Vo, Eo); Vo = φ?; Relevancy(Ni) > 0?; Ni ∈ Vo; {Ni} + Vo → Vo; Vo − {Ni} → Vo; Return Go(Vo, Eo).)

5 Experiments

In this section, we apply the Relevancy Judgment (RJ) algorithm to the yeast interaction network. We downloaded the Saccharomyces cerevisiae data set from DIP (Xenarios et al., 2002), which has 5078 proteins and 22418 interactions. For ease of comparison, we chose the same four seed proteins as the ModuleBuilder (MB) algorithm (Hu and Wu, 2007): TAF6, NOT3, RFC2, and ARP3. Using these four proteins as seeds, we find their corresponding sub-networks with the RJ algorithm. The sub-networks are visualized with Cytoscape (Smoot et al., 2011), an open source platform for complex network analysis and visualization. We look up the complexes related to each seed protein in the complex database CYC2008 (Pu et al., 2008), a comprehensive catalogue (like the MIPS database) of 408 manually curated heteromeric protein complexes in Saccharomyces cerevisiae, reliably backed by small-scale experiments from the literature. Below, we list the experimental results for the four seed proteins and then compare the performance of our algorithm with that of the MB algorithm.

5.1 TAF6

TAF6 is a subunit of the TFIID and SAGA complexes, involved in transcription initiation of RNA polymerase II and in chromatin modification. Searching CYC2008 for TAF6-related complexes, we find 3 results: the SAGA complex, the TFIID complex, and the SLIK complex.

The SAGA complex is a Spt-Ada-Gcn5-acetyltransferase complex with 20 nodes: ADA2, CHD1, GCN5, HFI1, NGG1, SGF11, SGF29, SGF73, SPT20, SPT3, SPT7, SPT8, SUS1, TAF10, TAF12, TAF5, TAF6, TAF9, TRA1, and UBP8 (Brown et al., 2000).

The TFIID complex is composed of TATA binding protein (TBP) and TBP associated factors (TAFs). Most of the TAFs are conserved across species. In TATA-containing promoters for RNA polymerase II (Pol II), TFIID is believed to recognize at least two distinct elements, the TATA element and a downstream promoter element. TFIID is also involved in recognition of TATA-less Pol II promoters. Binding of TFIID to DNA is necessary but not sufficient for transcription initiation from most RNA polymerase II promoters. The complex includes 15 nodes: SPT15, TAF1, TAF10, TAF11, TAF12, TAF13, TAF14, TAF2, TAF3, TAF4, TAF5, TAF6, TAF7, TAF8, and TAF9 (Locker, 1996).

The SLIK complex is a SAGA-type histone acetyltransferase complex, involved in the yeast retrograde response pathway, which is important for gene expression changes during mitochondrial dysfunction. It includes 16 nodes: ADA2, CHD1, GCN5, HFI1, NGG1, RTG2, SGF73, SPT20, SPT3, SPT7, TAF10, TAF12, TAF5, TAF6, TAF9, and TRA1 (Wu and Winston, 2002).

Using TAF6 as seed, the RJ algorithm detects the TAF6 sub-network shown in Fig. 6, which has 28 proteins and 108 interactions. As shown in Table 1, 25 of these 28 proteins belong to TAF6-related complexes. Specifically, the sub-network has 16 of the 20 SAGA complex proteins, missing only CHD1, SGF11, SUS1 and TRA1; 14 of the 15 TFIID complex proteins, missing only TAF14; and 13 of the 16 SLIK complex proteins, missing only CHD1, RTG2 and TRA1.

Table 1 Proteins of TAF6 sub-network

| UniProtKB | Gene | SAGA | TFIID | SLIK |
|---|---|---|---|---|
| Q02336 | ADA2 | √ | | √ |
| P07248 | ADR1 | | | |
| P03069 | GCN4 | | | |
| Q03330 | GCN5 | √ | | √ |
| Q12060 | HFI1 | √ | | √ |
| P32494 | NGG1 | √ | | √ |
| P25554 | SGF29 | √ | | |
| P53165 | SGF73 | √ | | √ |
| P13393 | SPT15 | | √ | |
| P50875 | SPT20 | √ | | √ |
| P06844 | SPT3 | √ | | √ |
| P35177 | SPT7 | √ | | √ |
| P38915 | SPT8 | √ | | |
| P46677 | TAF1 | | √ | |
| Q12030 | taf10 | √ | √ | √ |
| Q04226 | taf11 | | √ | |
| Q03761 | TAF12 | √ | √ | √ |
| P11747 | taf13 | | √ | |
| P23255 | TAF2 | | √ | |
| Q12297 | taf3 | | √ | |
| P50105 | taf4 | | √ | |
| P38129 | TAF5 | √ | √ | √ |
| P53040 | taf6 | √ | √ | √ |
| Q05021 | TAF7 | | √ | |
| Q03750 | TAF8 | | √ | |
| Q05027 | taf9 | √ | √ | √ |
| P50102 | UBP8 | √ | | |
| DIP-524N | VP16 | | | |

Fig. 6 TAF6 sub-network. (Network diagram of the 28 proteins listed in Table 1.)

5.2 NOT3

NOT3 is a subunit of the CCR4-NOT complex, which is a global transcriptional regulator with roles in transcription initiation and elongation and in mRNA degradation. Searching CYC2008 for NOT3-related complexes, we find only 1 result: the CCR4-NOT core complex. In Saccharomyces cerevisiae, the CCR4-NOT core complex comprises 9 nodes: Ccr4p, Caf1p, Caf40p, Caf130p, Not1p, Not2p, Not3p, Not4p, and Not5p (Liu et al., 2001).

Using NOT3 as seed, the RJ algorithm detects the NOT3 sub-network shown in Fig. 7, which has 6 proteins and 15 interactions. As shown in Table 2, all 6 proteins belong to the CCR4-NOT complex. Of the 9 CCR4-NOT complex proteins, our algorithm misses 3: CCR4, CDC36 and NOT5.

Table 2 Proteins of NOT3 sub-network

| UniProtKB | Gene | CCR4-NOT |
|---|---|---|
| P06102 | NOT3 | √ |
| P34909 | MOT2 | √ |
| P25655 | CDC39 | √ |
| P39008 | POP2 | √ |
| P53829 | CAF40 | √ |
| P53280 | CAF130 | √ |

Fig. 7 NOT3 sub-network. (Network diagram of the 6 proteins listed in Table 2.)

5.3 RFC2

RFC2 is a subunit of heteropentameric replication factor C (RF-C), which is a DNA binding protein and ATPase that acts as a clamp loader of the proliferating cell nuclear antigen (PCNA) processivity factor for DNA polymerases delta and epsilon. Searching CYC2008 for RFC2-related complexes, we find 4 results: the DNA replication factor C (RFC) complex, the Rad17 RFC-like complex, the Ctf18 RFC-like complex, and the Elg1 RFC-like complex.

The DNA replication factor C (RFC) complex is a complex of five polypeptides in eukaryotes, and two in prokaryotes, that loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA, thereby permitting processive DNA synthesis catalyzed by DNA polymerase. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RFC2, RFC3, RFC4, and RFC1 (Kim and MacNeill, 2003).

The Rad17 RFC-like complex is a pentameric protein complex related to replication factor C, which loads a trimeric complex of checkpoint proteins (known as the checkpoint clamp or 9-1-1 complex) onto DNA at damage sites and functions in DNA damage cell cycle checkpoints. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RAD24, RFC2, RFC3, and RFC4 (Kim and MacNeill, 2003).

The Ctf18 RFC-like complex is a heptameric complex related to replication factor C, which loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA and plays a vital role in chromosome cohesion. In Saccharomyces cerevisiae, it has 7 nodes: RFC5, DCC1, CTF8, RFC2, CTF18, RFC3, and RFC4 (Kim and MacNeill, 2003).

The Elg1 RFC-like complex is a pentameric complex related to replication factor C, which loads the DNA polymerase processivity factor proliferating cell nuclear antigen (PCNA) onto DNA and has roles in telomere length regulation and other aspects of genome stability. In Saccharomyces cerevisiae, it has 5 nodes: RFC5, RFC2, RFC3, RFC4, and ELG1 (Kim and MacNeill, 2003).

Using RFC2 as seed, the RJ algorithm detects the RFC2 sub-network shown in Fig. 8, which has 8 proteins and 21 interactions. As shown in Table 3, all 8 proteins belong to RFC2-related complexes. Specifically, the sub-network has all 5 RFC complex proteins; 4 of the 5 Rad17 RFC-like complex proteins, missing only RAD24; 6 of the 7 Ctf18 RFC-like complex proteins, missing only DCC1; and all 5 Elg1 RFC-like complex proteins.

Table 3 Proteins of RFC2 sub-network

| UniProtKB | Gene | RFC | Rad17 | Ctf18 | Elg1 |
|---|---|---|---|---|---|
| P40348 | RFC2 | √ | √ | √ | √ |
| P38251 | RFC5 | √ | √ | √ | √ |
| P38877 | CTF8 | | | √ | |
| P38629 | RFC3 | √ | √ | √ | √ |
| P40339 | RFC4 | √ | √ | √ | √ |
| P49956 | CTF18 | | | √ | |
| Q12050 | ELG1 | | | | √ |
| P38630 | RFC1 | √ | | | |

Fig. 8 RFC2 sub-network. (Network diagram with nodes ELG1, RFC1, CTF8, CTF18, RFC2, RFC4, RFC5, RFC3.)

5.4 ARP3

ARP3 is an essential component of the Arp2/3 complex, which is a highly conserved actin nucleation center required for the motility and integrity of actin patches; it is involved in endocytosis and membrane growth and polarity. Searching CYC2008 for ARP3-related complexes, we find only 1 result: the Arp2/3 protein complex. The Arp2/3 protein complex is a stable protein complex that functions in the nucleation of branched actin filaments. It contains 7 nodes: ARP3, ARP2, ARC40, ARC15, ARC18, ARC35, and ARC19 (Kovacs and Yap, 2002).

Using ARP3 as seed, the RJ algorithm detects the ARP3 sub-network shown in Fig. 9, which has 7 proteins and 21 interactions. As shown in Table 4, these 7 proteins are exactly the 7 proteins of the Arp2/3 protein complex, with none missing.

Table 4 Proteins of ARP3 sub-network

| UniProtKB | Gene | Arp2/3 |
|---|---|---|
| P47117 | ARP3 | √ |
| P32381 | ARP2 | √ |
| P38328 | ARC40 | √ |
| P40518 | ARC15 | √ |
| Q05933 | ARC18 | √ |
| P53731 | ARC35 | √ |
| P33204 | ARC19 | √ |

Fig. 9 ARP3 sub-network. (Network diagram with nodes ARC35, ARC18, ARC19, ARC40, ARP3, ARP2, ARC15.)

5.5 Performance Analysis

We compare the accurate percent and hit rate of our Relevancy Judgment algorithm with those of the ModuleBuilder algorithm. The accurate percent is the ratio of the number of detected proteins that belong to known complexes to the number of all detected proteins. The hit rate is the ratio of the number of nodes detected for a known protein complex to the number of all nodes in that complex.

We compare our results with those reported by Wu and Hu (2007). We use only their raw detected sub-networks for each seed, not their statistics, precision and recall, which correspond to accurate percent and hit rate. For seed TAF6, Wu and Hu (2007) counted nodes in the SAGA complex and the SRB complex as correct results, which is unreasonable because the SRB complex (Gustafsson and Samuelsson, 2001) does not contain TAF6 at all. According to the CYC2008 search for seed TAF6, the correct nodes should be those in the SAGA, TFIID and SLIK complexes. Furthermore, for seed RFC2, Wu and Hu (2007) only counted the nodes in the RFC complex, whereas RFC2 has four related complexes in CYC2008: the RFC complex, the Rad17 RFC-like complex, the Ctf18 RFC-like complex, and the Elg1 RFC-like complex. We therefore re-count the hit number and accurate number based on the raw detected sub-networks.

Table 5 gives the accurate numbers for our RJ algorithm and the MB algorithm. The sub-networks detected by RJ always have far fewer nodes than those detected by MB: 28 vs. 39 for TAF6, 6 vs. 40 for NOT3, 8 vs. 17 for RFC2, and 7 vs. 20 for ARP3. Meanwhile, RJ has at least as many accurate nodes as MB for every seed except NOT3. For NOT3, the CCR4-NOT complex has a low connectivity and the 3 nodes missed by RJ have too many neighbors; both factors lower the performance of the RJ algorithm.

Table 5 Statistics of accurate number for RJ and MB

| Sub-network | Algorithm | Total | Accurate | Percent (%) |
|---|---|---|---|---|
| TAF6 | RJ | 28 | 25 | 89.3 |
| TAF6 | MB | 39 | 22 | 56.4 |
| NOT3 | RJ | 6 | 6 | 100 |
| NOT3 | MB | 40 | 9 | 22.5 |
| RFC2 | RJ | 8 | 8 | 100 |
| RFC2 | MB | 17 | 5 | 29.4 |
| ARP3 | RJ | 7 | 7 | 100 |
| ARP3 | MB | 20 | 7 | 35 |

Fig. 10 compares the accurate percent of RJ and MB. RJ improves substantially on MB (89.3% vs. 56.4% for TAF6), and its identification is completely accurate for the three smaller sub-networks, whereas MB reaches only about 22% to 35% there because it includes too many irrelevant nodes.

Fig. 10 Comparison of accurate percent between RJ and MB. (Bar chart; y-axis: percent from 0 to 100; x-axis: TAF6, NOT3, RFC2, ARP3; series: RJ, MB.)
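The two measures defined in this section are simple ratios, and the percentages in Table 5 can be re-derived from the reported counts. The sketch below uses hypothetical helper names (`accurate_percent`, `hit_rate`) and rounds to one decimal place to match the table; the counts are copied from Table 5 and from the SAGA result in Section 5.1.

```python
def accurate_percent(accurate, total_detected):
    """Share of detected proteins that belong to known complexes, in percent."""
    return round(100.0 * accurate / total_detected, 1)


def hit_rate(hit, complex_size):
    """Share of a known complex's nodes that were detected, in percent."""
    return round(100.0 * hit / complex_size, 1)


# (accurate, total) pairs for RJ and MB as reported in Table 5.
table5 = {
    "TAF6": {"RJ": (25, 28), "MB": (22, 39)},
    "NOT3": {"RJ": (6, 6), "MB": (9, 40)},
    "RFC2": {"RJ": (8, 8), "MB": (5, 17)},
    "ARP3": {"RJ": (7, 7), "MB": (7, 20)},
}

percents = {
    seed: {alg: accurate_percent(acc, total) for alg, (acc, total) in row.items()}
    for seed, row in table5.items()
}

# RJ detected 16 of the 20 SAGA complex proteins (Section 5.1).
saga_hit_rate = hit_rate(16, 20)
```

Under these definitions, accurate percent plays the role of precision over the detected sub-network, and hit rate plays the role of recall for one reference complex.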

Table 6 gives the hit numbers of RJ and MB. Both algorithms identify most of the complex nodes, and for some small complexes, such as the RFC complex, the Elg1 RFC-like complex and the Arp2/3 complex, they detect all nodes. Fig. 11 compares their hit rates for the different complexes. For the larger TAF6-related complexes, RJ performs better than MB (SAGA and TFIID) or equally well (SLIK), while it performs worse for some small complexes such as CCR4-NOT and the Rad17 RFC-like complex. This is because MB's much bigger detected sub-network makes it easier to hit an additional node.

Table 6 Comparison of hit number for different complexes

| Seed | Complex | Total | Hit (RJ) | Rate (RJ) | Hit (MB) | Rate (MB) |
|---|---|---|---|---|---|---|
| TAF6 | SAGA | 20 | 16 | 80% | 14 | 70% |
| TAF6 | TFIID | 15 | 14 | 93.3% | 13 | 86.7% |
| TAF6 | SLIK | 16 | 13 | 81.3% | 13 | 81.3% |
| NOT3 | CCR4-NOT | 9 | 6 | 66.7% | 9 | 100% |
| RFC2 | RFC | 5 | 5 | 100% | 5 | 100% |
| RFC2 | RAD17 | 5 | 4 | 80% | 5 | 100% |
| RFC2 | CTF18 | 7 | 6 | 85.7% | 6 | 85.7% |
| RFC2 | ELG1 | 5 | 5 | 100% | 5 | 100% |
| ARP3 | ARP2/3 | 7 | 7 | 100% | 7 | 100% |

Fig. 11 Comparison of hit rate between RJ and MB. (Bar chart; y-axis: percent from 0 to 100; x-axis: SAGA, TFIID, SLIK, RFC, CTF18, ARP2/3, CCR4-NOT, RAD17, ELG1; series: RJ, MB.)

6 Conclusion

In this paper we discussed the characteristics of protein complexes and the problems of some detection algorithms. Proteins are densely connected within their complexes and loosely connected with other proteins. We used a relevant graph and an irrelevant graph to represent how a node is connected to a core graph, and defined a variable, Relevancy, to represent whether a node has a dense or a loose connection with the core graph. A node is deemed to have a dense connection with the core graph if its relevancy is positive, which means that the relevant graph has more edges than the irrelevant graph. We then proposed the Relevancy Judgment (RJ) algorithm to detect protein complexes from protein interaction networks. The RJ algorithm decides whether a node belongs to a protein complex by judging the relevancy between the core graph and the nodes outside it.

Applying the RJ algorithm to the yeast interaction network, we assigned four seed proteins to detect related complexes. Comparing the sub-networks found by RJ with the related complexes retrieved from the complex database, we found that RJ is highly accurate on the detected proteins: 89.3% for the large TAF6 sub-network and 100% for the other three, smaller sub-networks. This is a significant improvement over the MB algorithm, whose sub-networks contain many more nodes than ours. In terms of hit rate, our algorithm detected at least as many proteins of each complex as MB for the large TAF6-related complexes, but it performed worse than MB for some small complexes, because it is easier for MB to hit an additional node in its bigger sub-networks. In the future, we will further analyze the characteristics of PPI networks and improve the hit rate of our algorithm.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grants 61072051 and 61202470, by the Natural Science Foundation of Hubei Province, China, under grant 2013CKB024, and partly by Central China Normal University under grants CCNU11A01015 and CCNU11A01014.

References

[1] Asthana, S., King, O.D., Gibbons, F.D., Roth, F.P. 2004. Predicting protein complex membership using probabilistic network reliability. Genome Res 14, 1170-1175.
[2] Brown, C.E., Lechner, T., Howe, L., Workman, J.L. 2000. The many HATs of transcription coactivators. Trends Biochem Sci 25, 15-19.
[3] Feng, J., Jiang, R., Jiang, T. 2011. A max-flow-based approach to the identification of protein complexes using protein interaction and microarray data. IEEE/ACM Trans Comput Bio Bi 8, 621-634.
[4] Gustafsson, C.M., Samuelsson, T. 2001. Mediator - a universal complex in transcriptional regulation. Mol Microbiol 41, 1-8.
[5] Hu, X., Wu, D.D. 2007. Data mining and predictive modeling of biomolecular network from biomedical literature databases. IEEE/ACM Trans Comput Bio Bi 4, 251-263.
[6] Huang, H., Zhang, L.V., Roth, F.P., Bader, J.S. 2007. Probabilistic paths in protein interaction networks. In: Proceedings of the RECOMB Conferences on Systems Biology and Computational Proteomics, Oakland, USA, 14-28.
[7] Kim, J., MacNeill, S.A. 2003. Genome stability: A new member of the RFC family. Curr Biol 13, R873-R875.
[8] Kovacs, E.M., Yap, A.S. 2002. The web and the rock: Cell adhesion and the ARP2/3 complex. Dev Cell 3, 760-761.
[9] Li, W., Liu, Y., Huang, H.-C., Peng, Y., Lin, Y., Ng, W.-K., Ong, K.-L. 2007. Dynamical systems for discovering protein complexes and functional modules from biological networks. IEEE/ACM Trans Comput Bio Bi 4, 233-250.
[10] Liu, H.Y., Chiang, Y.C., Pan, J., Chen, J., Salvadore, C., Audino, D.C., Badarinarayana, V., Palaniswamy, V., Anderson, B., Denis, C.L. 2001. Characterization of CAF4 and CAF16 reveals a functional connection between the CCR4-NOT complex and a subset of SRB proteins of the RNA polymerase II holoenzyme. J Biol Chem 276, 7541-7548.
[11] Locker, J. 1996. Transcription Factors: Essential Data. John Wiley & Sons, Chichester, UK.
[12] Morcos, F., Sikora, M., Alber, M.S., Kaiser, D., Izaguirre, J.A. 2010. Belief propagation estimation of protein and domain interactions using the sum-product algorithm. IEEE Trans Inform Theory 56, 742-755.
[13] Pei, P., Zhang, A. 2007. A "seed-refine" algorithm for detecting protein complexes from protein interaction data. IEEE Trans Nanobiosci 6, 43-50.
[14] Pu, S., Wong, J., Turner, B., Cho, E., Wodak, S.J. 2008. Up-to-date catalogues of yeast protein complexes. Nucl Acid Res 37, 825-831.
[15] Smoot, M., Ono, K., Ruscheinski, J., Wang, P.-L., Ideker, T. 2011. Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics 27, 431-432.
[16] Wu, D., Hu, X. 2007. Topological analysis and sub-network mining of protein-protein interactions. In: Taniar, D. (Ed.) Research and Trends in Data Mining Technology and Application. Idea Group Publisher, Hershey, USA, 209-240.
[17] Wu, P.Y., Winston, F. 2002. Analysis of Spt7 function in the Saccharomyces cerevisiae SAGA coactivator complex. Mol Cell Biol 22, 5367-5379.
[18] Xenarios, I., Salwinski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D. 2002. DIP, the database of interacting proteins: A research tool for studying cellular networks of protein interactions. Nucl Acid Res 30, 303-305.
