Determining the minimum number of protein-protein interactions required to support known protein complexes

被引:9
作者
Nakajima, Natsu [1 ]
Hayashida, Morihiro [2 ]
Jansson, Jesper [3 ]
Maruyama, Osamu [4 ]
Akutsu, Tatsuya [5 ]
机构
[1] Univ Tokyo, Inst Mol & Cellular Biosci, Bunkyo Ku, 1-1-1 Yayoi, Tokyo 1130032, Japan
[2] Matsue Coll, Natl Inst Technol, Dept Elect Engn & Comp Sci, 14-4 Nishiikumacho, Matsue, Shimane 6908518, Japan
[3] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[4] Kyushu Univ, Inst Math Ind, Nishi Ku, 744 Motooka, Fukuoka 8190395, Japan
[5] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji, Kyoto 6110011, Japan
来源
PLOS ONE | 2018年 / 13卷 / 04期
关键词
MOLECULAR INTERACTION DATABASE; INTERACTION NETWORKS; SACCHAROMYCES-CEREVISIAE; CLUSTERING-TREE; YEAST; PREDICTION; DATASETS; INTACT; CANCER;
D O I
10.1371/journal.pone.0195545
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedytype approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/similar to nakajima/.
引用
收藏
页数:17
相关论文
共 46 条
  • [1] Structure-based assembly of protein complexes in yeast
    Aloy, P
    Böttcher, B
    Ceulemans, H
    Leutwein, C
    Mellwig, C
    Fischer, S
    Gavin, AC
    Bork, P
    Superti-Furga, G
    Serrano, L
    Russell, RB
    [J]. SCIENCE, 2004, 303 (5666) : 2026 - 2029
  • [2] Network construction with subgraph connectivity constraints
    Angluin, Dana
    Aspnes, James
    Reyzin, Lev
    [J]. JOURNAL OF COMBINATORIAL OPTIMIZATION, 2015, 29 (02) : 418 - 432
  • [3] The IntAct molecular interaction database in 2010
    Aranda, B.
    Achuthan, P.
    Alam-Faruque, Y.
    Armean, I.
    Bridge, A.
    Derow, C.
    Feuermann, M.
    Ghanbarian, A. T.
    Kerrien, S.
    Khadake, J.
    Kerssemakers, J.
    Leroy, C.
    Menden, M.
    Michaut, M.
    Montecchi-Palazzi, L.
    Neuhauser, S. N.
    Orchard, S.
    Perreau, V.
    Roechert, B.
    van Eijk, K.
    Hermjakob, H.
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 : D525 - D531
  • [4] An automated method for finding molecular complexes in large protein interaction networks
    Bader, GD
    Hogue, CW
    [J]. BMC BIOINFORMATICS, 2003, 4 (1)
  • [5] Bader GD, 2003, NUCLEIC ACIDS RES, V31, P248, DOI 10.1093/nar/gkg056
  • [6] MINT: the molecular INTeraction database
    Chatr-aryamontri, Andrew
    Ceol, Arnaud
    Palazzi, Luisa Montecchi
    Nardelli, Giuliano
    Schneider, Maria Victoria
    Castagnoli, Luisa
    Cesareni, Gianni
    [J]. NUCLEIC ACIDS RESEARCH, 2007, 35 : D572 - D574
  • [7] KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions
    Chen, Xue-wen
    Jeong, Jong Cheol
    Dermyer, Patrick
    [J]. NUCLEIC ACIDS RESEARCH, 2011, 39 : D750 - D754
  • [8] Chockler G, 2007, PODC'07: PROCEEDINGS OF THE 26TH ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING, P109
  • [9] Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae
    Collins, Sean R.
    Kemmeren, Patrick
    Zhao, Xue-Chu
    Greenblatt, Jack F.
    Spencer, Forrest
    Holstege, Frank C. P.
    Weissman, Jonathan S.
    Krogan, Nevan J.
    [J]. MOLECULAR & CELLULAR PROTEOMICS, 2007, 6 (03) : 439 - 450
  • [10] An efficient algorithm for large-scale detection of protein families
    Enright, AJ
    Van Dongen, S
    Ouzounis, CA
    [J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (07) : 1575 - 1584