Improving protein function prediction using domain and protein complexes in PPI networks

Cited by: 40
Authors
Peng, Wei [1, 2]
Wang, Jianxin [1]
Cai, Juan [1]
Chen, Lu [1]
Li, Min [1]
Wu, Fang-Xiang [1, 3, 4]
Affiliations
[1] Cent South Univ, Sch Informat Sci & Engn, Changsha 410083, Hunan, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650093, Yunnan, Peoples R China
[3] Univ Saskatchewan, Dept Mech Engn, Saskatoon, SK S7N 5A9, Canada
[4] Univ Saskatchewan, Div Biomed Engn, Saskatoon, SK S7N 5A9, Canada
Funding
National Natural Science Foundation of China;
Keywords
GENE ONTOLOGY; DATABASE; YEAST; GENERATION; ANNOTATION; ALGORITHM; MODULES;
DOI
10.1186/1752-0509-8-35
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Background: Characterizing unknown proteins through computational approaches is one of the most challenging problems of in silico biology and has attracted worldwide interest and considerable effort. Several computational methods have been proposed to address this problem, based either on homology mapping or on the context of protein interaction networks. Results: In this paper, two algorithms are proposed that integrate the protein-protein interaction (PPI) network, proteins' domain information, and protein complexes. The first is domain combination similarity (DCS), which combines the domain compositions of both proteins and their neighbors. The second is domain combination similarity in context of protein complexes (DSCP), which extends the protein functional similarity definition of DCS by combining the domain compositions of both proteins and the complexes that include them. The new algorithms are tested on networks of the model species Saccharomyces cerevisiae, using cross-validation to predict the functions of unknown proteins. Compared with several other existing algorithms, the results demonstrate the effectiveness of the proposed methods in protein function prediction. Furthermore, the DSCP algorithm, when using experimentally determined complex data, is robust when a large percentage of the proteins in the network are unknown, and it outperforms DCS and several other existing algorithms. Conclusions: The accuracy of protein function prediction can be improved by integrating the protein-protein interaction (PPI) network, proteins' domain information, and protein complexes.
Pages: 13