Protein-protein interaction prediction using a hybrid feature representation and a stacked generalization scheme

Cited by: 43
Authors
Chen, Kuan-Hsi [1]
Wang, Tsai-Feng [2]
Hu, Yuh-Jyh [3]
Affiliations
[1] Natl Chiao Tung Univ, Coll Comp Sci, Hsinchu 300, Taiwan
[2] Natl Chiao Tung Univ, Inst Data Sci & Engn, Hsinchu 300, Taiwan
[3] Natl Chiao Tung Univ, Inst Biomed Engn, Coll Comp Sci, Hsinchu 300, Taiwan
Keywords
Protein-protein interaction; Stacked generalization; Gene ontology; Network topology; SEMANTIC SIMILARITY MEASURES; GENE ONTOLOGY; SEQUENCES; SCALE; TOOL; RESIDUES; CELL
DOI
10.1186/s12859-019-2907-1
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Although various machine learning-based predictors have been developed for estimating protein-protein interactions, their performances vary with dataset and species, and are affected by two primary aspects: choice of learning algorithm, and the representation of protein pairs. To improve the performance of predicting protein-protein interactions, we exploit the synergy of multiple learning algorithms, and utilize the expressiveness of different protein-pair features.

Results: We developed a stacked generalization scheme that integrates five learning algorithms. We also designed three types of protein-pair features based on the physicochemical properties of amino acids, gene ontology annotations, and interaction network topologies. When tested on 19 published datasets collected from eight species, the proposed approach achieved a significantly higher or comparable overall performance, compared with seven competitive predictors.

Conclusion: We introduced an ensemble learning approach for PPI prediction that integrated multiple learning algorithms and different protein-pair representations. The extensive comparisons with other state-of-the-art prediction tools demonstrated the feasibility and superiority of the proposed method.
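
For readers unfamiliar with stacked generalization, the general idea can be sketched as follows. This is an illustrative toy only, not the authors' pipeline: the abstract names neither the five learning algorithms nor the meta-learner, so the two base learners (nearest centroid and k-nearest neighbours), the logistic-regression meta-learner, the synthetic two-dimensional data standing in for protein-pair features, and every function name below are assumptions made for the example.

```python
import math
import random


def sigmoid(v):
    # Clamp the logit so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, v))))


def make_toy_data(n_per_class=20, seed=0):
    """Hypothetical stand-in for protein-pair feature vectors (label 1 = interacting)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        for label, centre in ((0, 0.0), (1, 3.0)):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    return X, y


def centroid_fit(X, y):
    """Level-0 learner A: one centroid per class."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def centroid_score(model, x):
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return d0 / (d0 + d1 + 1e-12)  # near 1 when x is close to the class-1 centroid


def knn_fit(X, y):
    """Level-0 learner B: k-nearest neighbours simply memorises the training set."""
    return list(zip(X, y))


def knn_score(model, x, k=3):
    nearest = sorted(model, key=lambda pair: math.dist(x, pair[0]))[:k]
    return sum(t for _, t in nearest) / len(nearest)


BASE_LEARNERS = [(centroid_fit, centroid_score), (knn_fit, knn_score)]


def logreg_fit(Z, y, epochs=500, lr=0.5):
    """Level-1 (meta) learner: logistic regression by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # w[0] is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], z))) - t
            grad[0] += err
            for j, zj in enumerate(z):
                grad[j + 1] += err * zj
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return w


def stack_fit(X, y, n_folds=5):
    n = len(X)
    Z = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    # Out-of-fold scores: each sample is scored by base models that never saw it.
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, (fit, score) in enumerate(BASE_LEARNERS):
            model = fit(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = score(model, X[i])
    meta = logreg_fit(Z, y)                             # train the meta-learner on Z
    models = [fit(X, y) for fit, _ in BASE_LEARNERS]    # refit base learners on all data
    return models, meta


def stack_predict(stacked, x):
    models, meta = stacked
    z = [score(m, x) for m, (_, score) in zip(models, BASE_LEARNERS)]
    return sigmoid(meta[0] + sum(a * b for a, b in zip(meta[1:], z)))


X, y = make_toy_data()
stacked = stack_fit(X, y)
print(stack_predict(stacked, [3.0, 3.0]), stack_predict(stacked, [0.0, 0.0]))
```

The key design point is that the meta-learner is trained only on out-of-fold scores, so it learns how far to trust each base learner on data that learner has not seen; a production setup would more likely use a library implementation than hand-written learners like these.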
Pages: 17