Protein-protein interaction prediction using a hybrid feature representation and a stacked generalization scheme

Cited by: 43
Authors
Chen, Kuan-Hsi [1]
Wang, Tsai-Feng [2]
Hu, Yuh-Jyh [3]
Affiliations
[1] Natl Chiao Tung Univ, Coll Comp Sci, Hsinchu 300, Taiwan
[2] Natl Chiao Tung Univ, Inst Data Sci & Engn, Hsinchu 300, Taiwan
[3] Natl Chiao Tung Univ, Inst Biomed Engn, Coll Comp Sci, Hsinchu 300, Taiwan
Keywords
Protein-protein interaction; Stacked generalization; Gene ontology; Network topology; SEMANTIC SIMILARITY MEASURES; GENE ONTOLOGY; SEQUENCES; SCALE; TOOL; RESIDUES; CELL;
DOI
10.1186/s12859-019-2907-1
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Although various machine learning-based predictors have been developed for estimating protein-protein interactions, their performances vary with dataset and species, and are affected by two primary aspects: choice of learning algorithm, and the representation of protein pairs. To improve the performance of predicting protein-protein interactions, we exploit the synergy of multiple learning algorithms, and utilize the expressiveness of different protein-pair features.
Results: We developed a stacked generalization scheme that integrates five learning algorithms. We also designed three types of protein-pair features based on the physicochemical properties of amino acids, gene ontology annotations, and interaction network topologies. When tested on 19 published datasets collected from eight species, the proposed approach achieved a significantly higher or comparable overall performance, compared with seven competitive predictors.
Conclusion: We introduced an ensemble learning approach for PPI prediction that integrated multiple learning algorithms and different protein-pair representations. The extensive comparisons with other state-of-the-art prediction tools demonstrated the feasibility and superiority of the proposed method.
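For readers unfamiliar with stacked generalization, the following is a minimal illustrative sketch of the general idea, not the authors' implementation. The abstract does not name the five learning algorithms, so two simple stand-in base learners (a nearest-centroid scorer and a k-nearest-neighbour scorer) and a logistic-regression meta-learner are assumed here. The two-dimensional toy vectors are hypothetical placeholders for protein-pair features, and every function name is invented for illustration.

```python
import math
import random

def fit_centroid(X, y):
    """Base learner 1: score a sample by its relative distance to the class centroids."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    def predict(x):
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        return d_neg / (d_pos + d_neg + 1e-12)  # near the positive centroid -> close to 1
    return predict

def fit_knn(X, y, k=3):
    """Base learner 2: fraction of positives among the k nearest training samples."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return predict

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    def prob(x):
        return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    for _ in range(epochs):
        errs = [prob(x) - t for x, t in zip(X, y)]  # errors under the current weights
        for j in range(len(w)):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errs, X)) / n
        b -= lr * sum(errs) / n
    return prob

def out_of_fold_scores(fitters, X, y, n_folds=5):
    """Build meta-features: each sample is scored only by models that never saw it."""
    meta = [[0.0] * len(fitters) for _ in X]
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held_out = [i for i in range(len(X)) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(fitters):
            model = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta

def fit_stack(fitters, X, y, n_folds=5):
    """Train the meta-learner on out-of-fold scores, then refit base learners on all data."""
    meta_model = fit_logistic(out_of_fold_scores(fitters, X, y, n_folds), y)
    base_models = [fit(X, y) for fit in fitters]
    def predict(x):
        return meta_model([m(x) for m in base_models])
    return predict

# Hypothetical toy data: label 1 = interacting pair, label 0 = non-interacting pair.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = i % 2
    center = 1.0 if label else -1.0
    X.append([center + rng.gauss(0, 0.3), center + rng.gauss(0, 0.3)])
    y.append(label)

stack = fit_stack([fit_centroid, fit_knn], X, y)
print("score near positive cluster:", round(stack([1.0, 1.0]), 3))
print("score near negative cluster:", round(stack([-1.0, -1.0]), 3))
```

The key design point is that the meta-learner is trained on out-of-fold scores rather than on scores the base learners produce for their own training samples, which would otherwise let it over-trust base learners that merely memorize the data.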
Pages: 17