Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Cited by: 90
Authors
Zhang, Yongqing [1]
Zhang, Danling [1]
Mi, Gang [2]
Ma, Daichuan [3]
Li, Gongbing [1]
Guo, Yanzhi [3]
Li, Menglong [3]
Zhu, Min [1]
Affiliations
[1] Sichuan University, College of Computer Science, Chengdu 610065, People's Republic of China
[2] Sichuan University, School of Life Sciences, Chengdu 610065, People's Republic of China
[3] Sichuan University, College of Chemistry, Chengdu 610065, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction; Ensemble methods; Imbalanced data; HYDROPHOBICITY;
DOI
10.1016/j.compbiolchem.2011.12.003
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Among proteins, the number of interacting pairs is usually much smaller than the number of non-interacting pairs, so protein-protein interaction (PPI) prediction faces an imbalanced data problem. In this article, we introduce two ensemble methods to address this problem. These methods combine a cluster-based under-sampling technique with classifier fusion. We evaluate them on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs, and we also draw several useful conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
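The abstract names three technical ingredients: cluster-based under-sampling of the majority (non-interacting) class, fusion of several classifiers, and AUC measured under 10-fold cross validation. Below is a minimal, stdlib-only Python sketch of how these pieces can fit together. It rests on several assumptions that do not come from the paper: the negatives are clustered with plain k-means and sampled in proportion to cluster size; a hypothetical nearest-centroid scorer stands in for the base classifiers, which the abstract does not specify; scores are fused with the mean rule; and AUC is computed as the Mann-Whitney statistic on the pooled held-out scores. All function names, the parameters `k=4` and `n_models=5`, and the synthetic data are illustrative. The authors' own software is available at the URL above.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, Vector]  # (positive centroid, negative centroid)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 20) -> List[List[Vector]]:
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_vector(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def cluster_undersample(clusters: List[List[Vector]], n_target: int, rng: random.Random) -> List[Vector]:
    """Draw about n_target negatives, taking from each cluster in proportion to its size."""
    total = sum(len(c) for c in clusters)
    sample: List[Vector] = []
    for c in clusters:
        m = max(1, round(n_target * len(c) / total))
        sample.extend(rng.sample(c, min(m, len(c))))
    return sample


def score(model: Model, x: Vector) -> float:
    """Hypothetical base classifier: larger means closer to the positive centroid."""
    c_pos, c_neg = model
    return sq_dist(x, c_neg) - sq_dist(x, c_pos)


def train_ensemble(pos, neg, n_models=5, k=4, seed=0) -> List[Model]:
    """Each base model sees all positives plus a fresh cluster-based negative sample."""
    rng = random.Random(seed)
    clusters = kmeans(neg, k, rng)
    return [(mean_vector(pos), mean_vector(cluster_undersample(clusters, len(pos), rng)))
            for _ in range(n_models)]


def ensemble_score(models: List[Model], x: Vector) -> float:
    """Mean-rule fusion: average the base models' scores."""
    return sum(score(m, x) for m in models) / len(models)


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Mann-Whitney AUC: P(random positive outscores random negative), ties count 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def cross_validated_auc(pos, neg, folds=10, seed=0) -> float:
    """Stratified k-fold CV; AUC is computed once on the pooled held-out scores."""
    rng = random.Random(seed)
    pos, neg = pos[:], neg[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    pos_scores: List[float] = []
    neg_scores: List[float] = []
    for f in range(folds):
        train_pos = [p for i, p in enumerate(pos) if i % folds != f]
        train_neg = [n for i, n in enumerate(neg) if i % folds != f]
        models = train_ensemble(train_pos, train_neg, seed=seed + f)
        pos_scores += [ensemble_score(models, x) for x in pos[f::folds]]
        neg_scores += [ensemble_score(models, x) for x in neg[f::folds]]
    return auc(pos_scores, neg_scores)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic, hypothetical features: 20 interacting vs. 200 non-interacting pairs.
    pos = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]
    neg = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
    print(f"10-fold pooled AUC: {cross_validated_auc(pos, neg):.3f}")
```

Running the script prints a pooled 10-fold AUC for the synthetic data. That number has no bearing on the roughly 95% reported in the abstract, which the authors measured on real DIP data with their own features and classifiers. Drawing a fresh cluster-proportional sample for each base model keeps every training subset balanced, while the ensemble as a whole sees more of the negative class than any single balanced subset would.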
Pages: 36-41
Number of pages: 6
Related papers
50 in total
  • [41] Peptide assemblies in living cells. Methods for detecting protein-protein interactions
    Ozawa, T
    Umezawa, Y
    SUPRAMOLECULAR CHEMISTRY, 2002, 14 (2-3) : 271 - 280
  • [42] A MapReduce based parallel SVM for large-scale predicting protein-protein interactions
    You, Zhu-Hong
    Yu, Jian-Zhong
    Zhu, Lin
    Li, Shuai
    Wen, Zhen-Kun
    NEUROCOMPUTING, 2014, 145 : 37 - 43
  • [43] Prediction Protein-Protein Interactions with LSTM
    Tao, Zheng
    Yao, Jiahao
    Yuan, Chao
    Zhao, Ning
    Yang, Bin
    Chen, Baitong
    Bao, Wenzheng
    SIMULATION TOOLS AND TECHNIQUES, SIMUTOOLS 2021, 2022, 424 : 540 - 545
  • [44] Applications of In Silico Methods for Design and Development of Drugs Targeting Protein-Protein Interactions
    Cicaloni, Vittoria
    Trezza, Alfonso
    Pettini, Francesco
    Spiga, Ottavia
    CURRENT TOPICS IN MEDICINAL CHEMISTRY, 2019, 19 (07) : 534 - 554
  • [45] Predicting the involvement of polyQ- and polyA in protein-protein interactions by their amino acid context
    Mier, Pablo
    Andrade-Navarro, Miguel A.
    HELIYON, 2024, 10 (18)
  • [46] Protein myristoylation in protein-lipid and protein-protein interactions
    Taniguchi, H
    BIOPHYSICAL CHEMISTRY, 1999, 82 (2-3) : 129 - 137
  • [47] Predicting permanent and transient protein-protein interfaces
    La, David
    Kong, Misun
    Hoffman, William
    Choi, Youn Im
    Kihara, Daisuke
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2013, 81 (05) : 805 - 818
  • [48] MultiProtident: Identifying proteins using database search and protein-protein interactions
    Huang, HD
    Lee, TY
    Wu, LC
    Lin, FM
    Juan, HF
    Horng, JT
    Tsou, AP
    JOURNAL OF PROTEOME RESEARCH, 2005, 4 (03) : 690 - 697
  • [49] Green fluorescent protein as a signal for protein-protein interactions
    Park, SH
    Raines, RT
    PROTEIN SCIENCE, 1997, 6 (11) : 2344 - 2349
  • [50] Protein recruitment systems for the analysis of protein-protein interactions
    Aronheim, A
    BIOCHEMICAL PHARMACOLOGY, 2000, 60 (08) : 1009 - 1013