Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Cited by: 90
Authors
Zhang, Yongqing [1 ]
Zhang, Danling [1 ]
Mi, Gang [2 ]
Ma, Daichuan [3 ]
Li, Gongbing [1 ]
Guo, Yanzhi [3 ]
Li, Menglong [3 ]
Zhu, Min [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Sch Life Sci, Chengdu 610065, Peoples R China
[3] Sichuan Univ, Coll Chem, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein-protein interaction; Ensemble methods; Imbalanced data; HYDROPHOBICITY;
DOI
10.1016/j.compbiolchem.2011.12.003
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Among proteins, interacting pairs are usually far fewer than non-interacting ones, so the prediction of protein-protein interactions (PPIs) faces an imbalanced data problem. In this article, we introduce two ensemble methods to address it. Both methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are effective in predicting PPIs, and they also yield conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
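The abstract names cluster-based under-sampling without giving its details. The following is only a minimal sketch of the general idea, not the authors' implementation: the majority class is grouped with a plain k-means, and samples are drawn from each cluster in proportion to its size until roughly as many remain as there are minority samples. The function names, the choice of k-means, the proportional quota, and the synthetic two-dimensional data are all assumptions made for illustration.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative helper); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep about n_keep majority samples, drawn from each cluster in proportion to its size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        quota = round(n_keep * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept

# Synthetic stand-ins (assumed data): 200 "non-interacting" and 20 "interacting" feature vectors.
data_rng = random.Random(1)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
minority = [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(20)]

balanced_negatives = cluster_undersample(majority, n_keep=len(minority))
balanced_set = minority + balanced_negatives
```

Because each cluster's quota is rounded, the number of retained majority samples can differ from `n_keep` by a few. In an ensemble setting, one plausible use is to repeat the sampling with different seeds and train one base classifier per balanced subset before fusing their outputs; the classifier itself is left out of this sketch.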
Pages: 36-41
Number of pages: 6
Related papers
50 in total
  • [1] Predicting Protein-Protein Interactions based on ensemble classifiers
    Zhou, Zheng-Rong
    Song, Xiao-Feng
    Wang, Ming-Hao
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2010, 38 (06): : 1464 - 1467
  • [2] Computational Methods For Predicting Protein-Protein Interactions
    Pitre, Sylvain
    Alamgir, Md
    Green, James R.
    Dumontier, Michel
    Dehne, Frank
    Golshani, Ashkan
    PROTEIN - PROTEIN INTERACTION, 2008, 110 : 247 - 267
  • [3] Kernel methods for predicting protein-protein interactions
    Ben-Hur, A
    Noble, WS
    BIOINFORMATICS, 2005, 21 : I38 - I46
  • [4] Data mining methods for protein-protein interactions
    Nafar, Zahra
    Golshani, Ashkan
    2006 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-5, 2006, : 2090 - +
  • [5] An ensemble of K-local hyperplanes for predicting protein-protein interactions
    Nanni, L
    Lumini, A
    BIOINFORMATICS, 2006, 22 (10) : 1207 - 1210
  • [6] Predicting protein-protein interactions in unbalanced data using the primary structure of proteins
    Chi-Yuan Yu
    Lih-Ching Chou
    Darby Tien-Hao Chang
    BMC Bioinformatics, 11
  • [7] An Ensemble Classifier with Random Projection for Predicting Protein-Protein Interactions Using Sequence and Evolutionary Information
    Song, Xiao-Yu
    Chen, Zhan-Heng
    Sun, Xiang-Yang
    You, Zhu-Hong
    Li, Li-Ping
    Zhao, Yang
    APPLIED SCIENCES-BASEL, 2018, 8 (01):
  • [8] Predicting protein-protein interactions in unbalanced data using the primary structure of proteins
    Yu, Chi-Yuan
    Chou, Lih-Ching
    Chang, Darby Tien-Hao
    BMC BIOINFORMATICS, 2010, 11
  • [9] Predicting functional protein-protein interactions based on computational methods
    Zhang, Luwen
    Zhang, Wu
    LIFE SYSTEM MODELING AND SIMULATION, PROCEEDINGS, 2007, 4689 : 354 - +
  • [10] Predicting disease genes using protein-protein interactions
    Oti, M.
    Snel, B.
    Huynen, M. A.
    Brunner, H. G.
    JOURNAL OF MEDICAL GENETICS, 2006, 43 (08) : 691 - 698