Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Cited by: 90
Authors
Zhang, Yongqing [1]
Zhang, Danling [1]
Mi, Gang [2]
Ma, Daichuan [3]
Li, Gongbing [1]
Guo, Yanzhi [3]
Li, Menglong [3]
Zhu, Min [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Sch Life Sci, Chengdu 610065, Peoples R China
[3] Sichuan Univ, Coll Chem, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction; Ensemble methods; Imbalanced data; Hydrophobicity
DOI
10.1016/j.compbiolchem.2011.12.003
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In protein-protein interaction (PPI) data, interacting pairs are usually far fewer than non-interacting ones, so PPI prediction faces an imbalanced data problem. In this article, we introduce two ensemble methods to address it. Both combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs, and we also draw some conclusions about the performance of ensemble methods for PPIs on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
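The abstract does not spell out the algorithm, so the following is only a minimal sketch of how cluster-based under-sampling might be combined with a voting ensemble. Everything in it is an assumption made for illustration rather than the authors' method: the tiny k-means, the proportional per-cluster quota, the nearest-centroid base learner, the majority vote, the synthetic two-dimensional data, and all function names.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep at most n_keep majority samples, spread across k clusters."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        # Quota proportional to cluster size, at least one per non-empty cluster.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept[:n_keep]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def build_ensemble(pos, neg, n_members=5):
    """Each member sees all positives plus a different under-sampled negative set."""
    return [
        (centroid(pos), centroid(cluster_undersample(neg, len(pos), seed=s)))
        for s in range(n_members)
    ]

def predict(ensemble, x):
    """Majority vote over nearest-centroid members: 1 = interacting, 0 = not."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    votes = sum(1 for cp, cn in ensemble if dist(x, cp) < dist(x, cn))
    return 1 if 2 * votes > len(ensemble) else 0

# Synthetic imbalanced data (hypothetical): 10 positives versus 90 negatives.
rng = random.Random(42)
pos = [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(10)]
neg = [
    [rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
    for cx, cy in [(0, 0), (0, 2), (2, 0)]
    for _ in range(30)
]
ensemble = build_ensemble(pos, neg)
print(predict(ensemble, [5.0, 5.0]), predict(ensemble, [0.0, 0.0]))
```

Sampling from every cluster, rather than uniformly at random, is meant to keep the reduced negative set representative of the different regions of the majority class. In practice the base learner would be a stronger classifier, and performance would be measured with cross-validated AUC as in the abstract.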
Pages: 36-41
Page count: 6