Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Cited by: 90
Authors
Zhang, Yongqing [1]
Zhang, Danling [1]
Mi, Gang [2]
Ma, Daichuan [3]
Li, Gongbing [1]
Guo, Yanzhi [3]
Li, Menglong [3]
Zhu, Min [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Sch Life Sci, Chengdu 610065, Peoples R China
[3] Sichuan Univ, Coll Chem, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction; Ensemble methods; Imbalanced data; HYDROPHOBICITY;
DOI
10.1016/j.compbiolchem.2011.12.003
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Among protein pairs, interacting pairs are usually far fewer than non-interacting ones, so an imbalanced data problem arises in protein-protein interaction (PPI) prediction. In this article, we introduce two ensemble methods to address this problem. Both methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs; we also draw conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
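
The abstract names the two ingredients (cluster-based under-sampling and classifier fusion) but not their details. As a rough illustration only, the sketch below assumes k-means clustering of the majority (non-interacting) class, 3-nearest-neighbour base classifiers, and majority voting as the fusion rule; none of these choices, nor any of the function names, are taken from the paper, and the two-dimensional toy data stands in for real protein-pair features.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def cluster_undersample(majority, n_keep, rng):
    """Assumed variant: one random representative per k-means cluster,
    so at most n_keep majority samples are retained."""
    clusters = kmeans(majority, min(n_keep, len(majority)), rng)
    return [rng.choice(c) for c in clusters]

def train_ensemble(pos, neg, n_members=5, seed=0):
    """Each member is a balanced training set (all positives plus a
    different under-sampled draw of negatives) used by a k-NN classifier."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        kept = cluster_undersample(neg, len(pos), rng)
        members.append([(p, 1) for p in pos] + [(n, 0) for n in kept])
    return members

def knn_predict(train, x, k=3):
    nearest = sorted(train, key=lambda s: sq_dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def predict_ensemble(members, x):
    """Fusion by majority vote over the base classifiers."""
    votes = Counter(knn_predict(train, x) for train in members)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: 10 "interacting" vs 200 "non-interacting" pairs.
data_rng = random.Random(42)
pos = [[data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)] for _ in range(10)]
neg = [[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(200)]
models = train_ensemble(pos, neg)
print(predict_ensemble(models, [3.0, 3.0]), predict_ensemble(models, [0.0, 0.0]))
```

Drawing the retained negatives from clusters rather than uniformly at random is meant to keep the reduced majority class spread over the regions the full class occupies. Using an odd number of ensemble members avoids tied votes.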
Pages: 36-41
Page count: 6