Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions

Cited by: 90
Authors
Zhang, Yongqing [1]
Zhang, Danling [1]
Mi, Gang [2]
Ma, Daichuan [3]
Li, Gongbing [1]
Guo, Yanzhi [3]
Li, Menglong [3]
Zhu, Min [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Sch Life Sci, Chengdu 610065, Peoples R China
[3] Sichuan Univ, Coll Chem, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction; Ensemble methods; Imbalanced data; Hydrophobicity
DOI
10.1016/j.compbiolchem.2011.12.003
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Among protein pairs, interacting pairs are usually far fewer than non-interacting ones, so predicting protein-protein interactions (PPIs) is an imbalanced data problem. In this article, we introduce two ensemble methods to address it. Both methods combine a cluster-based under-sampling technique with fusion classifiers. We evaluate the ensemble methods on a dataset from the Database of Interacting Proteins (DIP) using 10-fold cross-validation. All of the prediction models achieve an area under the receiver operating characteristic curve (AUC) of about 95%. Our results show that the ensemble classifiers are quite effective in predicting PPIs, and we also draw some useful conclusions about the performance of ensemble methods for PPI prediction on imbalanced data. The prediction software and all datasets used in this work are freely available at http://cic.scu.edu.cn/bioinformatics/Ensemble_PPIs/index.html. (C) 2011 Elsevier Ltd. All rights reserved.
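The abstract names cluster-based under-sampling and AUC evaluation but does not spell out either procedure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the majority (non-interacting) class is clustered with plain k-means and that one representative per cluster (the member nearest the centroid) is kept; the function names `kmeans`, `cluster_undersample`, and `auc` are hypothetical and do not come from the paper or its software.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; an empty
        # cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


def cluster_undersample(majority, n_keep, seed=0):
    """Assumed scheme: keep the member closest to each cluster center.

    Returns at most n_keep majority-class samples (fewer if a cluster
    ends up empty).
    """
    centers, clusters = kmeans(majority, n_keep, seed=seed)
    return [
        min(members, key=lambda p: sq_dist(p, center))
        for center, members in zip(centers, clusters)
        if members
    ]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair; labels are 1 (interacting) or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D feature vectors standing in for non-interacting pairs.
    majority = [(rng.random(), rng.random()) for _ in range(60)]
    balanced_negatives = cluster_undersample(majority, n_keep=6)
    print(len(balanced_negatives), "negatives kept out of", len(majority))
```

In a full pipeline, the reduced negative set would be combined with the positive (interacting) pairs to train each base classifier, and `auc` would be computed on the held-out fold of each cross-validation split.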
Pages: 36-41
Page count: 6
Related papers
50 items in total
  • [41] Information assessment on predicting protein-protein interactions
    Lin, N
    Wu, BL
    Jansen, R
    Gerstein, M
    Zhao, HY
    BMC BIOINFORMATICS, 2004, 5 (1)
  • [42] ProteinPrompt: a webserver for predicting protein-protein interactions
    Canzler, Sebastian
    Fischer, Markus
    Ulbricht, David
    Ristic, Nikola
    Hildebrand, Peter W.
    Staritzbichler, Rene
    BIOINFORMATICS ADVANCES, 2022, 2 (01)
  • [43] Predicting Protein-Protein Interactions Using Symmetric Logistic Matrix Factorization
    Pei, Fen
    Shi, Qingya
    Zhang, Haotian
    Bahar, Ivet
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (04) : 1670 - 1682
  • [44] Predicting protein-protein interactions using graph invariants and a neural network
    Knisley, D.
    Knisley, J.
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2011, 35 (02) : 108 - 113
  • [45] Predicting protein-protein interactions from one feature using SVM
    Chung, Y
    Kim, GM
    Hwang, YS
    Park, H
    INNOVATIONS IN APPLIED ARTIFICIAL INTELLIGENCE, 2004, 3029 : 50 - 55
  • [46] Ensemble learning prediction of protein-protein interactions using proteins functional annotations
    Saha, Indrajit
    Zubek, Julian
    Klingstrom, Tomas
    Forsberg, Simon
    Wikander, Johan
    Kierczak, Marcin
    Maulik, Ujjwal
    Plewczynski, Dariusz
    MOLECULAR BIOSYSTEMS, 2014, 10 (04) : 820 - 830
  • [47] Using a stacked ensemble learning framework to predict modulators of protein-protein interactions
    Gao, Mengyao
    Zhao, Lingling
    Zhang, Zitong
    Wang, Junjie
    Wang, Chunyu
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 161
  • [48] A Bayesian networks approach for predicting protein-protein interactions from genomic data
    Jansen, R
    Yu, HY
    Greenbaum, D
    Kluger, Y
    Krogan, NJ
    Chung, SB
    Emili, A
    Snyder, M
    Greenblatt, JF
    Gerstein, M
    SCIENCE, 2003, 302 (5644) : 449 - 453
  • [49] An Efficient Ensemble Learning Approach for Predicting Protein-Protein Interactions by Integrating Protein Primary Sequence and Evolutionary Information
    You, Zhu-Hong
    Huang, Wen-Zhun
    Zhang, Shanwen
    Huang, Yu-An
    Yu, Chang-Qing
    Li, Li-Ping
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (03) : 809 - 817
  • [50] Efficient mining from heterogeneous data sets for predicting protein-protein interactions
    Mamitsuka, H
    14TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2003 : 32 - 36