Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification

Cited by: 72
Authors
Oh, Sangyoon [1 ]
Lee, Min Su [2 ,3 ]
Zhang, Byoung-Tak [2 ,3 ]
Affiliations
[1] Ajou Univ, Div Informat & Comp Engn, WISE Lab, Suwon 443749, Kyeonggi, South Korea
[2] Seoul Natl Univ, CBIT, Seoul 151742, South Korea
[3] Seoul Natl Univ, Sch Engn & Comp Sci, Seoul 151742, South Korea
Keywords
Bioinformatics; classification; interactive data exploration and discovery; mining methods and algorithms; DISCOVERY;
DOI
10.1109/TCBB.2010.96
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
In biomedical data, the imbalanced data problem occurs frequently and causes poor prediction performance for minority classes. This is because the trained classifiers are derived mostly from the majority class. In this paper, we describe an ensemble learning method combined with active example selection to resolve the imbalanced data problem. Our method consists of three key components: 1) an active example selection algorithm to choose informative examples for training the classifier, 2) an ensemble learning method to combine variations of classifiers derived by active example selection, and 3) an incremental learning scheme to speed up the iterative training procedure for active example selection. We evaluate the method on six real-world imbalanced data sets in biomedical domains, showing that the proposed method outperforms both random undersampling and ensemble learning with undersampling. Compared with other approaches to the imbalanced data problem, our method achieves an AUC that is 0.03-0.15 higher.
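The three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the "closest to the decision boundary" selection rule, and every name and parameter below (`CentroidClassifier`, `train_member`, `seed_size`, `batch`, `rounds`) are assumptions chosen only to keep the example short and dependency-free.

```python
import random


class CentroidClassifier:
    """Nearest-centroid scorer whose class sums are updated one example at a time."""

    def __init__(self, dim):
        self.sums = {0: [0.0] * dim, 1: [0.0] * dim}
        self.counts = {0: 0, 1: 0}

    def partial_fit(self, x, y):
        # Incremental update: adding an example never retrains from scratch.
        self.counts[y] += 1
        for i, v in enumerate(x):
            self.sums[y][i] += v

    def score(self, x):
        # Positive score: x is closer to the minority (label 1) centroid.
        dist = {}
        for y in (0, 1):
            centroid = [s / self.counts[y] for s in self.sums[y]]
            dist[y] = sum((a - b) ** 2 for a, b in zip(x, centroid))
        return dist[0] - dist[1]


def train_member(minority, majority, rng, seed_size=2, batch=2, rounds=3):
    """Train one member on all minority examples plus actively chosen majority examples."""
    model = CentroidClassifier(len(minority[0]))
    for x in minority:
        model.partial_fit(x, 1)
    pool = majority[:]
    rng.shuffle(pool)  # a different random seed set gives each member a different start
    for x in pool[:seed_size]:
        model.partial_fit(x, 0)
    pool = pool[seed_size:]
    for _ in range(rounds):
        if not pool:
            break
        # Assumed selection rule: majority examples nearest the current boundary
        # are treated as the most informative ones.
        pool.sort(key=lambda x: abs(model.score(x)))
        for x in pool[:batch]:
            model.partial_fit(x, 0)
        pool = pool[batch:]
    return model


def ensemble_score(models, x):
    """Combine the members by averaging their scores."""
    return sum(m.score(x) for m in models) / len(models)


# Synthetic imbalanced data (5 minority vs. 50 majority points), for illustration only.
data_rng = random.Random(0)
minority = [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(5)]
majority = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(50)]
models = [train_member(minority, majority, random.Random(s)) for s in range(5)]
```

Each member sees every minority example but only a small, actively chosen subset of the majority class, so no single classifier is dominated by the majority. Calling `ensemble_score(models, x)` returns a positive value when the averaged members place `x` nearer the minority class.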
Pages: 316 - 325
Number of pages: 10
Related papers
50 in total
  • [1] Ensemble Learning Based on Active Example Selection for Solving Imbalanced Data Problem in Biomedical Data
    Lee, Min Su
    Oh, Sangyoon
    Zhang, Byoung-Tak
    2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2009, : 350 - +
  • [2] AESNB: Active Example Selection with Naive Bayes Classifier for Learning from Imbalanced Biomedical Data
    Lee, Min Su
    Rhee, Je-Keun
    Kim, Byoung-Hee
    Zhang, Byoung-Tak
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING, 2009, : 15 - +
  • [3] An Improved Ensemble Learning for Imbalanced Data Classification
    Yuan, Zhengwu
    Zhao, Pu
    PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 408 - 411
  • [4] Imbalanced Data Classification Method Based on Ensemble Learning
    Xiang, Yu
    Xie, Yongping
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, CSPS 2018, VOL III: SYSTEMS, 2020, 517 : 18 - 24
  • [5] Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification
    Zyblewski, Pawel
    Sabourin, Robert
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 367 - 379
  • [6] Meta-learning for imbalanced data and classification ensemble in binary classification
    Lin, Sung-Chiang
    Chang, Yuan-chin I.
    Yang, Wei-Ning
    NEUROCOMPUTING, 2009, 73 (1-3) : 484 - 494
  • [7] Online semi-supervised active learning ensemble classification for evolving imbalanced data streams
    Guo, Yinan
    Pu, Jiayang
    Jiao, Botao
    Peng, Yanyan
    Wang, Dini
    Yang, Shengxiang
    APPLIED SOFT COMPUTING, 2024, 155
  • [8] Robust Multiclass Classification for Learning from Imbalanced Biomedical Data
    Piyaphol Phoungphol
    Tsinghua Science and Technology, 2012, 17 (06) : 619 - 628
  • [9] Robust multiclass classification for learning from imbalanced biomedical data
    Phoungphol, Piyaphol
    Zhang, Yanqing
    Zhao, Yichuan
    Tsinghua Science and Technology, 2012, 17 (06) : 619 - 628
  • [10] imDC: an ensemble learning method for imbalanced classification with miRNA data
    Wang, C. Y.
    Hu, L. L.
    Guo, M. Z.
    Liu, X. Y.
    Zou, Q.
    GENETICS AND MOLECULAR RESEARCH, 2015, 14 (01) : 123 - 133