EasyEnsemble and Feature Selection for Imbalance Data Sets

Cited by: 80
Authors
Liu, Tian-Yu [1]
Affiliation
[1] Shanghai Dianji Univ, Sch Elect, Shanghai 200240, Peoples R China
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
unbalanced data sets; EasyEnsemble; mutual information; feature selection; RELEVANCE;
DOI
10.1109/IJCBS.2009.22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, such as cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for EasyEnsemble) to address this problem and improve the generalization performance of the EasyEnsemble classifier. Experimental results on the UCI data sets show that MIEE obtains better performance than asymmetric bagging and EasyEnsemble.
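The abstract does not say how MIEE combines its two ingredients, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes discrete feature values, a binary label with the minority class encoded as 1, and that features are ranked by mutual information on each balanced undersampled subset. The function names (mutual_information, balanced_subsets, select_features) are hypothetical, and the boosted base learners that EasyEnsemble trains on each subset are omitted.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def balanced_subsets(labels, n_subsets, minority=1, seed=0):
    """Index lists: every minority sample plus an equal-sized random
    draw (without replacement) of majority samples, as in undersampling
    ensembles. Assumes the majority class is at least as large."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return [minority_idx + rng.sample(majority_idx, len(minority_idx))
            for _ in range(n_subsets)]


def select_features(rows, labels, subset, k):
    """Rank feature columns by MI with the label on one subset and
    return the indices of the top-k columns (ties broken by index)."""
    ys = [labels[i] for i in subset]
    scores = []
    for j in range(len(rows[0])):
        xs = [rows[i][j] for i in subset]
        scores.append((mutual_information(xs, ys), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy data (made up for illustration): 3 minority and 9 majority samples.
# Column 0 copies the label, column 1 is constant, column 2 alternates.
labels = [1] * 3 + [0] * 9
rows = [(y, 0, i % 2) for i, y in enumerate(labels)]

for subset in balanced_subsets(labels, n_subsets=4):
    print(select_features(rows, labels, subset, k=1))
```

In a full pipeline, each subset's selected columns would then be passed to a base classifier and the per-subset predictions combined; which learner and combination rule to use is left open here because the abstract does not specify them.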
Pages: 517-520
Number of pages: 4