EasyEnsemble and Feature Selection for Imbalance Data Sets

Cited by: 80
Author
Liu, Tian-Yu [1]
Affiliation
[1] Shanghai Dianji Univ, Sch Elect, Shanghai 200240, Peoples R China
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
unbalanced data sets; EasyEnsemble; mutual information; feature selection; RELEVANCE
DOI
10.1109/IJCBS.2009.22
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, as with cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for Easy Ensemble) to address this problem and improve the generalization performance of the Easy Ensemble classifier. Experimental results on the UCI data sets show that MIEE obtains better performance than asymmetric bagging and Easy Ensemble.
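
The abstract names two ingredients, mutual-information feature selection and EasyEnsemble-style balanced sampling, without giving the details of MIEE. The following is only a rough sketch of how those two ingredients might be combined, not the paper's algorithm: it assumes discrete features, a binary label with minority class 1, and hypothetical helper names (mutual_information, select_features, balanced_subsets). In EasyEnsemble a boosted classifier is trained on each balanced subset; that step is omitted here.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def select_features(X, y, k):
    """Indices of the k features with the highest mutual information with y."""
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def balanced_subsets(y, n_subsets, minority=1, seed=0):
    """EasyEnsemble-style sampling (assumed form): every subset keeps all
    minority rows plus an equally sized random draw from the majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    return [minority_idx + rng.sample(majority_idx, len(minority_idx))
            for _ in range(n_subsets)]

# Toy data (made up for illustration): feature 0 copies the label,
# feature 1 is independent of it.
y = [1, 1, 0, 0, 0, 0, 0, 0]
X = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1], [0, 0], [0, 1]]

selected = select_features(X, y, k=1)      # keeps the informative feature
subsets = balanced_subsets(y, n_subsets=3)  # three balanced index lists
```

Each index list in subsets would then be used to train one base classifier on the selected features only, with the ensemble combining their outputs.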
Pages: 517-520
Number of pages: 4