EasyEnsemble and Feature Selection for Imbalance Data Sets

Cited by: 80
Authors
Liu, Tian-Yu [1]
Affiliation
[1] Shanghai Dianji Univ, Sch Elect, Shanghai 200240, Peoples R China
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
unbalanced data sets; EasyEnsemble; mutual information; feature selection; RELEVANCE
DOI
10.1109/IJCBS.2009.22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. When a class is of great interest but occurs relatively rarely, as with cases of fraud or instances of disease, identifying it accurately is important. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for Easy Ensemble) to address this problem and improve the generalization performance of the Easy Ensemble classifier. Experimental results on the UCI data sets show that MIEE obtains better performance than asymmetric bagging and Easy Ensemble.
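The abstract names two ingredients, mutual-information feature selection and Easy Ensemble, without giving details of how MIEE combines them. The following is only a minimal sketch of those two ingredients under stated assumptions: features are discrete, the function names (`mutual_information`, `select_top_k`, `balanced_subsets`) are hypothetical, and the base classifier that would be trained on each balanced subset is left out. It is not the author's implementation.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_top_k(X, y, k):
    """Indices of the k feature columns with the highest MI with the label."""
    n_features = len(X[0])
    scores = [
        mutual_information([row[j] for row in X], y) for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def balanced_subsets(y, minority_label, n_subsets, seed=0):
    """EasyEnsemble-style sampling (assumed form): every subset keeps all
    minority rows and adds an equally sized random draw of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return [
        minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)
    ]


# Toy data (made up for illustration): feature 0 equals the label,
# feature 1 is constant and therefore carries no information.
X = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
y = [1, 1, 0, 0, 0, 0, 0, 0]

selected = select_top_k(X, y, k=1)
subsets = balanced_subsets(y, minority_label=1, n_subsets=3)
```

In this sketch, each list in `subsets` holds row indices for one balanced training set restricted to the `selected` columns; a full ensemble would train one classifier per subset and combine their predictions.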
Pages: 517-520
Number of pages: 4