EasyEnsemble and Feature Selection for Imbalance Data Sets

Cited by: 80
Authors
Liu, Tian-Yu [1]
Affiliations
[1] Shanghai Dianji Univ, Sch Elect, Shanghai 200240, Peoples R China
Source
2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS | 2009
Keywords
unbalanced data sets; EasyEnsemble; mutual information; feature selection; RELEVANCE;
DOI
10.1109/IJCBS.2009.22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many labeled data sets have an unbalanced representation among their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, such as cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for EasyEnsemble) to address this problem and improve the generalization performance of the EasyEnsemble classifier. Experimental results on the UCI data sets show that MIEE obtains better performance than asymmetric bagging and EasyEnsemble.
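The abstract does not spell out how MIEE combines its two ingredients, so the following is only a rough sketch of the generic building blocks it names, not the author's algorithm: ranking discrete features by their empirical mutual information with the class label, and drawing EasyEnsemble-style balanced subsets that keep every minority example alongside an equally sized random sample of the majority class. The function names, the toy data, and the choice of top-k selection are all assumptions made for illustration; training a base classifier on each subset is left out.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(rows, labels, k):
    """Indices of the k features with the highest mutual information with the label."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def balanced_subsets(labels, minority_label, n_subsets, seed=0):
    """Each subset holds all minority indices plus an equally sized random
    sample (without replacement) of majority indices."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    return [
        minority + rng.sample(majority, len(minority))
        for _ in range(n_subsets)
    ]


# Hypothetical toy data: feature 0 equals the label, feature 1 is constant,
# feature 2 is independent of the label. Class 1 is the minority class.
rows = [(1, 0, 0), (1, 0, 1), (0, 0, 0), (0, 0, 1),
        (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

selected = select_top_k(rows, labels, k=1)
subsets = balanced_subsets(labels, minority_label=1, n_subsets=3)
print(selected)
print(subsets)
```

In a full pipeline one would presumably train a classifier on each balanced subset, restricted to the selected features, and combine their outputs; whether MIEE selects features once globally or separately per subset is not stated in the abstract.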
Pages: 517 - 520
Number of pages: 4
Related papers
50 in total
  • [21] Markov Blanket Feature Selection Using Representative Sets
    Yu, Kui
    Wu, Xindong
    Ding, Wei
    Mu, Yang
    Wang, Hao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (11) : 2775 - 2788
  • [22] A Fitting Model for Feature Selection With Fuzzy Rough Sets
    Wang, Changzhong
    Qi, Yali
    Shao, Mingwen
    Hu, Qinghua
    Chen, Degang
    Qian, Yuhua
    Lin, Yaojin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2017, 25 (04) : 741 - 753
  • [23] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
  • [24] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,
  • [25] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    NCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NEURAL COMPUTATION THEORY AND APPLICATIONS, 2011, : IS23 - IS25
  • [26] Feature selection with multi-view data: A survey
    Zhang, Rui
    Nie, Feiping
    Li, Xuelong
    Wei, Xian
    INFORMATION FUSION, 2019, 50 : 158 - 167
  • [27] A hybrid feature selection scheme for mixed attributes data
    Liu, Haitao
    Wei, Ruxiang
    Jiang, Guoping
    COMPUTATIONAL & APPLIED MATHEMATICS, 2013, 32 (01) : 145 - 161
  • [28] A Clustering Based Feature Selection Method Using Feature Information Distance for Text Data
    Chao, Shilong
    Cai, Jie
    Yang, Sheng
    Wang, Shulin
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT I, 2016, 9771 : 122 - 132
  • [29] Feature selection considering synergy between features based on soft neighborhood rough sets
    Chen, Lubin
    Chen, Jinkun
    Lin, Yaojin
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 150
  • [30] A Robust Linear Regression Feature Selection Method for Data Sets With Unknown Noise
    Guo, Yaqing
    Wang, Wenjian
    Wang, Xuejun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (01) : 31 - 44