Feature Subset Selection based on Redundancy Maximized Clusters

Cited by: 3
Authors
Tarek, Md Hasan [1]
Kadir, Md Eusha [1]
Sharmin, Sadia [2]
Sajib, Abu Ashfaqur [3]
Ali, Amin Ahsan [2]
Shoyaib, Mohammad [1]
Affiliations
[1] Univ Dhaka, Inst Informat Technol, Dhaka, Bangladesh
[2] Islamic Univ Technol, Comp Sci & Engn, Gazipur, Bangladesh
[3] Univ Dhaka, Genet Engn & Biotechnol, Dhaka, Bangladesh
Keywords
Clustering; Normalized mutual information; Bias correction; Feature selection; Chronic lymphocytic leukemia; Mutual information; Algorithms
DOI
10.1109/ICMLA52953.2021.00087
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection plays a vital role in data mining and machine learning for analyzing high-dimensional data. A popular criterion for feature selection is Mutual Information (MI), as it can capture both linear and non-linear relationships among features and between the features and the class variable. Existing MI-based feature selection methods use different approximation techniques to capture the joint contribution of features and their relationship with the classes, and to eliminate redundant features. However, these approximations may fail to select the optimal set of features, especially when the feature dimension is high. In addition, in the absence of an appropriate search strategy, these MI-based approximations may select unnecessary features. To address these issues, we propose a method named Feature Selection based on Redundancy maximized Clusters (FSRC), which groups redundant features into clusters and then selects a subset of representative features from each cluster. We also propose using bias-corrected normalized MI for this purpose. Rigorous experiments on thirty benchmark datasets demonstrate that FSRC outperforms existing state-of-the-art methods in most cases. Moreover, FSRC is applied to three gene expression datasets, which are high-dimensional but have small sample sizes. The results show that FSRC can select features (genes) that are not only discriminating but also biologically relevant.
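The abstract describes FSRC only at a high level (cluster redundant features, pick representatives, use bias-corrected normalized MI). The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. It assumes discrete (already binned) features, uses the Miller-Madow entropy correction as a stand-in for the paper's bias correction, measures redundancy with the symmetric normalized MI 2I(X;Y)/(H(X)+H(Y)), forms clusters greedily around the most class-relevant unclustered feature, and keeps one representative per cluster. The names `entropy_mm`, `normalized_mi`, `select_representatives`, `threshold`, and `min_relevance` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy_mm(values: Sequence[Hashable]) -> float:
    """Plug-in entropy in nats plus the Miller-Madow correction (K - 1) / (2N)."""
    n = len(values)
    counts = Counter(values)
    plug_in = -sum((c / n) * math.log(c / n) for c in counts.values())
    return plug_in + (len(counts) - 1) / (2 * n)


def normalized_mi(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Symmetric normalized MI, 2 * I(X;Y) / (H(X) + H(Y)), from corrected entropies."""
    hx, hy = entropy_mm(x), entropy_mm(y)
    hxy = entropy_mm(list(zip(x, y)))
    if hx + hy == 0:
        return 0.0  # both variables are constant, so they share no information
    mi = max(0.0, hx + hy - hxy)  # the correction can push the MI estimate below zero
    return min(1.0, 2 * mi / (hx + hy))


def select_representatives(
    features: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    threshold: float = 0.5,      # assumed redundancy cut-off for joining a cluster
    min_relevance: float = 0.05,  # assumed floor on relevance to the class label
) -> List[str]:
    """Greedy redundancy clustering that keeps one relevant representative per cluster."""
    relevance = {name: normalized_mi(col, labels) for name, col in features.items()}
    # Visit features from most to least relevant to the class label.
    unassigned = sorted(features, key=lambda name: relevance[name], reverse=True)
    representatives = []
    while unassigned:
        seed = unassigned.pop(0)  # most relevant feature that is not yet clustered
        if relevance[seed] > min_relevance:
            representatives.append(seed)
        # Features highly redundant with the seed join its cluster and are dropped.
        unassigned = [
            name for name in unassigned
            if normalized_mi(features[seed], features[name]) < threshold
        ]
    return representatives


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    features = {
        "copy": [0, 0, 0, 0, 1, 1, 1, 1],     # identical to the labels
        "flipped": [1, 1, 1, 1, 0, 0, 0, 0],  # relabeled copy, fully redundant
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],    # unrelated to the labels
    }
    print(select_representatives(features, labels))
```

Running the module prints `['copy']`: `flipped` has normalized MI 1.0 with `copy`, so it falls into the cluster seeded by `copy`, while `noise` forms its own cluster but is discarded because its corrected relevance to the labels is zero. Keeping a single representative per cluster is a simplification; the abstract says FSRC selects a subset of representatives from each cluster.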
Pages: 521-526
Number of pages: 6