Feature Subset Selection based on Redundancy Maximized Clusters

Cited by: 3
Authors
Tarek, Md Hasan [1]
Kadir, Md Eusha [1]
Sharmin, Sadia [2]
Sajib, Abu Ashfaqur [3]
Ali, Amin Ahsan [2]
Shoyaib, Mohammad [1]
Affiliations
[1] Univ Dhaka, Inst Informat Technol, Dhaka, Bangladesh
[2] Islamic Univ Technol, Comp Sci & Engn, Gazipur, Bangladesh
[3] Univ Dhaka, Genet Engn & Biotechnol, Dhaka, Bangladesh
Keywords
Clustering; Normalized mutual information; Bias correction; Feature selection; CHRONIC LYMPHOCYTIC-LEUKEMIA; MUTUAL INFORMATION; ALGORITHMS
DOI
10.1109/ICMLA52953.2021.00087
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection plays a vital role in data mining and machine learning for analyzing high-dimensional data. A popular criterion for feature selection is Mutual Information (MI), as it can capture both linear and non-linear relationships among features and the class variable. Existing MI-based feature selection methods use different approximation techniques to capture the joint performance of features and their relationship with the classes, and to eliminate redundant features. However, these approximations may fail to select the optimal set of features, especially when the feature dimension is high. Moreover, in the absence of an appropriate search strategy, these MI-based approximations may select unnecessary features. To address these issues, we propose a method, Feature Selection based on Redundancy maximized Clusters (FSRC), that creates clusters of redundant features and then selects a subset of representative features from each cluster. We also propose using bias-corrected normalized MI for this purpose. Rigorous experiments on thirty benchmark datasets demonstrate that FSRC outperforms existing state-of-the-art methods in most cases. In addition, FSRC is applied to three gene expression datasets, which are high-dimensional but have few samples. The results show that FSRC can select features (genes) that are not only discriminating but also biologically relevant.
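The abstract gives only the outline of the method: group redundant features into clusters, then keep representatives from each cluster. The Python sketch below illustrates that general idea under several assumptions that are not taken from the paper: features are discrete, redundancy is measured with plain plug-in normalized MI (no bias correction is applied here), clusters are formed greedily against a hypothetical `threshold`, and the representative of a cluster is the member with the highest MI with the class. It is not the FSRC algorithm itself, and the function names are illustrative.

```python
import numpy as np

def entropy(x):
    """Plug-in Shannon entropy (in nats) of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, y):
    """Plug-in MI using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint = np.stack([x, y], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = float(-np.sum(p * np.log(p)))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, entropy(x) + entropy(y) - h_xy)

def normalized_mi(x, y):
    """MI divided by sqrt(H(X) * H(Y)); one common normalization (an assumption)."""
    denom = np.sqrt(entropy(x) * entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0

def cluster_and_select(X, y, threshold=0.5):
    """Greedy redundancy clustering (illustrative, not the paper's procedure).

    Features are visited in order of decreasing relevance I(feature; class).
    A feature joins the first cluster whose representative it is redundant
    with (normalized MI >= threshold); otherwise it starts a new cluster.
    The first member of each cluster is therefore its most relevant feature.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -relevance[j])
    clusters = []
    for j in order:
        for cluster in clusters:
            if normalized_mi(X[:, j], X[:, cluster[0]]) >= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    selected = [cluster[0] for cluster in clusters]
    return clusters, selected

# Toy data: feature 1 is a relabeled copy of feature 0; feature 2 is unrelated.
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_toy = np.column_stack([y_toy, 1 - y_toy, np.array([0, 1, 0, 1, 0, 1, 0, 1])])
toy_clusters, toy_selected = cluster_and_select(X_toy, y_toy, threshold=0.5)
print(toy_clusters, toy_selected)
```

In the toy data the two mutually redundant features fall into one cluster and only one of them is kept. For small-sample data such as the gene expression sets mentioned in the abstract, plug-in MI estimates are biased upward, which is presumably why the authors propose a bias-corrected normalized MI; the specific correction they use is not stated in this record.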
Pages: 521-526
Page count: 6