Feature Subset Selection based on Redundancy Maximized Clusters

Cited by: 3
Authors
Tarek, Md Hasan [1]
Kadir, Md Eusha [1]
Sharmin, Sadia [2]
Sajib, Abu Ashfaqur [3]
Ali, Amin Ahsan [2]
Shoyaib, Mohammad [1]
Affiliations
[1] Univ Dhaka, Inst Informat Technol, Dhaka, Bangladesh
[2] Islamic Univ Technol, Comp Sci & Engn, Gazipur, Bangladesh
[3] Univ Dhaka, Genet Engn & Biotechnol, Dhaka, Bangladesh
Source
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021) | 2021
Keywords
Clustering; Normalized mutual information; Bias correction; Feature selection; CHRONIC LYMPHOCYTIC-LEUKEMIA; MUTUAL INFORMATION; ALGORITHMS
DOI
10.1109/ICMLA52953.2021.00087
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection plays a vital role in data mining and machine learning for analyzing high-dimensional data. A popular criterion for feature selection is Mutual Information (MI), as it can capture both linear and non-linear relationships among features and the class variable. Existing MI-based feature selection methods use different approximation techniques to capture the joint performance of features and their relationship with the classes, and to eliminate redundant features. However, these approximations may fail to select the optimal set of features, especially when the feature dimension is high. Besides, due to the absence of an appropriate search strategy, these MI-based approximations may select unnecessary features. To address these issues, we propose a method, Feature Selection based on Redundancy maximized Clusters (FSRC), that creates clusters of redundant features and then selects a subset of representative features from each cluster. We also propose to use bias-corrected normalized MI in this regard. Rigorous experiments performed on thirty benchmark datasets demonstrate that FSRC outperforms existing state-of-the-art methods in most cases. Moreover, FSRC is applied to three gene expression datasets, which are high-dimensional but have few samples. The results show that FSRC can select features (genes) that are not only discriminating but also biologically relevant.
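The general idea in the abstract (group redundant features, then keep representatives) can be illustrated with a minimal sketch. This is not the authors' FSRC implementation: the function names, the greedy grouping rule, the `threshold` value, the choice of min-entropy normalization, and keeping a single representative per cluster are all assumptions made for illustration. It works on discrete features only and omits the bias correction the abstract mentions.

```python
from collections import Counter
from math import log


def entropy(xs):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y), no bias correction."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); returns 0.0 if either variable is constant."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


def select_features(features, labels, threshold=0.8):
    """Greedy sketch: group redundant features and keep one per group.

    features:  dict mapping a feature name to a list of discrete values.
    threshold: hypothetical redundancy cut-off on normalized MI.
    Returns (representatives, clusters).
    """
    # Visit features from most to least relevant to the class variable.
    ranked = sorted(
        features,
        key=lambda f: mutual_information(features[f], labels),
        reverse=True,
    )
    clusters = []  # the first member of each cluster is its representative
    for name in ranked:
        for cluster in clusters:
            if normalized_mi(features[name], features[cluster[0]]) >= threshold:
                cluster.append(name)  # redundant with this representative
                break
        else:
            clusters.append([name])  # starts a new cluster
    return [c[0] for c in clusters], clusters


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    data = {
        "a": [0, 0, 0, 0, 1, 1, 1, 1],
        "a_copy": [1, 1, 1, 1, 0, 0, 0, 0],  # relabelled copy of "a"
        "b": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(select_features(data, y))
```

Because each cluster is seeded by the most class-relevant feature not yet assigned, the representative of a cluster is also its most relevant member under this ordering. Continuous features, such as gene expression values, would need discretization or a different MI estimator before a sketch like this applies.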
Pages: 521-526
Number of pages: 6