Research on Feature Selection Methods Based on Feature Clustering and Information Theory

Cited by: 0
Authors
Wang, Wenhui [1]
Zhou, Changyin [1]
Affiliation
[1] Shandong Univ Sci & Technol, Coll Math & Syst Sci, Qingdao 266590, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XIII, ICIC 2024 | 2024 / Vol. 14874
Funding
National Natural Science Foundation of China
Keywords
Feature Selection; Multivariate Symmetric Uncertainty; AP Clustering; Redundancy; Interactivity; Gene Expression Data;
DOI
10.1007/978-981-97-5618-6_7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To identify the most representative subset of features in high-dimensional data, a feature selection algorithm (AP-MSU) based on feature clustering and information theory is proposed. After a filter-based feature selection algorithm performs a preliminary screening of relevant features, the algorithm applies affinity propagation (AP) clustering and multivariate symmetric uncertainty (MSU), which better capture the interactions among multiple feature variables and between those features and the target variable. Candidate features are evaluated sequentially with an MSU-based feature quality metric that considers both redundancy and interaction with the features already in the selected set. Redundant features are removed by assessing, at low computational cost, how much useful classification information each feature provides. Experimental results show that AP-MSU selects effective feature subsets on binary and multi-class gene expression datasets and achieves good classification performance across different classifiers. In addition, the lower-dimensional feature subset obtained by the algorithm can improve classification accuracy.
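The MSU measure named in the abstract can be illustrated with a minimal sketch. This is not the AP-MSU algorithm itself, which the abstract does not specify in detail. It only computes multivariate symmetric uncertainty for discrete variables, assuming the definition commonly given in the literature, MSU(X_1..X_n) = n/(n-1) * (1 - H(X_1..X_n) / sum_i H(X_i)), which reduces to ordinary symmetric uncertainty for n = 2. The function names `entropy` and `msu` and the toy XOR data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))  # each row is one joint observation
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def msu(*columns):
    """Multivariate symmetric uncertainty of two or more discrete columns.

    Assumed definition: n/(n-1) * (1 - H(joint) / sum of marginal entropies).
    """
    k = len(columns)
    if k < 2:
        raise ValueError("msu needs at least two columns")
    total = sum(entropy(col) for col in columns)
    if total == 0:  # all columns are constant, so nothing is shared
        return 0.0
    return (k / (k - 1)) * (1 - entropy(*columns) / total)


# Toy data (hypothetical): the class label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

pairwise = msu(f1, label)        # f1 alone says nothing about the label
three_way = msu(f1, f2, label)   # f1 and f2 together determine the label
print(round(pairwise, 3), round(three_way, 3))
```

On this toy data the pairwise score is 0 while the three-way score is about 0.5, which is the kind of feature interaction that a purely pairwise relevance measure would miss.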
Pages: 71-82
Page count: 12