Relevance assignation feature selection method based on mutual information for machine learning

Cited by: 36
Authors
Gao, Liyang [1]
Wu, Weiguo [2]
Affiliations
[1] Harbin Inst Technol, Sch Mechatron Engn, Room 1046,Jixie Bldg,92 West Dazhi St, Harbin 150001, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Sch Mechatron Engn, 424 Mailbox,92 West Dazhi St, Harbin 150001, Heilongjiang, Peoples R China
Keywords
Feature selection; Kernel function; Mutual information; Redundancy evaluation; Relevance assignation; FILTER METHOD; SCORE
DOI
10.1016/j.knosys.2020.106439
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As the subjects and environments of machine learning grow more complex, feature selection methods are used more and more frequently as an effective means of dimensionality reduction. However, existing feature selection methods struggle to strike a balance between the accuracy of relevance evaluation and the efficiency of the search. To address this, we analyze the characteristics of the relevance between a feature set and the classification result. We then propose the Relevance Assignation Feature Selection (RAFS) method, based on mutual information theory, which assigns the relevance evaluation according to redundancy. With this method, the contribution of each feature within a feature set can be estimated; this contribution is regarded as the value of the feature and is used as the heuristic index when searching for relevant features. A special dataset ("Grid World") with strongly interacting features is designed. Using Grid World and six natural datasets, the proposed method is compared with six other feature selection methods. Results show that on Grid World the RAFS method finds the correct relevant features with a probability above 90%, much higher than the other methods. On the six natural datasets, the RAFS method also achieves the best classification accuracy. (C) 2020 Elsevier B.V. All rights reserved.
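The abstract does not state the RAFS formulas, so the following is only a minimal sketch of the ideas it names, not the authors' method. It assumes discrete features and plug-in (empirical-frequency) estimates of mutual information. On a hypothetical XOR dataset, each feature alone carries zero mutual information with the class while the pair carries one bit, which is why strongly interacting features (as in Grid World) have to be evaluated as a set. The sketch then splits the set-level mutual information I(S;C) among the features using Shapley values, a swapped-in attribution technique standing in for "relevance assignation"; the helpers `entropy`, `mutual_information`, and `shapley_relevance` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(X, y, subset):
    """Plug-in estimate of I(X_S; C) in bits; X is a list of rows, y the class labels."""
    if not subset:
        return 0.0
    joint_x = [tuple(row[j] for j in subset) for row in X]
    joint_xy = list(zip(joint_x, y))
    # I(X_S; C) = H(X_S) + H(C) - H(X_S, C)
    return entropy(joint_x) + entropy(y) - entropy(joint_xy)


def shapley_relevance(X, y, features):
    """Split I(X_S; C) among features by exact Shapley values (illustrative, not RAFS)."""
    k = len(features)
    scores = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(k):
            for coalition in combinations(others, r):
                weight = factorial(r) * factorial(k - r - 1) / factorial(k)
                gain = (mutual_information(X, y, list(coalition) + [f])
                        - mutual_information(X, y, list(coalition)))
                total += weight * gain
        scores[f] = total
    return scores


# Hypothetical toy data: y = x0 XOR x1, x2 is an exact copy of x0, x3 is pure noise.
X = [(a, b, a, n) for a in (0, 1) for b in (0, 1) for n in (0, 1)]
y = [a ^ b for a, b, _, _ in X]

print([mutual_information(X, y, [j]) for j in range(4)])  # → [0.0, 0.0, 0.0, 0.0]
print(mutual_information(X, y, [0, 1]))                    # → 1.0
print({f: round(s, 3) for f, s in shapley_relevance(X, y, [0, 1, 2, 3]).items()})
# → {0: 0.167, 1: 0.667, 2: 0.167, 3: 0.0}
```

Here the redundant copy x2 shares credit with x0, the indispensable x1 receives the largest share, and the noise feature x3 receives none, while the shares still sum to I(X;C) = 1 bit. Exact Shapley attribution needs 2^(k-1) subset evaluations per feature, so it is practical only for small candidate sets.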
Pages: 13