A Population Initialization Method Based on Similarity and Mutual Information in Evolutionary Algorithm for Bi-Objective Feature Selection

Citations: 0
Authors
Cai, Xu [1 ]
Xue, Yu [2 ]
Affiliations
[1] Artificial Intelligence Research Institute, School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Software, Nanjing University of Information Science and Technology, Nanjing
Source
ACM Transactions on Evolutionary Learning and Optimization | 2024, Vol. 4, No. 3
Funding
National Natural Science Foundation of China
Keywords
bi-objective; evolutionary algorithm; feature selection; initialization
DOI
10.1145/3653025
Abstract
Feature selection (FS) is an important data pre-processing technique in classification. It aims to remove redundant and irrelevant features from the data, which reduces the dimensionality of the data and improves the performance of the classifier. Because it must simultaneously minimize the number of selected features and maximize classification performance, FS is a bi-objective optimization problem, and evolutionary algorithms (EAs) have proven effective in solving bi-objective FS problems. An EA is a population-based metaheuristic, and the quality of the initial population is an important factor in its performance: an improper initial population may slow convergence and can even trap the algorithm in a local optimum. In this article, we propose a similarity and mutual information-based initialization method, named SMII, to improve the quality of the initial population. The method determines the distribution of the initial solutions based on similarity and, using mutual information, shields features that are highly correlated with already-selected features. In the experiments, we embed SMII, four recent initialization methods, and a traditional random initialization method into NSGA-II and compare their performance on 15 public datasets. The results show that SMII performs best on most datasets and effectively improves the performance of the algorithm. Moreover, we compare two other EAs before and after embedding SMII on the same 15 datasets, and the results further confirm that the proposed method improves the search capability of the EA for FS. © 2024 Copyright held by the owner/author(s).
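The mutual-information "shielding" idea from the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the histogram-based MI estimator, the `mi_threshold` value, and the greedy random-pick loop are all assumptions made for the example, standing in for whatever estimator and parameters SMII actually uses.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Estimate MI (in nats) between two feature columns via histogram binning."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def init_individual(data, n_select, mi_threshold=0.5, rng=None):
    """Build one initial solution as a boolean feature mask.

    Features are picked at random; after each pick, remaining features whose
    MI with the new pick exceeds mi_threshold are shielded (made unavailable),
    so redundant features are kept out of the initial solution.
    """
    rng = np.random.default_rng(rng)
    n_features = data.shape[1]
    available = set(range(n_features))
    selected = []
    while available and len(selected) < n_select:
        f = int(rng.choice(sorted(available)))
        selected.append(f)
        available.discard(f)
        # Shield features strongly correlated with the newly selected one.
        for g in list(available):
            if mutual_information(data[:, f], data[:, g]) > mi_threshold:
                available.discard(g)
    mask = np.zeros(n_features, dtype=bool)
    mask[selected] = True
    return mask
```

With a duplicated column in the data, at most one of the two copies ends up in the mask, since the copy picked first shields the other; repeating the call per individual yields an initial population with reduced redundancy.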