TAGA: Tabu Asexual Genetic Algorithm embedded in a filter/filter feature selection approach for high-dimensional data

Cited by: 34
Authors
Salesi, Sadegh [1]
Cosma, Georgina [2]
Mavrovouniotis, Michalis [3]
Affiliations
[1] Nottingham Trent Univ, Sch Sci & Technol, Dept Comp, Nottingham, England
[2] Loughborough Univ, Sch Sci, Dept Comp Sci, Loughborough, Leics, England
[3] Univ Cyprus, Dept Elect & Comp Engn, KIOS Res & Innovat Ctr Excellence, Nicosia, Cyprus
Keywords
INFORMATION; RELEVANCE;
DOI
10.1016/j.ins.2021.01.020
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Feature selection is the process of selecting an optimal subset of features required for maintaining or improving the performance of data mining models. Recently, hybrid filter/wrapper feature selection methods have shown promising results for high-dimensional data. However, filter/wrapper methods lack generalisation power, that is, the property that allows the selected features to be used to train different classifiers without repeating the feature selection process. To address this problem, this paper proposes a novel evolutionary-based filter feature selection algorithm that is sequentially hybridised with the Fisher score filter algorithm in a new hybrid framework called filter/filter. The proposed algorithm, TAGA, combines a long-term memory Tabu Search with an Asexual (i.e. mutation-based) Genetic Algorithm. TAGA benefits from a new integer-encoded solution representation, a novel mutation operator, and a new tabu list encoding scheme, and it uses a minimum redundancy maximum relevance (mRMR) information theory-based criterion as the fitness function. Experiments were carried out on various high-dimensional datasets, including image, text, and biological data. The quality of the selected subsets was evaluated using different classifiers, and the experimental results demonstrate that TAGA outperforms other conventional and state-of-the-art feature selection algorithms. (C) 2021 Elsevier Inc. All rights reserved.
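The filter/filter idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the mutation operator, the tabu list encoding, or any parameter values, so everything below is an assumption made for illustration. In particular, the function names (`fisher_score`, `discretise`, `mutual_info`, `taga_sketch`), the equal-width binning, the single-feature swap mutation, the use of a Python set as the long-term tabu memory, and all default parameters are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class over within-class scatter."""
    mean_all = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

def discretise(X, bins=5):
    """Equal-width binning of each column into integer codes (assumption)."""
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        out[:, j] = np.digitize(X[:, j], edges[1:-1])
    return out

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete 1-D vectors."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / outer[nz])).sum())

def taga_sketch(X, y, n_candidates=20, subset_size=5, pop_size=10,
                generations=30, seed=0):
    rng = np.random.default_rng(seed)
    # Filter 1: keep the features with the highest Fisher scores.
    cand = np.argsort(fisher_score(X, y))[::-1][:n_candidates]
    Xd = discretise(X[:, cand])
    k = len(cand)
    # Precompute relevance I(f; y) and pairwise redundancy I(f_i; f_j).
    rel = np.array([mutual_info(Xd[:, j], y) for j in range(k)])
    red = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            red[i, j] = red[j, i] = mutual_info(Xd[:, i], Xd[:, j])

    def fitness(sol):
        # mRMR-style score: mean relevance minus mean pairwise redundancy.
        idx = np.array(sol)
        m = len(idx)
        redundancy = red[np.ix_(idx, idx)].sum() / (m * (m - 1)) if m > 1 else 0.0
        return float(rel[idx].mean() - redundancy)

    def random_solution():
        # Integer encoding: a sorted tuple of distinct candidate positions.
        picks = rng.choice(k, size=subset_size, replace=False)
        return tuple(sorted(picks.tolist()))

    population = [random_solution() for _ in range(pop_size)]
    tabu = set(population)  # long-term memory: every solution visited so far
    for _ in range(generations):
        offspring = []
        for parent in population:
            # Asexual reproduction: swap one selected feature for an unselected one.
            child = list(parent)
            outside = [f for f in range(k) if f not in parent]
            child[rng.integers(subset_size)] = int(rng.choice(outside))
            child = tuple(sorted(child))
            if child not in tabu:  # skip solutions that were already visited
                tabu.add(child)
                offspring.append(child)
        merged = set(population) | set(offspring)
        population = sorted(merged, key=fitness, reverse=True)[:pop_size]
    best = population[0]
    return cand[list(best)], fitness(best)
```

Calling `taga_sketch(X, y)` on a numeric matrix `X` of shape (samples, features) with class labels `y` returns the original column indices of the selected subset together with its mRMR-style score. Because neither stage consults a classifier, the selected subset can be passed to any classifier afterwards, which is the generalisation property the abstract attributes to filter methods.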
Pages: 105-127
Page count: 23