Genetic Algorithm for the Mutual Information-Based Feature Selection in Univariate Time Series Data

Cited by: 9
Authors
Siddiqi, Umair F. [1 ]
Sait, Sadiq M. [1 ,2 ]
Kaynak, Okyay [3 ]
Affiliations
[1] King Fahd Univ Petr & Minerals, Ctr Commun & IT Res, Res Inst, Dhahran 31261, Saudi Arabia
[2] King Fahd Univ Petr & Minerals, Dept Comp Engn, Dhahran 31261, Saudi Arabia
[3] Bogazici Univ, Dept Elect & Elect Engn, TR-80815 Istanbul, Turkey
Keywords
Feature selection; genetic algorithm; machine learning; deep learning; optimization methods; forecasting;
DOI
10.1109/ACCESS.2020.2964803
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Subject Classification
0812;
Abstract
Filters are the fastest among the different types of feature selection methods. They employ metrics from information theory, such as mutual information (MI), joint mutual information (JMI), and minimal-redundancy-maximal-relevance (mRMR). Determining the optimal feature subset is an NP-hard problem. This work proposes a Genetic Algorithm (GA) whose fitness function consists of two terms: the first is a feature selection metric such as MI, JMI, or mRMR, and the second is an overlapping coefficient that accounts for diversity in the GA population. Experimental results show that the proposed algorithm can return multiple good-quality solutions that have minimal overlap with each other. Having multiple solutions provides significant benefits when the test data contains null or missing values. Experiments were conducted on two publicly available time-series datasets. The selected feature sets were also used for forecasting with a simple Long Short-Term Memory (LSTM) model, and the forecasting quality obtained with the different feature sets was analyzed. The proposed algorithm was compared with a popular optimization tool, Basic Open-source Nonlinear Mixed INteger programming (BONMIN), and a recent feature selection algorithm, Conditional Mutual Information Considering Feature Interaction (CMFSI). The experiments show that the multiple solutions found by the proposed method have good quality and minimal overlap.
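The two-term fitness described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the discrete MI estimator, the overlap-coefficient penalty against an archive of previously found solutions, and the weight `alpha` are all assumptions introduced here for clarity.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """MI between two discrete sequences, in bits, from empirical counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def overlap_coefficient(sel_a, sel_b):
    """|A ∩ B| / min(|A|, |B|); 0 when either selection is empty."""
    if not sel_a or not sel_b:
        return 0.0
    return len(sel_a & sel_b) / min(len(sel_a), len(sel_b))

def fitness(selection, features, target, archive, alpha=1.0):
    """Two-term fitness (assumed form): summed MI relevance of the selected
    features, minus a penalty that grows with the worst-case overlap against
    solutions already stored in the archive."""
    relevance = sum(mutual_information(features[f], target) for f in selection)
    penalty = max((overlap_coefficient(selection, s) for s in archive),
                  default=0.0)
    return relevance - alpha * penalty
```

In this sketch, the archive would hold feature subsets the GA has already accepted, so candidates that merely repeat an earlier subset score lower than equally relevant but novel ones, which is how a diversity term of this kind steers the population toward multiple minimally overlapping solutions.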
Pages: 9597-9609
Page count: 13
References
29 entries in total
[1]   Using the mutual information technique to select explanatory variables in artificial neural networks for rainfall forecasting [J].
Babel, Mukand S. ;
Badgujar, Girish B. ;
Shinde, Victor R. .
METEOROLOGICAL APPLICATIONS, 2015, 22 (03) :610-616
[2]   Mutual information and sensitivity analysis for feature selection in customer targeting: A comparative study [J].
Barraza, Nestor ;
Moro, Sergio ;
Ferreyra, Marcelo ;
de la Pena, Adolfo .
JOURNAL OF INFORMATION SCIENCE, 2019, 45 (01) :53-67
[3]   Using mutual information for selecting features in supervised neural-net learning [J].
Battiti, R. .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (04) :537-550
[4]   Feature selection using Joint Mutual Information Maximisation [J].
Bennasar, Mohamed ;
Hicks, Yulia ;
Setchi, Rossitza .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22) :8520-8532
[5]   Training a 3-node neural network is NP-complete [J].
Blum, A. L. ;
Rivest, R. L. .
NEURAL NETWORKS, 1992, 5 (01) :117-127
[6]   Benchmark for filter methods for feature selection in high-dimensional classification data [J].
Bommert, Andrea ;
Sun, Xudong ;
Bischl, Bernd ;
Rahnenfuehrer, Joerg ;
Lang, Michel .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 143
[7]   An algorithmic framework for convex mixed integer nonlinear programs [J].
Bonami, Pierre ;
Biegler, Lorenz T. ;
Conn, Andrew R. ;
Cornuejols, Gerard ;
Grossmann, Ignacio E. ;
Laird, Carl D. ;
Lee, Jon ;
Lodi, Andrea ;
Margot, Francois ;
Sawaya, Nicolas ;
Wachter, Andreas .
DISCRETE OPTIMIZATION, 2008, 5 (02) :186-204
[8]  
Borgonovo, E., 2017, Sensitivity Analysis, V251
[9]   A survey on feature selection methods [J].
Chandrashekar, Girish ;
Sahin, Ferat .
COMPUTERS & ELECTRICAL ENGINEERING, 2014, 40 (01) :16-28
[10]   Feature selection for time series prediction - A combined filter and wrapper approach for neural networks [J].
Crone, Sven F. ;
Kourentzes, Nikolaos .
NEUROCOMPUTING, 2010, 73 (10-12) :1923-1936