Genetic Algorithm for the Mutual Information-Based Feature Selection in Univariate Time Series Data
Cited by: 9
Authors:
Siddiqi, Umair F. [1]
Sait, Sadiq M. [1, 2]
Kaynak, Okyay [3]
Affiliations:
[1] King Fahd Univ Petr & Minerals, Ctr Commun & IT Res, Res Inst, Dhahran 31261, Saudi Arabia
[2] King Fahd Univ Petr & Minerals, Dept Comp Engn, Dhahran 31261, Saudi Arabia
[3] Bogazici Univ, Dept Elect & Elect Engn, TR-80815 Istanbul, Turkey
Filter methods are the fastest type of feature selection method. They employ information-theoretic metrics such as mutual information (MI), joint MI (JMI), and minimal redundancy and maximal relevance (mRMR). Finding the optimal feature subset is an NP-hard problem. This work proposes a Genetic Algorithm (GA) whose fitness function consists of two terms: the first is a feature selection metric such as MI, JMI, or mRMR, and the second is the overlapping coefficient, which accounts for the diversity of the GA population. Experimental results show that the proposed algorithm can return multiple good-quality solutions that have minimal overlap with each other. Having several such solutions is particularly beneficial when the test data contain null or missing values, because an alternative feature set can be used in place of one whose features are unavailable. Experiments were conducted on two publicly available time-series datasets. The selected feature sets were also used for forecasting with a simple Long Short-Term Memory (LSTM) model, and the forecasting quality obtained with the different feature sets was analyzed. The proposed algorithm was compared with the popular optimization tool Basic Open-source Nonlinear Mixed INteger programming (BONMIN) and with a recent feature selection algorithm, Conditional Mutual Information Considering Feature Interaction (CMFSI). The experiments show that the multiple solutions found by the proposed method are of good quality and have minimal overlap.
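The two-term fitness described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so the following are assumptions made only for illustration: candidate features are the lagged values of a univariate series; MI is estimated from equal-width histogram counts; the quality term is the difference form of mRMR (mean relevance minus mean pairwise redundancy); the diversity term is the Szymkiewicz-Simpson overlap coefficient |A ∩ B| / min(|A|, |B|), averaged over the rest of the population and weighted by a hypothetical parameter `lam`; and every solution has a fixed size `k`. All function and parameter names (`ga_feature_selection`, `lam`, `p_mut`, and so on) are hypothetical.

```python
# Illustrative sketch only: the binning, mRMR form, overlap penalty and GA
# operators below are assumptions, not the method published in the paper.
import math
import random
from collections import Counter
from itertools import combinations


def discretize(values, bins=8):
    """Equal-width binning so that MI can be estimated from counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in MI estimate (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def lagged_features(series, n_lags, bins=8):
    """Feature j is the series lagged by j + 1 steps; the target is the current value."""
    d = discretize(series, bins)
    target = d[n_lags:]
    feats = [d[n_lags - j - 1 : len(d) - j - 1] for j in range(n_lags)]
    return feats, target


def mrmr_score(subset, relevance, redundancy):
    """mRMR in difference form: mean relevance minus mean pairwise redundancy."""
    rel = sum(relevance[f] for f in subset) / len(subset)
    pairs = list(combinations(subset, 2))  # subset is sorted, so pairs are (i, j) with i < j
    red = sum(redundancy[p] for p in pairs) / len(pairs) if pairs else 0.0
    return rel - red


def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap coefficient |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


def ga_feature_selection(series, n_lags=12, k=3, pop_size=10, generations=40,
                         lam=0.5, p_mut=0.3, seed=0):
    """Return distinct k-lag subsets, best first, with their final fitness."""
    rng = random.Random(seed)
    feats, target = lagged_features(series, n_lags)
    relevance = [mutual_information(f, target) for f in feats]
    redundancy = {(i, j): mutual_information(feats[i], feats[j])
                  for i, j in combinations(range(n_lags), 2)}

    def scores(pop):
        # Fitness = mRMR quality minus lam * mean overlap with the rest of the population.
        out = []
        for i, ind in enumerate(pop):
            overlap = sum(overlap_coefficient(ind, o) for j, o in enumerate(pop) if j != i)
            out.append(mrmr_score(ind, relevance, redundancy) - lam * overlap / (len(pop) - 1))
        return out

    def crossover(p1, p2):
        # The child draws k distinct lags from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(p1) | set(p2)), k)))

    def mutate(ind):
        # Swap one selected lag for one that is not currently selected.
        out = set(ind)
        out.remove(rng.choice(ind))
        out.add(rng.choice([f for f in range(n_lags) if f not in ind]))
        return tuple(sorted(out))

    pop = [tuple(sorted(rng.sample(range(n_lags), k))) for _ in range(pop_size)]
    for _ in range(generations):
        fit = scores(pop)
        nxt = [ind for _, ind in sorted(zip(fit, pop), reverse=True)[:2]]  # elitism
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            a, b = (max(rng.sample(range(pop_size), 2), key=fit.__getitem__) for _ in range(2))
            child = crossover(pop[a], pop[b])
            nxt.append(mutate(child) if rng.random() < p_mut else child)
        pop = nxt

    best = {}
    for f, ind in zip(scores(pop), pop):
        best[ind] = max(f, best.get(ind, f))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    gen = random.Random(1)
    series = [math.sin(2 * math.pi * t / 12) + 0.3 * gen.gauss(0, 1) for t in range(600)]
    for subset, score in ga_feature_selection(series)[:3]:
        print("lags", [f + 1 for f in subset], "fitness", round(score, 3))
```

Because the overlap penalty depends on the whole population, an individual's fitness changes as the population changes, which discourages the GA from converging to copies of a single lag set and leaves several good but dissimilar subsets at the end. Setting `lam=0` reduces the sketch to a plain mRMR search, and replacing `mrmr_score` with an MI or JMI criterion changes only the quality term.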
Affiliations:
Barraza, Nestor: Univ Buenos Aires, Univ Nacl Tres Febrero, Caseros & Sch Engn, Buenos Aires, DF, Argentina
Moro, Sergio: IUL, ISTAR, ISCTE, Lisbon, Portugal
Ferreyra, Marcelo: Dataxplore, Buenos Aires, DF, Argentina
de la Pena, Adolfo: Boldt Gaming, Buenos Aires, DF, Argentina