Evolutionary Feature Subset Selection with Compression-based Entropy Estimation

Times cited: 2
Authors
Kromer, Pavel [1]
Platos, Jan [1]
Affiliations
[1] VSB Tech Univ Ostrava, 17 Listopadu 15, Ostrava 70833, Czech Republic
Source
GECCO'16: PROCEEDINGS OF THE 2016 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE | 2016
Keywords
Genetic algorithms; differential evolution; feature subset selection; entropy estimation
DOI
10.1145/2908812.2908853
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Modern massive data sets often comprise millions of records and thousands of features. Processing them efficiently with traditional methods is an increasing challenge. Feature selection methods form a family of traditional instruments for data dimensionality reduction. They aim to select subsets of data features so that the loss of information contained in the full data set is minimized. Evolutionary feature selection methods have shown good ability to identify feature subsets in very-high-dimensional data sets. Their efficiency depends, among other things, on the particular optimization algorithm, the feature subset representation, and the objective function definition. In this paper, two evolutionary methods for fixed-length subset selection are employed to find feature subsets on the basis of their entropy, estimated by a fast data compression algorithm. The reasonableness of the fitness criterion, the ability of the investigated methods to find good feature subsets, and the usefulness of the selected feature subsets for practical data mining are evaluated using two well-known data sets and several widely used classification algorithms.
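The idea of scoring a feature subset by a compression-based entropy estimate can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: `zlib` stands in for the unspecified fast compressor, features are serialized as 64-bit floats, a higher estimate is treated as better, and the hypothetical helper `best_fixed_length_subset` uses exhaustive search in place of the evolutionary methods the abstract mentions.

```python
import struct
import zlib
from itertools import combinations


def compressed_bits_per_byte(rows, subset):
    """Rough entropy estimate: compressed bits per raw byte of the selected columns."""
    # Serialize only the selected features, row by row, as little-endian doubles.
    raw = b"".join(
        struct.pack(f"<{len(subset)}d", *(row[i] for i in subset)) for row in rows
    )
    if not raw:
        return 0.0
    # zlib is a stand-in compressor (assumption); the abstract does not name one.
    compressed = zlib.compress(raw, 6)
    return 8.0 * len(compressed) / len(raw)


def best_fixed_length_subset(rows, k):
    """Score every k-feature subset exhaustively (feasible only for tiny examples).

    Assumption: the subset with the highest entropy estimate is preferred.
    """
    n_features = len(rows[0])
    return max(
        combinations(range(n_features), k),
        key=lambda subset: compressed_bits_per_byte(rows, subset),
    )
```

With a small table whose first and third columns are constant and whose second column varies, `best_fixed_length_subset(rows, 1)` would pick the varying column, since constant data compresses to almost nothing. For realistic feature counts the exhaustive loop is infeasible, which is where a search heuristic such as a genetic algorithm or differential evolution would take over.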
Pages: 933-940
Number of pages: 8