Feature selection with partition differentiation entropy for large-scale data sets

Cited by: 34
Authors
Li, Fachao [1 ]
Zhang, Zan [2 ]
Jin, Chenxia [1 ]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Econ & Management, Shijiazhuang 050018, Peoples R China
[2] Tianjin Univ, Coll Management & Econ, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Partition differentiation entropy; Attributes significance; Large-scale data sets; Uncertainty; INCREMENTAL FEATURE-SELECTION; ROUGH SET; ATTRIBUTE REDUCTION; OPTIMIZATION; UNCERTAINTY;
DOI
10.1016/j.ins.2015.10.002
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Feature selection, especially for large data sets, is a challenging problem in areas such as pattern recognition, machine learning and data mining. With the development of data collection and storage technologies, data sets have grown larger than ever, making it difficult for traditional methods to learn from them. In this paper, we introduce partition differentiation entropy, defined from the viewpoint of partitions in rough set theory, to measure the significance and uncertainty of attributes, and we present a feature selection method for large-scale data sets based on this information-theoretic measure of attribute significance. Given a large-scale decision information system, the proposed method first divides it into small sub information systems according to the decision classes. It then computes the partition differentiation entropy in each sub-system and combines these values to obtain the partition differentiation entropy of an attribute subset in the original decision information system. The important features are then selected according to the value of the partition differentiation entropy. The experimental results show that the proposed method is feasible and effective. (C) 2015 Elsevier Inc. All rights reserved.
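To make the workflow described in the abstract concrete, the following minimal Python sketch follows the same class-wise decomposition idea: split the decision table by decision class, score a candidate attribute subset in each sub-table, aggregate the per-class scores, and grow the subset greedily. The entropy definition used here (Shannon entropy of equivalence-class sizes), the size-weighted aggregation, and the greedy maximization direction are illustrative assumptions, since this record does not reproduce the paper's actual partition differentiation entropy formula.

# Minimal sketch, not the authors' exact algorithm: greedy feature selection
# that evaluates attribute subsets per decision class and aggregates the scores.
from collections import defaultdict
from math import log2

def split_by_decision(rows, decision_index):
    """Divide the decision table into sub-tables, one per decision class."""
    subsystems = defaultdict(list)
    for row in rows:
        subsystems[row[decision_index]].append(row)
    return subsystems

def partition_entropy(rows, attribute_subset):
    """Assumed surrogate for partition differentiation entropy: Shannon entropy
    of the granule (equivalence-class) size distribution induced by the subset."""
    granules = defaultdict(int)
    for row in rows:
        granules[tuple(row[a] for a in attribute_subset)] += 1
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in granules.values())

def subset_score(subsystems, attribute_subset):
    """Aggregate per-class entropies into one score for the whole system
    (size-weighted sum; the aggregation rule is also an assumption)."""
    total = sum(len(rows) for rows in subsystems.values())
    return sum(len(rows) / total * partition_entropy(rows, attribute_subset)
               for rows in subsystems.values())

def select_features(rows, condition_attrs, decision_index, k):
    """Greedily add the attribute that most increases the aggregated score
    until k features have been chosen."""
    subsystems = split_by_decision(rows, decision_index)
    selected, candidates = [], list(condition_attrs)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda a: subset_score(subsystems, selected + [a]))
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: four condition attributes (columns 0-3), decision in column 4.
data = [
    (0, 1, 1, 0, 'yes'),
    (0, 1, 0, 1, 'yes'),
    (1, 0, 1, 0, 'no'),
    (1, 0, 0, 1, 'no'),
]
print(select_features(data, condition_attrs=[0, 1, 2, 3], decision_index=4, k=2))

Because each sub-table is scored independently, the per-class computations could be distributed across workers for large decision tables, which is consistent with the paper's motivation of handling large-scale data by decomposition.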
Pages: 690-700
Number of pages: 11