Feature selection with partition differentiation entropy for large-scale data sets

Cited by: 34
Authors
Li, Fachao [1 ]
Zhang, Zan [2 ]
Jin, Chenxia [1 ]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Econ & Management, Shijiazhuang 050018, Peoples R China
[2] Tianjin Univ, Coll Management & Econ, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Partition differentiation entropy; Attributes significance; Large-scale data sets; Uncertainty; INCREMENTAL FEATURE-SELECTION; ROUGH SET; ATTRIBUTE REDUCTION; OPTIMIZATION; UNCERTAINTY;
DOI
10.1016/j.ins.2015.10.002
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Feature selection, especially for large data sets, is a challenging problem in areas such as pattern recognition, machine learning and data mining. With the development of data collection and storage technologies, data sets have grown larger than ever, making it difficult for traditional methods to learn from them. In this paper, we introduce partition differentiation entropy, defined from the viewpoint of partitions in rough set theory, to measure the significance and uncertainty of attributes, and we present a feature selection method for large-scale data sets based on this information-theoretic measure of attribute significance. Given a large-scale decision information system, the proposed method first divides it into small sub information systems according to the decision classes. It then computes the partition differentiation entropy in each sub-system and combines these values to obtain the partition differentiation entropy of an attribute subset in the original decision information system. The important features are then selected according to the value of the partition differentiation entropy. The experimental results show that the proposed method is feasible and effective. (C) 2015 Elsevier Inc. All rights reserved.
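To make the workflow described in the abstract concrete, the following minimal Python sketch follows the same class-wise decomposition idea: split the decision table by decision class, score a candidate attribute subset in each sub-table, aggregate the per-class scores, and grow the subset greedily. The entropy definition used here (Shannon entropy of equivalence-class sizes), the size-weighted aggregation, and the greedy maximization direction are illustrative assumptions, since this record does not reproduce the paper's actual partition differentiation entropy formula.

# Minimal sketch, not the authors' exact algorithm: greedy feature selection
# that evaluates attribute subsets per decision class and aggregates the scores.
from collections import defaultdict
from math import log2

def split_by_decision(rows, decision_index):
    """Divide the decision table into sub-tables, one per decision class."""
    subsystems = defaultdict(list)
    for row in rows:
        subsystems[row[decision_index]].append(row)
    return subsystems

def partition_entropy(rows, attribute_subset):
    """Assumed surrogate for partition differentiation entropy: Shannon entropy
    of the granule (equivalence-class) size distribution induced by the subset."""
    granules = defaultdict(int)
    for row in rows:
        granules[tuple(row[a] for a in attribute_subset)] += 1
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in granules.values())

def subset_score(subsystems, attribute_subset):
    """Aggregate per-class entropies into one score for the whole system
    (size-weighted sum; the aggregation rule is also an assumption)."""
    total = sum(len(rows) for rows in subsystems.values())
    return sum(len(rows) / total * partition_entropy(rows, attribute_subset)
               for rows in subsystems.values())

def select_features(rows, condition_attrs, decision_index, k):
    """Greedily add the attribute that most increases the aggregated score
    until k features have been chosen."""
    subsystems = split_by_decision(rows, decision_index)
    selected, candidates = [], list(condition_attrs)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda a: subset_score(subsystems, selected + [a]))
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: four condition attributes (columns 0-3), decision in column 4.
data = [
    (0, 1, 1, 0, 'yes'),
    (0, 1, 0, 1, 'yes'),
    (1, 0, 1, 0, 'no'),
    (1, 0, 0, 1, 'no'),
]
print(select_features(data, condition_attrs=[0, 1, 2, 3], decision_index=4, k=2))

Because each sub-table is scored independently, the per-class computations could be distributed across workers for large decision tables, which is consistent with the paper's motivation of handling large-scale data by decomposition.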
Pages: 690-700
Number of pages: 11