Efficient feature selection filters for high-dimensional data

Cited by: 143

Authors:

Ferreira, Artur J. [1,3]

Figueiredo, Mario A. T. [2,3]

Affiliations:

[1] Inst Super Engn Lisboa, Lisbon, Portugal

[2] Inst Super Tecn, Lisbon, Portugal

[3] Inst Telecomunicacoes, Lisbon, Portugal

Source:

PATTERN RECOGNITION LETTERS | 2012 / Volume 33 / Issue 13

Keywords:

Feature selection; Filters; Dispersion measures; Similarity measures; High-dimensional data; FLOATING SEARCH METHODS; GENE SELECTION; STATISTICAL COMPARISONS; LOGISTIC-REGRESSION; BOUND ALGORITHM; SVM-RFE; CLASSIFIERS; INFORMATION; RELEVANCE; CRITERIA;

DOI:

10.1016/j.patrec.2012.05.019

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster. © 2012 Elsevier B.V. All rights reserved.
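To make the idea of a low-complexity relevance/redundancy filter concrete, here is a minimal unsupervised sketch. It is not taken from the paper: the choice of variance as the dispersion (relevance) measure, absolute cosine similarity as the redundancy measure, the comparison against only the most recently kept feature, and the names `relevance_redundancy_filter`, `max_similarity`, and `n_keep` are all assumptions made for illustration.

```python
import numpy as np


def relevance_redundancy_filter(X, max_similarity=0.8, n_keep=None):
    """Illustrative filter: rank features by dispersion, skip near-duplicates.

    X is an (instances x features) array. Returns the indices of the kept
    features, in decreasing order of relevance. All criteria used here are
    assumptions for illustration, not the paper's exact definitions.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    if n_keep is None:
        n_keep = n_features

    # Relevance (assumed): per-feature variance as a dispersion measure.
    relevance = X.var(axis=0)
    order = np.argsort(-relevance, kind="stable")

    # Column norms, reused by the cosine-similarity redundancy check.
    norms = np.linalg.norm(X, axis=0)

    selected = []
    for j in order:
        if len(selected) >= n_keep:
            break
        if norms[j] == 0.0:
            continue  # an all-zero feature carries no information
        if selected:
            last = selected[-1]
            # Redundancy (assumed): absolute cosine similarity with the
            # most recently kept feature only, so each check costs O(n).
            sim = abs(X[:, j] @ X[:, last]) / (norms[j] * norms[last])
            if sim > max_similarity:
                continue
        selected.append(int(j))
    return selected


# Column 1 is a scaled copy of column 0; column 3 is constant zero.
X_demo = np.array([
    [1.0, 2.0, 1.0, 0.0],
    [2.0, 4.0, -1.0, 0.0],
    [3.0, 6.0, 1.0, 0.0],
    [4.0, 8.0, -1.0, 0.0],
])
print(relevance_redundancy_filter(X_demo))  # prints [1, 2]
```

Under these assumptions the cost is one sort over the features plus one inner product per candidate, which is why a filter of this kind could plausibly serve as a cheap pre-processing step ahead of a more expensive wrapper or search-based method.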


Pages: 1794-1804

Page count: 11
