Supervised feature subset selection with ordinal optimization

Cited by: 10
Authors
Feng, Dingcheng [1 ]
Chen, Feng [1 ]
Xu, Wenli [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Tsinghua Natl Lab Informat Sci & Technol (TNList), Beijing 100084, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Supervised feature selection; Uniform sampling; Ordinal optimization; Ordered performance curve; Feature scoring; Order-good-enough; Value-good-enough; ALGORITHMS; SEARCH;
DOI
10.1016/j.knosys.2013.11.004
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The ultimate goal of supervised feature selection is to select a feature subset such that prediction accuracy is maximized. Most existing methods, such as filter and embedded models, can be viewed as optimizing approximate objectives that differ from the prediction accuracy. Wrapper models maximize prediction accuracy directly, but the optimization has very high computational complexity. To address these limitations, we present an ordinal optimization perspective on feature selection (OOFS). Feature subset evaluation is formulated as a system simulation process with randomness, so supervised feature selection becomes maximizing the expected performance of the system, where ordinal optimization can be applied to identify a set of order-good-enough solutions with greatly reduced complexity and natural support for parallel computing. These solutions correspond to the truly good-enough (value-good-enough) solutions when the solution space structure, characterized by the ordered performance curve (OPC), exhibits a concave shape. Our analysis shows that this happens in some important applications, such as image classification, where a large number of features have relatively similar discriminative abilities. We further improve the OOFS method with a feature scoring algorithm, called OOFSs. We prove that, when the performance difference between solutions increases monotonically with the solution difference, the expectation of the scores carries useful information for estimating the globally optimal solution. Experimental results on sixteen real-world datasets show that our method provides a good trade-off between prediction accuracy and computational complexity. (C) 2013 Elsevier B.V. All rights reserved.
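A minimal sketch of the ordinal-optimization idea described in the abstract (not the authors' exact OOFS/OOFSs algorithms): uniformly sample candidate feature subsets, score each with a cheap, noisy accuracy estimate, and keep only the top-ranked (order-good-enough) candidates for further, more expensive evaluation. The evaluator choice (3-NN with 3-fold cross-validation on a data subsample), the dataset, and all parameter values below are illustrative assumptions, written in Python with scikit-learn.

    # Sketch of ordinal-optimization-style feature subset selection.
    # All parameters and the crude evaluator are illustrative assumptions,
    # not the published OOFS/OOFSs procedure.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = load_digits(return_X_y=True)
    n_features = X.shape[1]

    def noisy_score(subset, n_rows=300):
        """Cheap, noisy estimate of prediction accuracy for a feature subset."""
        rows = rng.choice(len(X), size=n_rows, replace=False)  # subsample for speed
        Xs = X[np.ix_(rows, subset)]
        return cross_val_score(KNeighborsClassifier(n_neighbors=3),
                               Xs, y[rows], cv=3).mean()

    # Step 1: uniformly sample candidate subsets of a fixed size.
    n_candidates, subset_size, n_selected = 200, 16, 10
    candidates = [rng.choice(n_features, size=subset_size, replace=False)
                  for _ in range(n_candidates)]

    # Step 2: rank candidates by the crude estimate (only the ordering is used).
    scores = np.array([noisy_score(s) for s in candidates])
    order = np.argsort(scores)[::-1]

    # Step 3: the top-ranked candidates form the "order-good-enough" set;
    # only these would be re-evaluated with the full, expensive wrapper objective.
    selected = [candidates[i] for i in order[:n_selected]]
    print("best crude score:", scores[order[0]].round(3))

Only the relative ordering of the crude scores is used here, which is the property ordinal optimization exploits: orderings of candidate solutions stabilize under noise much faster than the performance values themselves.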
Pages: 123-140
Page count: 18