Feature Selection Based on Structured Sparsity: A Comprehensive Study

Cited by: 292
Authors
Gui, Jie [1 ]
Sun, Zhenan [2 ]
Ji, Shuiwang [3 ]
Tao, Dacheng [4 ]
Tan, Tieniu [2 ]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Percept & Comp, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[3] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ctr Quantum Computat & Intelligent Syst, Ultimo, NSW 2007, Australia
Funding
National Natural Science Foundation of China; Australian Research Council; US National Science Foundation
Keywords
Dimensionality reduction; feature selection; sparse; structured sparsity; SUPPORT VECTOR MACHINES; REGULARIZED FEATURE-SELECTION; ROBUST FEATURE-EXTRACTION; DIMENSIONALITY REDUCTION; MULTITASK REGRESSION; VARIABLE SELECTION; IMAGING GENETICS; FRAMEWORK; CLASSIFICATION; EFFICIENT;
DOI
10.1109/TNNLS.2016.2551724
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection (FS) is an important component of many pattern recognition tasks, which often involve very high-dimensional data. FS algorithms identify a relevant feature subset from the original features, facilitating subsequent analysis such as clustering and classification. Structured sparsity-inducing feature selection (SSFS) methods have been widely studied in recent years, and a number of algorithms have been proposed. However, there has been no comprehensive study of the connections between different SSFS methods or of how they have evolved. In this paper, we attempt to provide a survey of various SSFS methods, covering their motivations and mathematical formulations. We then explore the relationships among different formulations and propose a taxonomy to elucidate their evolution. We group the existing SSFS methods into two categories: vector-based feature selection (feature selection based on the lasso) and matrix-based feature selection (feature selection based on the l_{r,p}-norm). Furthermore, FS has been combined with other machine learning algorithms for specific applications, such as multitask learning, multilabel learning, multiview learning, classification, and clustering. This paper not only compares the differences and commonalities of these methods with respect to regression and regularization strategies, but also provides practitioners in related fields with useful guidelines on how to perform feature selection.
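The vector-based (lasso) family summarized in the abstract can be illustrated with a minimal sketch: an l1-penalized least-squares fit whose exactly-zero coefficients mark discarded features. The synthetic data, the penalty weight `alpha`, and the ISTA (proximal gradient) solver below are illustrative assumptions, not the paper's own formulation or experiments.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_fs(X, y, alpha=0.1, n_iter=500):
    """Solve min_w (1/2n)||Xw - y||^2 + alpha*||w||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w

# Illustrative data: 100 samples, 20 features, only the first 3 relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.normal(size=100)

w = lasso_fs(X, y, alpha=0.1)
selected = np.flatnonzero(np.abs(w) > 1e-8)  # indices of retained features
print(selected)
```

Because soft-thresholding sets coefficients exactly to zero, the nonzero pattern of `w` directly yields the selected subset; matrix-based methods generalize this by applying an l_{r,p}-norm penalty to rows of a weight matrix so that whole rows (features) are zeroed jointly.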
Pages: 1490-1507 (18 pages)