Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping

Cited by: 15
Authors
Gu, Bin [1 ]
Liu, Guodong [1 ]
Huang, Heng [1 ]
Affiliations
[1] Univ Texas Arlington, Comp Sci & Engn, Arlington, TX 76019 USA
Source
KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2017
Keywords
Solution path; OSCAR; automatic feature grouping; feature selection; sparse regression; VARIABLE SELECTION; REGULARIZATION; SHRINKAGE
DOI
10.1145/3097983.3098010
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Feature selection is one of the most important data mining research topics, with many applications. In practical problems, features often have a group structure that affects the outcomes, so it is crucial to automatically identify homogeneous groups of features for high-dimensional data analysis. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach that performs automatic feature grouping and selection through an l_1 norm and a pairwise l_infinity norm. However, because of the complex representation of this penalty (especially the pairwise l_infinity norm), OSCAR has so far lacked a solution path algorithm, which is highly useful for model tuning. To address this challenge, in this paper we propose a groups-keeping solution path algorithm for the OSCAR model (OscarGKPath). Given a set of homogeneous feature groups and an accuracy bound epsilon, OscarGKPath fits the solutions over an interval of regularization parameters while keeping the feature groups fixed. The entire solution path can be obtained by combining multiple such intervals. We prove that all solutions in the solution path produced by OscarGKPath strictly satisfy the given accuracy bound epsilon. The experimental results on benchmark datasets not only confirm the effectiveness of our OscarGKPath algorithm, but also show its superiority in cross validation over the existing batch algorithm.
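As a concrete illustration of the penalty described above, the following sketch evaluates the OSCAR regularizer, i.e. an l_1 term plus a sum of pairwise l_infinity (max-of-pair) terms over all coefficient pairs. This is a minimal reading of the abstract, not the paper's algorithm; the weight names `lam1` and `lam2` are assumptions.

```python
import numpy as np
from itertools import combinations

def oscar_penalty(beta, lam1, lam2):
    """OSCAR penalty: lam1 * ||beta||_1 + lam2 * sum_{j<k} max(|beta_j|, |beta_k|).

    Illustrative sketch based on the abstract's description; parameter
    names lam1/lam2 are hypothetical, not taken from the paper.
    """
    abs_b = np.abs(np.asarray(beta, dtype=float))
    l1_term = lam1 * abs_b.sum()
    # Pairwise l_infinity norm over all unordered coefficient pairs j < k.
    pairwise_term = lam2 * sum(max(abs_b[j], abs_b[k])
                               for j, k in combinations(range(len(abs_b)), 2))
    return l1_term + pairwise_term
```

The pairwise-max term ties coefficients of similar magnitude together, which is what induces the automatic grouping: coefficients sharing the same absolute value form a group, and the solution path algorithm described in the paper tracks the regularization parameter while preserving those groups.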
Pages: 185-193
Page count: 9