Gradient Boosted Feature Selection

Cited by: 100
Authors
Xu, Zhixiang [1]
Huang, Gao [2]
Weinberger, Kilian Q. [1]
Zheng, Alice X. [3]
Affiliations
[1] Washington Univ, One Brookings Dr, St Louis, MO 63110 USA
[2] Tsinghua Univ, Beijing, Peoples R China
[3] GraphLab, Seattle, WA USA
Source
PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14) | 2014
Funding
National Science Foundation (USA)
Keywords
Feature selection; Large-scale; Gradient boosting; GENE-EXPRESSION;
DOI
10.1145/2623330.2623635
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; be able to identify non-linear feature interactions; scale linearly with the number of features and dimensions; allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straightforward to implement, as it is based on a modification of Gradient Boosted Trees. We evaluate GBFS on several real-world data sets and show that it matches or outperforms other state-of-the-art feature selection algorithms. Yet it scales to larger data set sizes and naturally allows for domain-specific side information.
Pages: 522-531
Number of pages: 10