Feature Selection for Ridge Regression with Provable Guarantees

被引:23
作者
Paul, Saurabh [1 ]
Drineas, Petros [2 ]
机构
[1] Global Risk Sci, San Jose, CA 95112 USA
[2] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
关键词
Classification (of information) - Regression analysis;
D O I
10.1162/NECO_a_00816
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
引用
收藏
页码:716 / 742
页数:27
相关论文
共 21 条
[1]  
Agarwal D., 2002, Proc. 8th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, P173
[2]  
[Anonymous], 2013, ADV NEURAL INFORM PR
[3]  
[Anonymous], 2013, Advances in neural information processing systems (NIPS)
[4]  
[Anonymous], 1997, ICML
[5]  
Bach F., 2013, PMLR, P185
[6]  
Batson JD, 2009, ACM S THEORY COMPUT, P255
[7]  
Bhattacharyya C, 2004, J MACH LEARN RES, V5, P1417
[8]   Deterministic Feature Selection for k-Means Clustering [J].
Boutsidis, Christos ;
Magdon-Ismail, Malik .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (09) :6099-6110
[9]  
Dasgupta A, 2007, KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P230
[10]  
Davidov D., 2004, Proceedings of Sheffield SIGIR 2004. The Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P250, DOI 10.1145/1008992.1009036