A Two-Stage Approach for Learning a Sparse Model with Sharp Excess Risk Analysis

Cited: 0
Authors
Li, Zhe [1 ]
Yang, Tianbao [1 ]
Zhang, Lijun [2 ]
Jin, Rong [3 ]
Affiliations
[1] Univ Iowa, Dept Comp Sci, Iowa City, IA 52242 USA
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[3] Alibaba Grp, Seattle, WA 98101 USA
Source
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2017
Funding
National Science Foundation (USA);
Keywords
SELECTION;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper aims to provide a sharp excess risk guarantee for learning a sparse linear model without any assumptions about the strong convexity of the expected loss or the sparsity of the optimal solution in hindsight. Given a target level for the excess risk, a natural question is how many examples, and how large a support set for the solution, suffice for learning a good model with the target excess risk. To answer these questions, we present a two-stage algorithm in which (i) the first stage exploits an epoch-based stochastic optimization algorithm with an established O(1/epsilon) bound on the sample complexity; and (ii) the second stage presents a distribution-dependent randomized sparsification with an O(1/epsilon) bound on the sparsity (referred to as the support complexity) of the resulting model. Compared to previous work, our contributions are that (i) we reduce the order of the sample complexity from O(1/epsilon^2) to O(1/epsilon) without the strong convexity assumption; and (ii) we reduce the constant in the O(1/epsilon) sparsity bound by exploring distribution-dependent sampling.
Pages: 2224-2230
Page count: 7
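The record does not spell out the distribution-dependent randomized sparsification of stage (ii). As a rough, generic illustration only (not the paper's exact scheme), one common way to sparsify a dense weight vector is to sample coordinates with probability proportional to their magnitude and rescale so the result is unbiased; the function name `randomized_sparsify` and the magnitude-based distribution below are illustrative assumptions:

```python
import numpy as np

def randomized_sparsify(w, k, rng=None):
    """Return a vector with at most k nonzeros whose expectation is w.

    Each of the k draws picks index i with probability p_i = |w_i| / sum|w|,
    and adds w_i / (k * p_i) to coordinate i, so E[result] = w.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.abs(w) / np.abs(w).sum()          # magnitude-based sampling distribution
    idx = rng.choice(len(w), size=k, p=p)    # k indices, sampled with replacement
    s = np.zeros_like(w, dtype=float)
    np.add.at(s, idx, w[idx] / (k * p[idx])) # accumulate rescaled (unbiased) terms
    return s
```

A larger k tightens the concentration of the sparse vector around w at the cost of a bigger support, which mirrors the sparsity-versus-accuracy trade-off the abstract's O(1/epsilon) support complexity bound quantifies.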