RFN: A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Spaces

Cited by: 2
Authors
Chang, Ting-Jui [1 ]
Shahrampour, Shahin [1 ]
Affiliations
[1] Northeastern Univ, Dept Mech & Ind Engn, Boston, MA 02115 USA
Keywords
Newton method; optimization algorithms; risk minimization; Hessian approximation; random features; OPTIMIZATION METHODS; CONVERGENCE;
DOI
10.1109/TSP.2022.3219993
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
In supervised learning with kernel methods, we often encounter a large-scale finite-sum minimization over a reproducing kernel Hilbert space (RKHS). Large-scale finite-sum problems can be solved efficiently with variants of Newton's method in which the Hessian is approximated via sub-samples of the data. In an RKHS, however, the dependence of the penalty function on the kernel makes standard sub-sampling approaches inapplicable, since the Gram matrix is not readily available in low-rank form. In this paper, we observe that for this class of problems one can naturally use kernel approximation to speed up Newton's method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence (with high probability). We derive a theoretical lower bound on the number of random features required for the approximated Hessian to be close to the true Hessian in norm. Our numerical experiments on real-world data verify the efficiency of our method compared to several benchmarks.
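To make the idea concrete, the following is a minimal NumPy sketch, not the paper's RFN algorithm: it pairs the random Fourier features of Rahimi and Recht (2008) for a Gaussian kernel with a plain Newton iteration on l2-regularized logistic loss. The kernel choice, the loss, the feature count, the unit step size, and all parameter values are illustrative assumptions.

import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    # Random Fourier feature map z(x) approximating the Gaussian kernel
    # k(x, x') = exp(-gamma * ||x - x'||^2), so that E[z(x)^T z(x')] = k(x, x')
    # (Rahimi & Recht, 2008). Frequencies are drawn from the kernel's spectrum.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def newton_rf_logistic(X, y, n_features=200, gamma=1.0, lam=1e-3,
                       n_iter=20, tol=1e-8, seed=0):
    # Illustrative second-order solver: Newton's method for l2-regularized
    # logistic regression in the random-feature space, where the M x M
    # feature-space Hessian stands in for the n x n kernel-space Hessian.
    # Labels y are assumed to lie in {0, 1}.
    rng = np.random.default_rng(seed)
    Z = random_fourier_features(X, n_features, gamma, rng)   # n x M
    n, M = Z.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ w))                  # predicted probabilities
        grad = Z.T @ (p - y) / n + lam * w                # gradient of the risk
        D = p * (1.0 - p)                                 # per-sample curvature
        H = (Z * D[:, None]).T @ Z / n + lam * np.eye(M)  # approximated Hessian
        step = np.linalg.solve(H, grad)                   # Newton direction
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w

The speed-up the abstract alludes to is visible in the linear solve: with M random features, each Newton system is M x M instead of n x n, so an iteration costs on the order of nM^2 + M^3 rather than the n^3 that a kernel-space Hessian solve would require.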
Pages: 5308-5319
Number of pages: 12