Iterative Double Sketching for Faster Least-Squares Optimization

被引:0
作者
Wang, Rui [1 ,2 ]
Ouyang, Yanyan [1 ,2 ]
Xu, Wangli [1 ,2 ]
机构
[1] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[2] Renmin Univ China, Sch Stat, Beijing 100872, Peoples R China
来源
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022年
基金
北京市自然科学基金; 中国国家自然科学基金;
关键词
ALGORITHMS; REDUCTION; MATRICES;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This work is concerned with the overdetermined linear least-squares problem for large scale data. We generalize the iterative Hessian sketching (IHS) algorithm and propose a new sketching framework named iterative double sketching (IDS) which uses approximations for both the gradient and the Hessian in each iteration. To understand the behavior of the IDS algorithm and choose the optimal hyperparameters, we derive the exact limit of the conditional prediction error of the IDS algorithm in the setting of Gaussian sketching. Guided by this theoretical result, we propose an efficient IDS algorithm via a new class of sequentially related sketching matrices. We give a non-asymptotic analysis of this efficient IDS algorithm which shows that the proposed algorithm achieves the state-of-the-art trade-off between accuracy and efficiency.
引用
收藏
页数:29
相关论文
共 37 条
[31]   A fast randomized algorithm for overdetermined linear least-squares regression [J].
Rokhlin, Vladimir ;
Tygert, Mark .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2008, 105 (36) :13212-13217
[32]  
Sarlós T, 2006, ANN IEEE SYMP FOUND, P143
[33]  
Thorup M., 2004, SODA, V4, P615
[34]   IMPROVED ANALYSIS OF THE SUBSAMPLED RANDOMIZED HADAMARD TRANSFORM [J].
Tropp, Joel A. .
ADVANCES IN DATA SCIENCE AND ADAPTIVE ANALYSIS, 2011, 3 (1-2) :115-126
[35]  
Wainwright MJ, 2019, CA ST PR MA, P1, DOI 10.1017/9781108627771
[36]  
Wang D, 2018, AAAI CONF ARTIF INTE, P1439
[37]   Sketching as a Tool for Numerical Linear Algebra [J].
Woodruff, David P. .
FOUNDATIONS AND TRENDS IN THEORETICAL COMPUTER SCIENCE, 2014, 10 (1-2) :1-157