Core-elements for large-scale least squares estimation

Cited: 0
Authors
Li, Mengyu [1 ]
Yu, Jun [2 ]
Li, Tao [1 ]
Meng, Cheng [1 ]
Affiliations
[1] Renmin Univ China, Inst Stat & Big Data, Ctr Appl Stat, Beijing 100872, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Coresets; Linear model; Sparse matrix; Subset selection; NYSTROM APPROXIMATION; K-MEANS; MATRIX; COMPUTATION;
DOI
10.1007/s11222-024-10505-6
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample and has found extensive application in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome this limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation. We provide a deterministic algorithm to construct the core-elements estimator, requiring only an O(nnz(X) + rp^2) computational cost, where X is the n × p predictor matrix, r is the number of elements selected from each column of X, and nnz(·) denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and real-world datasets demonstrate the proposed method's superior performance compared to mainstream competitors.
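The abstract describes the estimator only at a high level. The following is a minimal illustrative Python sketch, assuming a selection rule that keeps the r largest-magnitude entries in each column of X and an estimating equation of the form X_c^T(y − Xβ) = 0 (a plausible reading of the unbiasedness claim); the function name, the choice of r, and the use of a pseudo-inverse are the writer's illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def core_elements_lse(X, y, r):
    """Illustrative element-wise subset selection for least squares.

    Keeps the r largest-magnitude entries in each column of X (an assumed
    selection rule), zeroes out the rest to form a sparse surrogate Xc, and
    solves Xc^T (y - X beta) = 0, i.e. beta_hat = (Xc^T X)^{-1} Xc^T y.
    Note: this dense loop costs O(np); the O(nnz(X) + rp^2) cost stated in
    the abstract would require a sparse-matrix implementation.
    """
    n, p = X.shape
    Xc = np.zeros_like(X)
    for j in range(p):
        # indices of the r entries with largest absolute value in column j
        idx = np.argpartition(np.abs(X[:, j]), n - r)[n - r:]
        Xc[idx, j] = X[idx, j]
    # pseudo-inverse used for numerical safety in this toy example
    return np.linalg.pinv(Xc.T @ X) @ (Xc.T @ y)

# Toy usage on a numerically sparse design matrix
rng = np.random.default_rng(0)
n, p, r = 10_000, 20, 500
X = rng.standard_normal((n, p)) * (rng.random((n, p)) < 0.05)
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)
print(core_elements_lse(X, y, r)[:5])
print(beta[:5])
```

Since E(y) = Xβ in the linear model, an estimator of this form satisfies E(β̂) = (Xc^T X)^{-1} Xc^T Xβ = β, which is consistent with the unbiasedness claim in the abstract.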
Pages: 16
Related Articles
50 records in total
  • [31] Quasi-Newton algorithms for large-scale nonlinear least-squares
    Al-Baali, M
    HIGH PERFORMANCE ALGORITHMS AND SOFTWARE FOR NONLINEAR OPTIMIZATION, 2003, 82 : 1 - 21
  • [32] SOLVING LARGE-SCALE LEAST SQUARES SEMIDEFINITE PROGRAMMING BY ALTERNATING DIRECTION METHODS
    He, Bingsheng
    Xu, Minghua
    Yuan, Xiaoming
    SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2011, 32 (01) : 136 - 152
  • [33] Distributed Least-Squares Iterative Methods in Large-Scale Networks: A Survey
    SHI Lei
    ZHAO Liang
    SONG Wenzhan
    Goutham Kamath
    WU Yuan
    LIU Xuefeng
    ZTE COMMUNICATIONS, 2017, 15 (03) : 37 - 45
  • [34] Approximating sparse Hessian matrices using large-scale linear least squares
    Fowkes, Jaroslav M.
    Gould, Nicholas I. M.
    Scott, Jennifer A.
    NUMERICAL ALGORITHMS, 2024, 96 (04) : 1675 - 1698
  • [35] LARGE-SCALE GEODETIC LEAST-SQUARES ADJUSTMENT BY DISSECTION AND ORTHOGONAL DECOMPOSITION
    GOLUB, GH
    PLEMMONS, RJ
    LINEAR ALGEBRA AND ITS APPLICATIONS, 1980, 34 (DEC) : 3 - 28
  • [36] Gradient Projection Iterative Sketch for Large-Scale Constrained Least-Squares
    Tang, Junqi
    Golbabaee, Mohammad
    Davies, Mike E.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [37] Fast Nonlinear Least Squares Optimization of Large-Scale Semi-Sparse Problems
    Fratarcangeli, M.
    Bradley, D.
    Gruber, A.
    Zoss, G.
    Beeler, T.
    COMPUTER GRAPHICS FORUM, 2020, 39 (02) : 247 - 259
  • [38] Model reduction for large-scale dynamical systems via equality constrained least squares
    An, Yu'e
    Gu, Chuanqing
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2010, 234 (08) : 2420 - 2431
  • [39] SOLUTION OF LARGE-SCALE SPARSE LEAST-SQUARES PROBLEMS USING AUXILIARY STORAGE
    GEORGE, JA
    HEATH, MT
    PLEMMONS, RJ
    SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1981, 2 (04): : 416 - 429
  • [40] VARIANCE REDUCTION IN STOCHASTIC METHODS FOR LARGE-SCALE REGULARIZED LEAST-SQUARES PROBLEMS
    Pilavci, Yusuf Yigit
    Amblard, Pierre-Olivier
    Barthelme, Simon
    Tremblay, Nicolas
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1771 - 1775