Core-elements for large-scale least squares estimation

被引:0
|
作者
Li, Mengyu [1 ]
Yu, Jun [2 ]
Li, Tao [1 ]
Meng, Cheng [1 ]
机构
[1] Renmin Univ China, Inst Stat & Big Data, Ctr Appl Stat, Beijing 100872, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing 100081, Peoples R China
基金
中国国家自然科学基金;
关键词
Coresets; Linear model; Sparse matrix; Subset selection; NYSTROM APPROXIMATION; K-MEANS; MATRIX; COMPUTATION;
D O I
10.1007/s11222-024-10505-6
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The coresets approach, also called subsampling or subset selection, aims to select a subsample as a surrogate for the observed sample and has found extensive application in large-scale data analysis. Existing coresets methods construct the subsample using a subset of rows from the predictor matrix. Such methods can be significantly inefficient when the predictor matrix is sparse or numerically sparse. To overcome this limitation, we develop a novel element-wise subset selection approach, called core-elements, for large-scale least squares estimation. We provide a deterministic algorithm to construct the core-elements estimator, only requiring an O(nnz(X)+rp2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(\textrm{nnz}(X)+rp<^>2)$$\end{document} computational cost, where X is an nxp\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n\times p$$\end{document} predictor matrix, r is the number of elements selected from each column of X, and nnz(<middle dot>)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{nnz}(\cdot )$$\end{document} denotes the number of non-zero elements. Theoretically, we show that the proposed estimator is unbiased and approximately minimizes an upper bound of the estimation variance. We also provide an approximation guarantee by deriving a coresets-like finite sample bound for the proposed estimator. To handle potential outliers in the data, we further combine core-elements with the median-of-means procedure, resulting in an efficient and robust estimator with theoretical consistency guarantees. Numerical studies on various synthetic and real-world datasets demonstrate the proposed method's superior performance compared to mainstream competitors.
引用
收藏
页数:16
相关论文
共 50 条
  • [1] Large-Scale Least Squares Twin SVMs
    Tanveer, M.
    Sharma, S.
    Muhammad, K.
    ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2021, 21 (02)
  • [2] Distributed Weighted Least Squares Estimation with Fast Convergence in Large-scale Systems
    Marelli, Damin
    Fu, Minyue
    2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 5432 - 5437
  • [3] Least squares estimation of spatial autoregressive models for large-scale social networks
    Huang, Danyang
    Lan, Wei
    Zhang, Hao Helen
    Wang, Hansheng
    ELECTRONIC JOURNAL OF STATISTICS, 2019, 13 (01): : 1135 - 1165
  • [4] The canonical least squares estimation of large-scale simultaneous-equations models
    Kang, Heejoon
    ECONOMIC MODELLING, 2008, 25 (02) : 191 - 200
  • [5] ON LARGE-SCALE NONLINEAR LEAST-SQUARES CALCULATIONS
    TOINT, PL
    SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1987, 8 (03): : 416 - 435
  • [6] LARGE-SCALE DUAL REGULARIZED TOTAL LEAST SQUARES
    Lampe, Joerg
    Voss, Heinrich
    ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2014, 42 : 13 - 40
  • [7] Large-scale Tikhonov regularization of total least squares
    Lampe, Joerg
    Voss, Heinrich
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2012, 238 : 95 - 108
  • [8] Model reduction of large-scale systems by least squares
    Gugercin, Serkan
    Antoulas, Athanasios C.
    LINEAR ALGEBRA AND ITS APPLICATIONS, 2006, 415 (2-3) : 290 - 321
  • [9] On solving large-scale weighted least squares problems
    Baryamureeba, V
    NUMERICAL ANALYSIS AND ITS APPLICATIONS, 2001, 1988 : 59 - 67
  • [10] Distributed weighted least-squares estimation with fast convergence for large-scale systems
    Marelli, Damian Edgard
    Fu, Minyue
    AUTOMATICA, 2015, 51 : 27 - 39