On the selection of optimal subdata for big data regression based on leverage scores

Cited by: 0
Authors
Chasiotis, Vasilis [1]
Karlis, Dimitris [1]
Affiliations
[1] Athens Univ Econ & Business, Dept Stat, Athens, Greece
Keywords
D-optimal designs; Design of experiments; Subdata; Linear regression; Information matrix
DOI
10.1007/s42519-024-00420-4
Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
The demand for computational resources in the modeling process grows with the scale of the datasets, since traditional approaches to regression involve inverting huge data matrices. The main problem lies in the large data size, so a standard approach is subsampling, which aims at obtaining the most informative portion of the big data. In the current paper, we explore an existing approach based on leverage scores, proposed for subdata selection in linear model discrimination. Our objective is to propose the aforementioned approach for selecting the most informative data points to estimate unknown parameters in both the first-order linear model and a model with interactions. We conclude that the approach based on leverage scores improves on existing approaches, and we support this with simulation experiments as well as a real data application.
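The abstract does not spell out the selection algorithm, so the following is only a generic illustration of the underlying idea, not the authors' procedure. It assumes a first-order linear model with intercept, computes the leverage scores (the diagonal of the hat matrix) through a thin QR decomposition, and keeps the k rows with the largest scores. The function names `leverage_scores` and `select_top_leverage`, the simulated data, and the choice k = 500 are all hypothetical.

```python
import numpy as np

def leverage_scores(X):
    """Leverage scores h_ii for the model matrix Z = [1, X].

    h_ii is the i-th diagonal entry of Z (Z^T Z)^{-1} Z^T. With a thin QR
    decomposition Z = QR, it equals the squared norm of the i-th row of Q.
    """
    Z = np.column_stack([np.ones(X.shape[0]), X])
    Q, _ = np.linalg.qr(Z)  # reduced mode: Q has shape (n, p)
    return np.sum(Q**2, axis=1)

def select_top_leverage(X, k):
    """Indices of the k rows with the largest leverage (illustrative rule)."""
    h = leverage_scores(X)
    return np.sort(np.argsort(h)[-k:])

# Simulated full data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
n, k = 10_000, 500
beta_true = np.array([1.0, 2.0, -1.0, 0.5])  # intercept plus three slopes
X = rng.normal(size=(n, 3))
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)

# Fit ordinary least squares on the selected subdata only.
h = leverage_scores(X)
idx = select_top_leverage(X, k)
Z_sub = np.column_stack([np.ones(k), X[idx]])
beta_hat, *_ = np.linalg.lstsq(Z_sub, y[idx], rcond=None)
```

For a full-rank model matrix the leverage scores lie between 0 and 1 and sum to the number of parameters, which gives a quick sanity check on `h`. Selecting deterministically by largest leverage is one simple rule among several; whether it matches the paper's criterion cannot be confirmed from the abstract alone.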
Pages: 19
Related papers
50 in total
  • [21] Distributed Fuzzy Rough Prototype Selection for Big Data Regression
    Vluymans, Sarah
    Asfoor, Hasan
    Saeys, Yvan
    Cornelis, Chris
    Tolentino, Matthew
    Teredesai, Ankur
    De Cock, Martine
    2015 ANNUAL MEETING OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY DIGIPEN NAFIPS 2015, 2015
  • [22] The Optimal Inverter DC/AC Value Selection Method Based on Big Data Technology
    Wang, Ying
    Wang, Yu
    Tan, Yongling
    Zhong, Zhuojun
    Zhang, Mingli
    Zhang, Yixuan
    COMPANION OF THE 2020 IEEE 20TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY (QRS-C 2020), 2020 : 358 - 363
  • [23] Optimal selection and verification of plant species for desertification control in Tibet based on big data
    Liu P.
    Wang X.
    Song C.
    Zhang C.
    Ao B.
    Lyu T.
    Zhang L.
    Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering, 2020, 36 (10) : 166 - 173
  • [24] Big Data Analytic Framework for Organizational Leverage
    Mathrani, Sanjay
    Lai, Xusheng
    APPLIED SCIENCES-BASEL, 2021, 11 (05) : 1 - 19
  • [25] Fast Quantum Algorithms for Least Squares Regression and Statistic Leverage Scores
    Liu, Yang
    Zhang, Shengyu
    FRONTIERS IN ALGORITHMICS (FAW 2015), 2015, 9130 : 204 - 216
  • [26] Optimal Subsampling for Functional Quasi-Mode Regression with Big Data
    Wang, Tao
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024
  • [27] Fast quantum algorithms for least squares regression and statistic leverage scores
    Liu, Yang
    Zhang, Shengyu
    THEORETICAL COMPUTER SCIENCE, 2017, 657 : 38 - 47
  • [28] Optimal subsample selection for massive logistic regression with distributed data
    Zuo, Lulu
    Zhang, Haixiang
    Wang, HaiYing
    Sun, Liuquan
    COMPUTATIONAL STATISTICS, 2021, 36 (04) : 2535 - 2562
  • [29] Optimal subsample selection for massive logistic regression with distributed data
    Lulu Zuo
    Haixiang Zhang
    HaiYing Wang
    Liuquan Sun
    Computational Statistics, 2021, 36 : 2535 - 2562
  • [30] A MapReduce-Based ELM for Regression in Big Data
    Wu, B.
    Yan, T. H.
    Xu, X. S.
    He, B.
    Li, W. H.
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 164 - 173