Least Squares Model Averaging for Distributed Data

Cited by: 0
Authors
Zhang, Haili [1]
Liu, Zhaobo [2]
Zou, Guohua [3]
Affiliations
[1] Shenzhen Polytech Univ, Inst Appl Math, Shenzhen 518055, Peoples R China
[2] Shenzhen Univ, Inst Adv Study, Shenzhen 518060, Peoples R China
[3] Capital Normal Univ, Sch Math Sci, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
consistency; distributed data; divide and conquer algorithm; Mallows' criterion; model averaging; optimality; FOCUSED INFORMATION CRITERION; BIG DATA; REGRESSION; SELECTION; INFERENCE;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Divide and conquer is a common strategy in big data analysis. Model averaging has a natural divide-and-conquer structure, but its theory has not been developed for big data scenarios. The goal of this paper is to fill this gap. We propose two divide-and-conquer-type model averaging estimators for linear models with distributed data. Under some regularity conditions, we show that the weights from the Mallows model averaging criterion converge in L2 to the theoretically optimal weights minimizing the risk of the model averaging estimator. We also give bounds on the in-sample and out-of-sample mean squared errors and prove the asymptotic optimality of the proposed model averaging estimators. Our conclusions hold even when the dimensions and the number of candidate models diverge. Simulation results and an analysis of real airline data illustrate that the proposed model averaging methods perform better than commonly used model selection and model averaging methods in distributed data settings. Our approaches contribute to model averaging theory for distributed data and parallel computation, and can be applied in big data analysis to save time and reduce the computational burden.
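The abstract does not spell out the two proposed estimators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes nested candidate models (candidate m uses the first k_m columns of the design matrix), estimates Mallows weights separately on each data block by minimizing the Mallows criterion over the weight simplex with a simple Frank-Wolfe loop, and then combines blocks by a plain average of the locally weighted coefficient vectors. The function names `mallows_weights` and `dc_model_average`, the simulated data, and the block-combination rule are all hypothetical choices made for illustration.

```python
import numpy as np

def mallows_weights(X, y, sizes, n_iter=500):
    """Mallows weights for nested candidates (candidate m = first sizes[m] columns).

    Assumes `sizes` is increasing, so the last candidate is the largest model.
    """
    n = len(y)
    betas = np.zeros((len(sizes), X.shape[1]))
    for m, k in enumerate(sizes):
        # OLS fit of candidate m, zero-padded to the full coefficient length
        betas[m, :k] = np.linalg.lstsq(X[:, :k], y, rcond=None)[0]
    E = y[:, None] - X @ betas.T           # n x M matrix of candidate residuals
    k_vec = np.asarray(sizes, dtype=float)
    # Error variance estimated from the largest candidate model
    sigma2 = E[:, -1] @ E[:, -1] / (n - sizes[-1])
    G = E.T @ E
    # Criterion on the simplex: C(w) = w'Gw + 2 * sigma2 * k'w
    w = np.full(len(sizes), 1.0 / len(sizes))
    for _ in range(n_iter):
        grad = 2.0 * G @ w + 2.0 * sigma2 * k_vec
        d = -w.copy()
        d[np.argmin(grad)] += 1.0          # Frank-Wolfe direction: best vertex minus w
        slope = grad @ d
        if slope >= -1e-12:                # no descent direction left on the simplex
            break
        curv = 2.0 * d @ G @ d
        gamma = 1.0 if curv <= 0 else min(1.0, -slope / curv)  # exact line search
        w = w + gamma * d
    return w, betas

def dc_model_average(blocks, sizes):
    """Assumed combination rule: average the locally weighted coefficients."""
    local = []
    for X, y in blocks:
        w, betas = mallows_weights(X, y, sizes)
        local.append(w @ betas)
    return np.mean(local, axis=0)

# Simulated distributed data: 4 blocks of 1000 rows each (illustrative only)
rng = np.random.default_rng(0)
beta_true = np.array([1.0, 0.8, 0.6, 0.4, 0.0, 0.0])
X = rng.normal(size=(4000, 6))
y = X @ beta_true + rng.normal(size=4000)
blocks = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
sizes = [2, 4, 6]
beta_hat = dc_model_average(blocks, sizes)
```

Each block only needs its own rows to compute its weights and coefficients, so the loop in `dc_model_average` could run on separate machines, with only the short coefficient vectors sent back for averaging.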
Pages: 59
Related papers
50 in total
  • [31] Model averaging with high-dimensional dependent data
    Zhao, Shangwei
    Zhou, Jianhong
    Li, Hongjun
    ECONOMICS LETTERS, 2016, 148 : 68 - 71
  • [32] Partial least squares for dependent data
    Singer, Marco
    Krivobokova, Tatyana
    Munk, Axel
    De Groot, Bert
    BIOMETRIKA, 2016, 103 (02) : 351 - 362
  • [33] A Joint Least Squares and Least Absolute Deviation Model
    Duan, Junbo
    Idier, Jerome
    Wang, Yu-Ping
    Wan, Mingxi
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (04) : 543 - 547
  • [34] Averaging and Stacking Partial Least Squares Regression Models to Predict the Chemical Compositions and the Nutritive Values of Forages from Spectral Near Infrared Data
    Lesnoff, Mathieu
    Andueza, Donato
    Barotin, Charlene
    Barre, Philippe
    Bonnal, Laurent
    Pierna, Juan Antonio Fernandez
    Picard, Fabienne
    Vermeulen, Philippe
    Roger, Jean-Michel
    APPLIED SCIENCES-BASEL, 2022, 12 (15):
  • [35] A constrained least squares regression model
    Yuan, Haoliang
    Zheng, Junjie
    Lai, Loi Lei
    Tang, Yuan Yan
    INFORMATION SCIENCES, 2018, 429 : 247 - 259
  • [36] Ranking Model Averaging: Ranking Based on Model Averaging
    Feng, Ziheng
    He, Baihua
    Xie, Tianfa
    Zhang, Xinyu
    Zong, Xianpeng
    INFORMS JOURNAL ON COMPUTING, 2024,
  • [37] Model Averaging for Prediction With Fragmentary Data
    Fang, Fang
    Lan, Wei
    Tong, Jingjing
    Shao, Jun
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2019, 37 (03) : 517 - 527
  • [38] Model selection and model averaging for semiparametric partially linear models with missing data
    Zeng, Jie
    Cheng, Weihu
    Hu, Guozhi
    Rong, Yaohua
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2019, 48 (02) : 381 - 395
  • [39] Model averaging for generalized linear models in fragmentary data prediction
    Yuan, Chaoxia
    Wu, Yang
    Fang, Fang
    STATISTICAL THEORY AND RELATED FIELDS, 2022, 6 (04) : 344 - 352
  • [40] Renewable prediction of model averaging in the Cox proportional hazards model with streaming data
    Li, Mengyu
    Wang, Xiaoguang
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2025,