Optimal subsampling for double generalized linear models with heterogeneous massive data

Times cited: 0
Authors
Xiong, Zhengyu [1, 2]
Jin, Haoyu [1, 2]
Wu, Liucang [1, 2]
Yang, Lanjun [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Sci, Kunming, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Ctr Appl Stat, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Heterogeneous massive data; double generalized linear models; optimality criterion; optimal subsampling; asymptotic properties; quasi-likelihood
DOI
10.1080/03610926.2025.2467199
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
With the development of information technology, massive datasets with heterogeneous characteristics are being generated in economics, finance, and other fields. Traditional statistical models and existing methods are often inadequate for dispersion modeling with heterogeneous massive data. This article studies optimal subsampling for double generalized linear models in heterogeneous massive data settings. Under certain conditions, the optimal subsampling probabilities for double generalized linear models with heterogeneous data are derived under the A-optimality and L-optimality criteria, respectively. A two-step algorithm based on uniform sampling is then developed, and the asymptotic properties of the resulting subsample estimator are discussed. Numerical simulations and a real-data example show that the algorithm can improve estimation accuracy and reduce computational cost to some extent.
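The abstract does not give the derived probabilities, so the following Python sketch only illustrates the general shape of a two-step subsampling scheme of this kind; it is not the authors' algorithm. It assumes a normal double generalized linear model, y_i ~ N(x_i'β, φ_i) with log φ_i = z_i'γ, a uniform pilot subsample in step 1, and step-2 probabilities proportional either to the norm of each observation's score ψ_i (an L-optimality-type rule with L = I) or to the norm of M⁻¹ψ_i, with M the Fisher information (an A-optimality-type rule). The function names fit_weighted_dglm, subsampling_probs, and two_step_subsample are hypothetical.

```python
import numpy as np

# Hypothetical sketch: the normal DGLM and the score-norm probabilities are
# illustrative assumptions, not the formulas derived in the paper.


def weighted_loglik(X, Z, y, w, beta, gamma):
    """Weighted normal log-likelihood (constants dropped), log phi_i = z_i' gamma."""
    eta = Z @ gamma
    r2 = (y - X @ beta) ** 2
    return float(np.sum(w * (-0.5 * eta - 0.5 * r2 * np.exp(-eta))))


def fit_weighted_dglm(X, Z, y, w, n_iter=100, tol=1e-8):
    """Weighted MLE for y_i ~ N(x_i' beta, exp(z_i' gamma)) by block ascent."""
    beta = np.zeros(X.shape[1])
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        phi = np.exp(Z @ gamma)
        # Mean submodel: exact weighted least squares given the current phi.
        a = w / phi
        beta_new = np.linalg.solve(X.T @ (a[:, None] * X), X.T @ (a * y))
        # Dispersion submodel: one Fisher-scoring step on squared residuals
        # (gamma-type GLM, log link), halved until the likelihood does not drop.
        d = (y - X @ beta_new) ** 2
        step = np.linalg.solve(Z.T @ (w[:, None] * Z), Z.T @ (w * (d / phi - 1.0)))
        base = weighted_loglik(X, Z, y, w, beta_new, gamma)
        t = 1.0
        while t > 1e-8 and weighted_loglik(X, Z, y, w, beta_new, gamma + t * step) < base:
            t *= 0.5
        gamma_new = gamma + t * step
        change = max(np.abs(beta_new - beta).max(), np.abs(gamma_new - gamma).max())
        beta, gamma = beta_new, gamma_new
        if change < tol:
            break
    return beta, gamma


def subsampling_probs(X, Z, y, beta, gamma, criterion="L"):
    """Probabilities proportional to ||psi_i|| ("L") or ||M^{-1} psi_i|| ("A")."""
    n = len(y)
    phi = np.exp(Z @ gamma)
    res = y - X @ beta
    s_beta = (res / phi)[:, None] * X                       # score for beta
    s_gamma = (0.5 * (res ** 2 / phi - 1.0))[:, None] * Z   # score for gamma
    if criterion == "A":
        # Fisher information is block diagonal for the normal DGLM.
        m_beta = X.T @ (X / phi[:, None]) / n
        m_gamma = 0.5 * (Z.T @ Z) / n
        s_beta = np.linalg.solve(m_beta, s_beta.T).T
        s_gamma = np.linalg.solve(m_gamma, s_gamma.T).T
    norms = np.sqrt((s_beta ** 2).sum(axis=1) + (s_gamma ** 2).sum(axis=1))
    return norms / norms.sum()


def two_step_subsample(X, Z, y, r0, r, criterion="L", seed=None):
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 1: uniform pilot subsample with equal weights gives pilot estimates.
    idx0 = rng.choice(n, size=r0, replace=True)
    b0, g0 = fit_weighted_dglm(X[idx0], Z[idx0], y[idx0], np.ones(r0))
    # Step 2: non-uniform subsample, fitted with inverse-probability weights.
    pi = subsampling_probs(X, Z, y, b0, g0, criterion)
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    beta, gamma = fit_weighted_dglm(X[idx1], Z[idx1], y[idx1], 1.0 / pi[idx1])
    return beta, gamma, pi
```

For example, two_step_subsample(X, Z, y, r0=500, r=3000, criterion="A") fits on 3,500 sampled rows instead of all n. Rescaling the weights 1/π_i does not change the weighted estimates, and in this sketch the step-2 fit uses only the second subsample; some two-step schemes also pool the pilot draws.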
Pages: 35