Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

Authors
Hector, Emily C. [1]
Song, Peter X-K [2]
Affiliations
[1] North Carolina State Univ, Dept Stat, Raleigh, NC 27695 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48104 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Divide-and-conquer; Generalized method of moments; Estimating functions; Parallel computing; Scalable computing; Likelihood estimation; Quasi-likelihood; Regression; Statistics; Binary; Models
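DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract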
This paper presents a unified framework for supervised learning and inference with high-dimensional correlated outcomes using the divide-and-conquer approach. We propose a general class of estimators that can be implemented in a fully distributed and parallelized computational scheme. Modeling, computational and theoretical challenges related to high-dimensional correlated outcomes are overcome in three steps: dividing data at both the outcome and subject levels; estimating the parameter of interest from blocks of data using a broad class of supervised learning procedures; and combining block estimators in a closed-form meta-estimator. This meta-estimator is asymptotically equivalent to estimates obtained by Hansen's (1982) generalized method of moments (GMM) and does not require the entire data to be reloaded on a common server. We provide rigorous theoretical justifications for the use of distributed estimators with correlated outcomes by studying the asymptotic behaviour of the combined estimator with fixed and diverging numbers of data divisions. Simulations illustrate the finite-sample performance of the proposed method, and we provide an R package for ease of implementation.
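The divide, estimate, and combine pattern described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the paper's estimator and does not use its R package: it assumes independent subjects, splits only at the subject level, fits ordinary least squares on each block, and combines block estimates with a precision-weighted closed form. The function names `block_ols` and `combine`, the simulated data, and the block count are all hypothetical choices made for illustration.

```python
import numpy as np

def block_ols(X, y):
    """Fit OLS on one block; return the estimate and its estimated precision."""
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    precision = XtX / sigma2  # inverse of the estimated covariance of beta
    return beta, precision

def combine(estimates, precisions):
    """Closed-form precision-weighted average of block estimates."""
    total = sum(precisions)
    weighted = sum(P @ b for b, P in zip(estimates, precisions))
    return np.linalg.solve(total, weighted)

# Simulated data (assumption: independent subjects, a single scalar outcome).
rng = np.random.default_rng(0)
n, p, K = 6000, 3, 4
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Divide subjects into K blocks; each block could be fitted on a separate worker.
blocks = zip(np.array_split(X, K), np.array_split(y, K))
results = [block_ols(Xk, yk) for Xk, yk in blocks]

# Combine using only the block summaries; the raw data are never pooled.
beta_combined = combine([b for b, _ in results], [P for _, P in results])

# Full-data fit, computed here only to compare against the combined estimate.
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_combined, beta_full)
```

Only the per-block estimate and precision matrix travel to the combining step, which is the property that lets the fits run in parallel. Handling correlation between outcome-level blocks, as the abstract describes, would require more than this independent-block weighting.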
Pages: 35