Model aggregation for doubly divided data with large size and large dimension

Cited by: 2
Authors
He, Baihua [1]
Liu, Yanyan [1]
Yin, Guosheng [2]
Wu, Yuanshan [3]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Hubei, Peoples R China
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam Rd, Hong Kong, Peoples R China
[3] Zhongnan Univ Econ & Law, Sch Stat & Math, Wuhan 430073, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Communication efficiency; Computation complexity; Distributed algorithm; Greedy algorithm; High dimension; One-shot approach; Prediction; Storage ability; AVERAGING APPROACH; COMBINATION
DOI
10.1007/s00180-022-01242-3
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Massive data often combine high dimensionality with a large sample size, so they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of a response variable, which overcomes the storage limitation of a single machine and the curse of high dimensionality. Specifically, on each local machine, which stores part of the data with a relatively moderate sample size, we develop a model aggregation approach that splits the predictors and for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed and communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Extensive numerical experiments on both simulated and real datasets demonstrate the feasibility of the DGMA method.
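The gridding idea in the abstract (rows divided across machines, predictors divided into blocks, sub-models combined by weights found greedily) can be illustrated with a rough sketch. This is not the authors' DGMA algorithm: the function names `fit_cell`, `greedy_weights`, and `grid_aggregate` are hypothetical, the sub-models are assumed to be ordinary least squares, the greedy step is a generic Frank-Wolfe-style search over the simplex, and the cross-machine step is a plain one-shot average standing in for the paper's communication-efficient weight algorithm.

```python
import numpy as np


def fit_cell(X, y):
    """Least-squares fit on one grid cell (one machine, one predictor block)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def greedy_weights(P, y, n_steps=50):
    """Greedy weights on the simplex that reduce ||y - P w||^2.

    P holds one column of predictions per sub-model. Each step moves a
    fraction 1/t of the weight onto the sub-model that lowers the loss most
    (an assumed generic scheme, not the paper's exact greedy rule).
    """
    n_models = P.shape[1]
    w = np.zeros(n_models)
    for t in range(1, n_steps + 1):
        step = 1.0 / t  # t = 1 puts all weight on the best single sub-model
        best_j, best_loss = 0, np.inf
        for j in range(n_models):
            cand = (1 - step) * w
            cand[j] += step
            loss = np.sum((y - P @ cand) ** 2)
            if loss < best_loss:
                best_j, best_loss = j, loss
        w = (1 - step) * w
        w[best_j] += step
    return w


def grid_aggregate(X, y, n_machines=4, n_blocks=5, n_steps=50):
    """Split rows over machines and columns over blocks, then aggregate.

    Returns one coefficient vector; predict with X_new @ coef.
    """
    row_parts = np.array_split(np.arange(X.shape[0]), n_machines)
    col_parts = np.array_split(np.arange(X.shape[1]), n_blocks)
    coefs = np.zeros((n_machines, X.shape[1]))
    for m, rows in enumerate(row_parts):
        Xm, ym = X[rows], y[rows]  # data held by "machine" m
        preds = np.empty((len(rows), n_blocks))
        block_coefs = []
        for b, cols in enumerate(col_parts):
            beta = fit_cell(Xm[:, cols], ym)
            block_coefs.append(beta)
            preds[:, b] = Xm[:, cols] @ beta
        w = greedy_weights(preds, ym, n_steps)  # local aggregation weights
        for b, cols in enumerate(col_parts):
            coefs[m, cols] = w[b] * block_coefs[b]
    # Assumption: simple one-shot averaging across machines.
    return coefs.mean(axis=0)
```

In this sketch each machine only sends back a single coefficient vector, which is what keeps the communication cost low; how the paper actually obtains the optimal weights across machines is not specified in the abstract.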
Pages: 509 - 529
Number of pages: 21
Related papers
50 in total
  • [41] ON LARGE COHOMOLOGICAL DIMENSION AND TAUTNESS
    DEO, S
    MUTTEPAWAR, S
    SINGH, M
    JOURNAL OF THE AUSTRALIAN MATHEMATICAL SOCIETY SERIES A-PURE MATHEMATICS AND STATISTICS, 1984, 37 (DEC) : 391 - 404
  • [42] Gårding inequality in large dimension
    Lascar, Richard
    Nourrigat, Jean
    ISRAEL JOURNAL OF MATHEMATICS, 2014, 200 (01) : 79 - 84
  • [43] Using plate mapping to examine portion size and plate composition for large and small divided plates
    Sharp, David E.
    Sobal, Jeffery
    Wansink, Brian
    EATING BEHAVIORS, 2014, 15 (04) : 658 - 663
  • [44] SMALL MATRICES OF LARGE DIMENSION
    BRUALDI, RA
    CSIMA, J
    LINEAR ALGEBRA AND ITS APPLICATIONS, 1991, 150 : 227 - 241
  • [45] Large Subposets with Small Dimension
    Reiniger, Benjamin
    Yeager, Elyse
    ORDER-A JOURNAL ON THE THEORY OF ORDERED SETS AND ITS APPLICATIONS, 2016, 33 (01) : 81 - 84
  • [46] A RELATIONAL MODEL OF DATA FOR LARGE SHARED DATA BANKS
    CODD, EF
    COMMUNICATIONS OF THE ACM, 1970, 13 (06) : 377 - &
  • [47] Forcing Posets with Large Dimension to Contain Large Standard Examples
    Biro, Csaba
    Hamburger, Peter
    Por, Attila
    Trotter, William T.
    GRAPHS AND COMBINATORICS, 2016, 32 (03) : 861 - 880
  • [48] Forcing Posets with Large Dimension to Contain Large Standard Examples
    Csaba Biró
    Peter Hamburger
    Attila Pór
    William T. Trotter
    Graphs and Combinatorics, 2016, 32 : 861 - 880
  • [49] An Efficient Data Aggregation Approach for Large Scale Wireless Sensor Networks
    Karim, Lutful
    Nasser, Nidal
    Abdulsalam, Hanady
    Moukadem, Imad
    2010 IEEE GLOBAL TELECOMMUNICATIONS CONFERENCE GLOBECOM 2010, 2010,
  • [50] FAST DIMENSION-REDUCED CLIMATE MODEL CALIBRATION AND THE EFFECT OF DATA AGGREGATION
    Chang, Won
    Haran, Murali
    Olson, Roman
    Keller, Klaus
    ANNALS OF APPLIED STATISTICS, 2014, 8 (02) : 649 - 673