Model aggregation for doubly divided data with large size and large dimension

Cited by: 2
Authors
He, Baihua [1]
Liu, Yanyan [1]
Yin, Guosheng [2]
Wu, Yuanshan [3]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Hubei, Peoples R China
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam Rd, Hong Kong, Peoples R China
[3] Zhongnan Univ Econ & Law, Sch Stat & Math, Wuhan 430073, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Communication efficiency; Computation complexity; Distributed algorithm; Greedy algorithm; High dimension; One-shot approach; Prediction; Storage ability; Averaging approach; Combination
DOI
10.1007/s00180-022-01242-3
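Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract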
Massive data often combine high dimensionality with a large sample size, so they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of a response variable, which overcomes the storage limitation of a single machine and the curse of high dimensionality. Specifically, on each local machine, which stores a portion of the data with a relatively moderate sample size, we develop a model aggregation approach that splits the predictors, for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed and communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Extensive numerical experiments are carried out on both simulated and real datasets to demonstrate the feasibility of the DGMA method.
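The general idea of doubly divided data (rows split across machines, predictors split into blocks within each machine) can be illustrated with a small sketch. This is not the authors' DGMA algorithm: the abstract gives no algorithmic details, so everything here is an assumption made for illustration. The function names (`fit_cell`, `greedy_weights`, `doubly_divided_sketch`), the use of least squares for each sub-model, the particular greedy step (a running-average rule that keeps weights on the simplex), and the plain averaging across machines (a stand-in for the paper's communication-efficient weighting) are all hypothetical choices.

```python
import numpy as np

def fit_cell(X, y):
    """Least-squares fit on one (row block, column block) cell of the data."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def greedy_weights(P, y, n_steps=50):
    """Greedy simplex weights for candidate predictions P (n x K).

    Assumed rule: at step t, move a 1/t share of the weight to the candidate
    that minimizes the squared error of the updated running average.
    """
    K = P.shape[1]
    w = np.zeros(K)
    fit = np.zeros(P.shape[0])
    for t in range(1, n_steps + 1):
        # Running average if candidate k were chosen, for every k at once.
        trial = fit[:, None] * (t - 1) / t + P / t
        k = np.argmin(((y[:, None] - trial) ** 2).sum(axis=0))
        w *= (t - 1) / t
        w[k] += 1.0 / t
        fit = trial[:, k]
    return w

def doubly_divided_sketch(X, y, n_machines=4, n_blocks=5, n_steps=50):
    """Split rows across 'machines' and columns into blocks, then aggregate."""
    n, p = X.shape
    row_blocks = np.array_split(np.arange(n), n_machines)
    col_blocks = np.array_split(np.arange(p), n_blocks)
    local_coefs = []
    for rows in row_blocks:
        Xm, ym = X[rows], y[rows]
        # One sub-model per predictor block, fitted on this machine only.
        betas = [fit_cell(Xm[:, cols], ym) for cols in col_blocks]
        P = np.column_stack([Xm[:, cols] @ b for cols, b in zip(col_blocks, betas)])
        w = greedy_weights(P, ym, n_steps)
        # Fold the weighted sub-models into one length-p coefficient vector.
        coef = np.zeros(p)
        for cols, b, wk in zip(col_blocks, betas, w):
            coef[cols] = wk * b
        local_coefs.append(coef)
    # Assumption: one-shot simple average across machines; each machine
    # communicates only its length-p coefficient vector.
    return np.mean(local_coefs, axis=0)
```

In this sketch a prediction for new data `X_new` would be `X_new @ doubly_divided_sketch(X, y)`. No machine ever needs the full data matrix, which is the storage and communication point the abstract makes.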
Pages: 509-529
Number of pages: 21