Model aggregation for doubly divided data with large size and large dimension

Cited by: 2
Authors
He, Baihua [1]
Liu, Yanyan [1]
Yin, Guosheng [2]
Wu, Yuanshan [3]
Affiliations
[1] Wuhan Univ, Sch Math & Stat, Wuhan 430072, Hubei, Peoples R China
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam Rd, Hong Kong, Peoples R China
[3] Zhongnan Univ Econ & Law, Sch Stat & Math, Wuhan 430073, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Communication efficiency; Computation complexity; Distributed algorithm; Greedy algorithm; High dimension; One-shot approach; Prediction; Storage ability; AVERAGING APPROACH; COMBINATION;
DOI
10.1007/s00180-022-01242-3
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Massive data are often featured with high dimensionality as well as large sample size, which typically cannot be stored in a single machine and thus make both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of a response variable, which overcomes the storage limitation of a single machine and the curse of high dimensionality. Specifically, on each local machine that stores partial data of relatively moderate sample size, we develop the model aggregation approach by splitting predictors wherein a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed and communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Extensive numerical experiments are carried out on both simulated and real datasets to demonstrate the feasibility of the DGMA method.
Pages: 509-529
Page count: 21
Related papers
50 in total
  • [31] A method of instance selection for large size data set
    Zhang, Chao
    Pei, Zheng
    Cheng, Jianmei
    Yi, Liangzhong
    ICIC Express Letters, Part B: Applications, 2013, 4 (04): : 1015 - 1022
  • [32] Unavoidable doubly connected large graphs
    Ding, GL
    Chen, P
    DISCRETE MATHEMATICS, 2004, 280 (1-3) : 1 - 12
  • [33] CORCHOP - AN INTERACTIVE ROUTINE FOR THE DIMENSION REDUCTION OF LARGE QSAR DATA SETS
    LIVINGSTONE, DJ
    RAHR, E
    QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS, 1989, 8 (02): : 103 - 108
  • [34] DISSIPATIVE NONLINEAR SCHRODINGER EQUATIONS FOR LARGE DATA IN ONE SPACE DIMENSION
    Hoshino, Gaku
    COMMUNICATIONS ON PURE AND APPLIED ANALYSIS, 2020, 19 (02) : 967 - 981
  • [35] MODEL AGGREGATION OF LARGE-SCALE SYSTEMS WITH SYMMETRY PROPERTIES
    LUNZE, J
    SYSTEMS ANALYSIS MODELLING SIMULATION, 1989, 6 (10): : 749 - 760
  • [36] Large doubly transitive orbits on a line
    Montinaro, Alessandro
    JOURNAL OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 2007, 83 : 227 - 269
  • [37] On the Bingham distribution with large dimension
    Kume, A.
    Walker, S. G.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 124 : 345 - 352
  • [38] LARGE BASIS DIMENSION AND METRIZABILITY
    GRUENHAGE, G
    PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY, 1976, 54 (JAN) : 397 - 400
  • [39] SMOOTH STRINGS AT LARGE DIMENSION
    PISARSKI, RD
    PHYSICAL REVIEW D, 1988, 38 (02): : 578 - 596
  • [40] Large Subposets with Small Dimension
    Benjamin Reiniger
    Elyse Yeager
    Order, 2016, 33 : 81 - 84