Distributed Computing and Inference for Big Data

Cited by: 1
Authors
Zhou, Ling [1 ,2 ]
Gong, Ziyang [1 ,2 ]
Xiang, Pengcheng [1 ,2 ]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
communication efficiency; distributed learning; federated learning; heterogeneity; statistical equivalence; DIVIDE-AND-CONQUER; CONVERGENCE; ALGORITHMS; EFFICIENCY; FRAMEWORK;
DOI
10.1146/annurev-statistics-040522-021241
Chinese Library Classification
O1 [Mathematics];
Subject classification codes
0701 ; 070101 ;
Abstract
Data are distributed across different sites due to computing facility limitations or data privacy considerations. Conventional centralized methods, in which all datasets are stored and processed in a central computing facility, are not applicable in practice. Therefore, it has become necessary to develop distributed learning approaches that have good inference or predictive accuracy while remaining free of individual data or obeying policies and regulations to protect privacy. In this article, we introduce the basic idea of distributed learning and conduct a selected review on various distributed learning methods, which are categorized by their statistical accuracy, computational efficiency, heterogeneity, and privacy. This categorization can help evaluate newly proposed methods from different aspects. Moreover, we provide up-to-date descriptions of the existing theoretical results that cover statistical equivalency and computational efficiency under different statistical learning frameworks. Finally, we provide existing software implementations and benchmark datasets, and we discuss future research opportunities.
Pages: 533-551
Number of pages: 19
Related papers
50 in total
  • [1] Distributed Big Data Analytics in Service Computing
    Yu, Weider D.
    Gottumukkala, AvinashChander
    Senthailselvi, Deenash Arivazhagan
    Maniraj, Prabhu
    Khonde, Tushar
    2017 IEEE 13TH INTERNATIONAL SYMPOSIUM ON AUTONOMOUS DECENTRALIZED SYSTEMS (ISADS 2017), 2017, : 55 - 60
  • [2] Distributed matrix computing system for big data
    Zhang, Guangtao
    INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2024, 18 (04): : 2915 - 2931
  • [3] Parallel and distributed computing for Big Data applications
    Senger, Hermes
    Geyer, Claudio
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (08): : 2412 - 2415
  • [5] A distributed computing model for big data anonymization in the networks
    Ashkouti, Farough
    Khamforoosh, Keyhan
    PLOS ONE, 2023, 18 (04):
  • [6] A Distributed Computing Platform for fMRI Big Data Analytics
    Makkie, Milad
    Li, Xiang
    Quinn, Shannon
    Lin, Binbin
    Ye, Jieping
    Mon, Geoffrey
    Liu, Tianming
    IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (02) : 109 - 119
  • [7] Big Data Mining Using Public Distributed Computing
    Jurgelevicius, Albertas
    Sakalauskas, Leonidas
    INFORMATION TECHNOLOGY AND CONTROL, 2018, 47 (02): : 236 - 248
  • [8] Lightweight distributed computing framework for orchestrating high performance computing and big data
    Ince, Muhammed Numan
    Gunay, Melih
    Ledet, Joseph
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2022, 30 (04) : 1571 - 1585
  • [9] Survey of Distributed Computing Frameworks for Supporting Big Data Analysis
    Sun, Xudong
    He, Yulin
    Wu, Dingming
    Huang, Joshua Zhexue
    BIG DATA MINING AND ANALYTICS, 2023, 6 (02) : 154 - 169
  • [10] A Distributed Mobile Cloud Computing Model for Secure Big Data
    Sung, Soonhwa
    Youn, Cheong
    Kong, Eunbae
    Ryou, Jaecheol
    2016 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN), 2016, : 312 - 316