A Survey on Large-Scale Machine Learning

Cited by: 66
Authors
Wang, Meng [1 ,2 ]
Fu, Weijie [1 ,2 ]
He, Xiangnan [3 ]
Hao, Shijie [1 ,2 ]
Wu, Xindong [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China
Keywords
Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS;
DOI
10.1109/TKDE.2020.3015777
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and it has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently while maintaining comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, which reduces computational complexity; 2) optimization approximation, which improves computational efficiency; and 3) computation parallelism, which expands computational capability. We then categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions, as well as open issues that are promising to address in the future.
Pages: 2574-2594
Page count: 21