A Survey on Large-Scale Machine Learning

Cited by: 66
Authors
Wang, Meng [1 ,2 ]
Fu, Weijie [1 ,2 ]
He, Xiangnan [3 ]
Hao, Shijie [1 ,2 ]
Wu, Xindong [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China
Keywords
Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS;
DOI
10.1109/TKDE.2020.3015777
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and it has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently while maintaining comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, which reduces computational complexity; 2) optimization approximation, which improves computational efficiency; and 3) computation parallelism, which expands computational capability. We then categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions, as well as open issues that are promising to address in the future.
Pages: 2574-2594
Page count: 21