A Survey on Large-Scale Machine Learning

Times cited: 79

Authors:

Wang, Meng [1,2]

Fu, Weijie [1,2]

He, Xiangnan [3]

Hao, Shijie [1,2]

Wu, Xindong [1,2]

Affiliations:

[1] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230601, Anhui, Peoples R China

[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China

[3] Univ Sci & Technol China, Hefei 230031, Anhui, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2022 / Vol. 34 / No. 06

Keywords:

Machine learning; Computational modeling; Optimization; Predictive models; Big Data; Computational complexity; Large-scale machine learning; efficient machine learning; big data analysis; efficiency; survey; GRAPH CONSTRUCTION; BIG DATA; OPTIMIZATION; ALGORITHMS;

DOI:

10.1109/TKDE.2020.3015777

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and it has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches incur huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently while achieving comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, which reduces computational complexity; 2) optimization approximation, which improves computational efficiency; and 3) computation parallelism, which increases computational capability. Then we categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions as well as open issues that are promising to address in the future.


Pages: 2574-2594

Page count: 21

References: 240 in total

[91] Farahat A., 2011, P 14 INT C ARTIFICIA, P269

[92] Additive logistic regression: A statistical view of boosting - Rejoinder [J]. Friedman, J; Hastie, T; Tibshirani, R. ANNALS OF STATISTICS, 2000, 28(02): 400-407

[93] Stochastic gradient boosting [J]. Friedman, JH. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 38(04): 367-378

[94] Scalable Active Learning by Approximated Error Reduction [J]. Fu, Weijie; Wang, Meng; Hao, Shijie; Wu, Xindong. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018: 1396-1405

[95] FLAG: Faster Learning on Anchor Graph with Label Predictor Optimization [J]. Fu, Weijie; Wang, Meng; Hao, Shijie; Mu, Tingting. IEEE TRANSACTIONS ON BIG DATA, 2022, 8(03): 579-591

[96] Fujiwara Y, 2014, PR MACH LEARN RES, V32, P784

[97] Gazagnadou N., 2019, PR MACH LEARN RES

[98] Gilardi N., 2000, Journal of Geographic Information and Decision Analysis, V4, P11

[99] Gittens A, 2016, J MACH LEARN RES, V17

[100] Gonzalez J. E., 2012, Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI '12), P17, DOI 10.5555/2387880.2387883
