Machine learning on big data: Opportunities and challenges

Cited by: 667
Authors
Zhou, Lina [1]
Pan, Shimei [1]
Wang, Jianwu [1]
Vasilakos, Athanasios V. [2]
Affiliations
[1] UMBC, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Lulea Univ Technol, Dept Comp Sci Elect & Space Engn, SE-93187 Skelleftea, Sweden
Funding
US National Science Foundation
Keywords
Machine learning; Big data; Data preprocessing; Evaluation; Parallelization; FEATURE-SELECTION; CLASSIFICATION; ALGORITHM; BREAKING
DOI
10.1016/j.neucom.2017.01.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is continuously unleashing its power in a wide range of applications. It has been pushed to the forefront in recent years, partly owing to the advent of big data. ML algorithms have never held more promise, nor faced greater challenges, than they do with big data. Big data enables ML algorithms to uncover more fine-grained patterns and make more timely and accurate predictions than ever before; on the other hand, it presents major challenges to ML, such as model scalability and distributed computing. In this paper, we introduce a framework of ML on big data (MLBiD) to guide the discussion of its opportunities and challenges. The framework is centered on ML, which follows the phases of preprocessing, learning, and evaluation. In addition, the framework comprises four other components, namely big data, user, domain, and system. The phases of ML and the components of MLBiD provide directions for identifying associated opportunities and challenges and open up future work in many unexplored or underexplored research areas.
Pages: 350-361
Number of pages: 12