Machine learning on big data: Opportunities and challenges

Times cited: 667
Authors
Zhou, Lina [1]
Pan, Shimei [1]
Wang, Jianwu [1]
Vasilakos, Athanasios V. [2]
Affiliations
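[1] University of Maryland, Baltimore County (UMBC), Department of Information Systems, Baltimore, MD 21250, USA
[2] Lulea University of Technology, Department of Computer Science, Electrical and Space Engineering, SE-93187 Skelleftea, Sweden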
Funding
U.S. National Science Foundation
Keywords
Machine learning; Big data; Data preprocessing; Evaluation; Parallelization; FEATURE-SELECTION; CLASSIFICATION; ALGORITHM; BREAKING
DOI
10.1016/j.neucom.2017.01.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is continuously unleashing its power in a wide range of applications. It has been pushed to the forefront in recent years, partly owing to the advent of big data. Big data has made ML algorithms more promising than ever, while also challenging them. On the one hand, big data enables ML algorithms to uncover more fine-grained patterns and make more timely and accurate predictions than ever before; on the other hand, it presents major challenges to ML, such as model scalability and distributed computing. In this paper, we introduce a framework of ML on big data (MLBiD) to guide the discussion of its opportunities and challenges. The framework is centered on ML, which follows three phases: preprocessing, learning, and evaluation. In addition, the framework comprises four other components: big data, user, domain, and system. The phases of ML and the components of MLBiD provide directions for identifying the associated opportunities and challenges, and they open up future work in many unexplored or underexplored research areas.
Pages: 350-361
Number of pages: 12
Related papers (112 in total)
[21] Bagheri L., 2013, P 2013 IEEE INT C BI
[22] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828.
[23] Bolon-Canedo, V.; Sanchez-Marono, N.; Alonso-Betanzos, A. Distributed feature selection: An application to microarray data classification. Applied Soft Computing, 2015, 30: 136-150.
[24] Borkar V.R., 2012, IEEE Data Engineering Bulletin, V35, P24
[25] Bortnikov E., 2012, Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing (HotCloud'12), P18
[26] Bottou, Leon. Large-Scale Machine Learning with Stochastic Gradient Descent. COMPSTAT'2010: 19th International Conference on Computational Statistics, 2010: 177-186.
[27] Breiman, L. Pasting small votes for classification in large databases and on-line. Machine Learning, 1999, 36(1-2): 85-103.
[28] Cai X., 2013, P 23 INT JOINT C ART, P2598
[29] Cao, Lei; Wei, Mingrui; Yang, Di; Rundensteiner, Elke A. Online Outlier Exploration Over Large Datasets. KDD'15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015: 89-98.
[30] Catak F.O., 2015, SOFT COMPUT, P1