A Survey and Recommendations for Distributed, Parallel, Single Pass, Incremental Bayesian Classification based on MapReduce for Big Data

Cited by: 0
Authors
Shafiq, M. Omair [1]
Yang, Yibing [1]
Fekri, Maryam [1]
Affiliations
[1] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Source
2017 IEEE 19TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS WORKSHOPS (HPCCWS): MULTICORE AND MULTITHREADED ARCHITECTURES AND ALGORITHMS (M2A2 2017) | 2017
Keywords
Distributed; Parallel; Single-pass; Incremental; Bayesian; Classification; Big Data;
DOI
10.1109/HPCCWS.2017.00013
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In the emerging digital age, data is produced at massive scale, actively or passively, as it is collected from users and the environment via applications, sensor devices and so on. This makes it crucial to be able to process big data efficiently and utilize it effectively. The challenge in processing big data is its high volume, velocity and variety, as well as its veracity and value. In this paper, we present a survey of related work and prescribe our recommendations towards building Bayesian classification for big data environments. The proposed approach is based on MapReduce and is distributed, parallel, single-pass and incremental, which makes it feasible to deploy and execute on a cloud computing platform. We also carry out a scalability analysis of the proposed solution to show that it can train a Bayesian classifier to perform predictive analytics by processing big data with large volume, velocity and variety.
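The single-pass, incremental property named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a naive Bayes classifier over categorical features, and the function names (`map_partition`, `merge`, `predict`) and the toy data are hypothetical. The point it shows is that the model consists only of counts, so each partition can be scanned once in a map step, partial counts can be summed in a reduce step, and a new batch can be folded in without revisiting old data.

```python
from collections import Counter
from functools import reduce
import math

def map_partition(records):
    """Map step: one pass over a partition, counting classes and per-class feature values."""
    counts = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", label, i, value)] += 1
    return counts

def merge(a, b):
    """Reduce step: counts are additive, so partial models combine in any order."""
    return a + b

def predict(counts, features, alpha=1.0):
    """Return the class with the highest Laplace-smoothed log posterior."""
    classes = [k[1] for k in counts if k[0] == "class"]
    total = sum(counts[("class", c)] for c in classes)
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(counts[("class", c)] / total)
        for i, value in enumerate(features):
            # Distinct values observed for feature i across all classes
            n_values = len({k[3] for k in counts if k[0] == "feat" and k[2] == i})
            num = counts[("feat", c, i, value)] + alpha
            den = counts[("class", c)] + alpha * n_values
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data (hypothetical): two partitions processed independently, then merged
partitions = [
    [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")],
    [(("sunny", "mild"), "no"), (("rainy", "hot"), "yes")],
]
model = reduce(merge, map(map_partition, partitions))

# Incremental update: fold in a new batch without rescanning earlier data
new_batch = [(("rainy", "mild"), "yes")]
model = merge(model, map_partition(new_batch))

print(predict(model, ("rainy", "mild")))  # prints "yes"
```

Because `merge` is associative and commutative, the same function would serve as a combiner or reducer in a MapReduce job. Mapping this sketch onto Hadoop or Spark is left as an assumption here.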
Pages: 42-49
Number of pages: 8