Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Cited by: 0

Authors:

Liangxiu Han

Hwee Yong Ong

Affiliations:

[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology

[2] University of Edinburgh, School of Informatics

Source:

Cluster Computing | 2015 / Vol. 18

Keywords:

Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science

DOI:

Not available

Abstract:

Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries such as those in the biomedical sciences. This paper explores how MapReduce and cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the biomedical sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these technologies.

Pages: 403-418

Number of pages: 15
