Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Cited by: 0

Authors:

Liangxiu Han

Hwee Yong Ong

Affiliations:

[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology

[2] University of Edinburgh, School of Informatics

Source:

Cluster Computing | 2015 / Vol. 18

Keywords:

Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science;

DOI:

Not available

Abstract:

Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries, such as those in the biomedical sciences. This paper explores how MapReduce and cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the biomedical sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.

Pages: 403-418

Page count: 15


