Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh, School of Informatics
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science;
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries such as those in the biomedical sciences. This paper explores how MapReduce and cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the biomedical sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
Pages: 403 - 418
Page count: 15