Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh, School of Informatics
Source
Cluster Computing, 2015, Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science;
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries, such as those in the Biomedical Sciences. This paper explores how MapReduce and Cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the Biomedical Sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the Cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
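The adaptation of a data mining task to the MapReduce model can be illustrated in generic form. The paper's actual task, data and implementation are not reproduced here. The following is only a minimal sketch of the map, shuffle and reduce phases, assuming a toy counting job. The names `map_reduce`, `count_mapper` and `sum_reducer` and the sample input are hypothetical. A local thread pool stands in for the worker nodes of a cluster or cloud deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# An intermediate (key, value) pair emitted by a mapper.
KV = Tuple[Hashable, int]


def map_reduce(
    splits: Iterable[str],
    mapper: Callable[[str], List[KV]],
    reducer: Callable[[Hashable, List[int]], int],
    workers: int = 4,
) -> Dict[Hashable, int]:
    """Run a toy, single-machine MapReduce job (hypothetical helper)."""
    # Map phase: each input split is processed independently, so the calls
    # can run concurrently. Threads here stand in for distributed workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, splits))

    # Shuffle phase: group every intermediate value by its key.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: combine the grouped values for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(split: str) -> List[KV]:
    # Hypothetical mapper: emit (token, 1) for each whitespace-separated token.
    return [(token, 1) for token in split.split()]


def sum_reducer(key: Hashable, values: List[int]) -> int:
    # Hypothetical reducer: total the counts emitted for one key.
    return sum(values)


if __name__ == "__main__":
    sample = ["alpha beta alpha", "beta gamma", "alpha"]
    print(map_reduce(sample, count_mapper, sum_reducer))
    # prints {'alpha': 3, 'beta': 2, 'gamma': 1}
```

In this sketch, only the map phase is parallel, and the shuffle and reduce steps run serially. CPython threads also do not speed up CPU-bound mappers. A process pool, or separate machines as in a cloud deployment, would be needed for real speedup.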
Pages: 403-418
Number of pages: 15