Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Cited by: 0
Authors
Liangxiu Han
Hwee Yong Ong
Affiliations
[1] Manchester Metropolitan University, School of Computing, Mathematics and Digital Technology
[2] University of Edinburgh, School of Informatics
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science
DOI
Not available
Abstract
Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries such as those in the Biomedical Sciences. This paper explores how MapReduce and Cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the Biomedical Sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the Cloud. We built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.
Pages: 403-418
Number of pages: 15