Data Analytics in the Cloud with Flexible MapReduce Workflows

Authors
Goncalves, Carlos [1, 2]
Assuncao, Luis [1, 2]
Cunha, Jose C. [2]
Affiliations
[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal
[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal
Source
2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM) | 2012
Keywords
MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids, or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application using MapReduce requires installing, configuring, and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
Pages: 8