Data Analytics in the Cloud with Flexible MapReduce Workflows

Cited by: 0
Authors
Goncalves, Carlos [1, 2]
Assuncao, Luis [1, 2]
Cunha, Jose C. [2]
Affiliations
[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal
[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal
Source
2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM) | 2012
Keywords
MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application using MapReduce requires installing, configuring and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
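The abstract notes that the end user only supplies the map and reduce functions. As a rough illustration of that division of labor, the sketch below runs a word count through a tiny in-memory driver. It is not the AWARD framework or its API; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and word counting merely stands in for the text mining application the paper describes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """User-defined map function: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """User-defined reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    map_fn: Callable[[str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical single-process driver (an assumption, not AWARD):
    map every record, group values by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle step: group by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    lines = ["cloud data analytics", "data workflows in the cloud"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

In a real deployment the driver's grouping step would be distributed across workflow nodes or cloud instances; only the two user-defined functions would stay the same.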
Pages: 8
Related papers
50 in total
  • [41] Mastiff: A MapReduce-based System for Time-based Big Data Analytics
    Guo, Sijie
    Xiong, Jin
    Wang, Weiping
    Lee, Rubao
    2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2012, : 72 - 80
  • [42] Big Data Analytics: Predicting Academic Course Preference Using Hadoop Inspired MapReduce
    Guleria, Pratiyush
    Sood, Manu
    2017 FOURTH INTERNATIONAL CONFERENCE ON IMAGE INFORMATION PROCESSING (ICIIP), 2017, : 328 - 331
  • [43] Cloud-BS: A MapReduce-based bisulfite sequencing aligner on cloud
    Choi, Joungmin
    Park, Yoonjae
    Kim, Sun
    Chae, Heejoon
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2018, 16 (06)
  • [44] Experiences Teaching MapReduce in the Cloud
    Rabkin, Ariel
    Reiss, Charles
    Katz, Randy
    Patterson, David
    SIGCSE 12: PROCEEDINGS OF THE 43RD ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, 2011, : 601 - 606
  • [45] Cuckoo: Opportunistic MapReduce on Ephemeral and Heterogeneous Cloud Resources
    Dartois, Jean-Emile
    Ribeiro, Heverson B.
    Boukhobza, Jalil
    Barais, Olivier
    2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019), 2019, : 396 - 403
  • [46] Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks
    Fernandez, Alberto
    del Rio, Sara
    Lopez, Victoria
    Bawakid, Abdullah
    del Jesus, Maria J.
    Benitez, Jose M.
    Herrera, Francisco
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 4 (05) : 380 - 409
  • [47] System G Data Store: Big, Rich Graph Data Analytics in the Cloud
    Canim, Mustafa
    Chang, Yuan-Chi
    PROCEEDINGS OF THE 2013 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2013), 2013, : 328 - 337
  • [48] Using the MapReduce Approach for the Spatio-Temporal Data Analytics in Road Traffic Crowdsensing Application
    Armoogum, Sandhya
    Munchetty-Chendriah, Shevam
    COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, COLLABORATECOM 2017, 2018, 252 : 405 - 415
  • [49] Mapreduce fuzzy c-means ensemble clustering with gentle adaboost for big data analytics
    Padmapriya K.M.
    Anandhi B.
    Vijayakumar M.
    International Journal of Business Intelligence and Data Mining, 2021, 19 (02) : 170 - 188
  • [50] Mobile Sensor Data Classification for Human Activity Recognition using MapReduce on Cloud
    Paniagua, Carlos
    Flores, Huber
    Srirama, Satish Narayana
    ANT 2012 AND MOBIWIS 2012, 2012, 10 : 585 - 592