Data Analytics in the Cloud with Flexible MapReduce Workflows

Cited by: 0
Authors
Goncalves, Carlos [1 ,2 ]
Assuncao, Luis [1 ,2 ]
Cunha, Jose C. [2 ]
Affiliations
[1] Univ Nova Lisboa, Inst Super Engn Lisboa, P-1200 Lisbon, Portugal
[2] Univ Nova Lisboa, Fac Ciencias Tecnol, Dept Informat, CITI, P-1200 Lisbon, Portugal
Source
2012 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM) | 2012
Keywords
MapReduce; Workflow; Text Mining; Cloud; MAP-REDUCE;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application using MapReduce requires installing, configuring and accessing specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
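As a rough illustration of the division of labor the abstract describes, where the end user supplies only the map and reduce functions, the following minimal sketch runs a word count (a simple text mining step) through a tiny in-process driver. It is not the AWARD framework's API, Hadoop's, or Elastic MapReduce's; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """User-defined map function: emit (word, 1) for every word in a document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """User-defined reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(
    inputs: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical single-process driver standing in for a real framework.

    It applies map_fn to each input, groups the emitted values by key
    (the shuffle step), then applies reduce_fn to each group.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Illustrative input only; not data from the paper.
documents = ["cloud workflow cloud", "workflow mapreduce"]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
```

In a real deployment the grouping and scheduling would be handled by the framework across several machines; only the two user-defined functions would carry over from this sketch.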
Pages: 8
Related papers
50 in total
  • [31] Big data analytics for retail industry using MapReduce-Apriori framework
    Verma, Neha
    Malhotra, Dheeraj
    Singh, Jatinder
    JOURNAL OF MANAGEMENT ANALYTICS, 2020, 7 (03) : 424 - 442
  • [32] Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce
    Ramirez-Gallego, Sergio
    Fernandez, Alberto
    Garcia, Salvador
    Chen, Min
    Herrera, Francisco
    INFORMATION FUSION, 2018, 42 : 51 - 61
  • [33] Programming Visual and Script-based Big Data Analytics Workflows on Clouds
    Belcastro, Loris
    Marozzo, Fabrizio
    Talia, Domenico
    Trunfio, Paolo
    BIG DATA AND HIGH PERFORMANCE COMPUTING, 2015, 26 : 18 - 31
  • [34] Providing Caches for Reduce Tasks in a MapReduce Cloud
    Huang, Tzu-Chi
    Chu, Kuo-Chih
    Chen, Jhe-Ru
    Zeng, Xue-Yan
    Shieh, Ce-Kuen
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2016, : 252 - 255
  • [35] A Performance Analysis of MapReduce Applications on Big Data in Cloud based Hadoop
    Gohil, Parth
    Garg, Dweepna
    Panchal, Bakul
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [36] Smart MapReduce Cloud: Applying Extra Processing to Intermediate Data on Demand
    Huang, Tzu-Chi
    Chu, Kuo-Chih
    Tsai, Ming-Fong
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 799 - 804
  • [37] MapReduce++ - Efficient processing of MapReduce jobs in the cloud
    Zhang, Guigang
    Li, Chao
    Zhang, Yong
    Xing, Chunxiao
    Yang, Jijiang
    Journal of Computational Information Systems, 2012, 8 (14) : 5757 - 5764
  • [38] Application of Big Data Analytics via Cloud Computing
    Yetis, Yunus
    Sara, Ruthvik Goud
    Erol, Berat A.
    Kaplan, Halid
    Akuzum, Abdurrahman
    Jamshidi, Mo
    2016 WORLD AUTOMATION CONGRESS (WAC), 2016,
  • [39] Hybrid Data Mining Algorithm in Cloud Computing using MapReduce Framework
    Sahay, Siddharth
    Khetarpal, Suruchi
    Pradhan, Tribikram
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2016, : 507 - 511
  • [40] Efficient Alignment of Next Generation Sequencing Data Using MapReduce on the Cloud
    AlSaad, Rawan
    Malluhi, Qutaibah
    Abouelhoda, Mohamed
    2012 CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE (CIBEC), 2012, : 18 - 22