Reproducible and Portable Big Data Analytics in the Cloud

Cited by: 3
Authors
Wang, Xin [1]
Guo, Pei [1]
Li, Xingyan [1]
Gangopadhyay, Aryya [1]
Busart, Carl E. [2]
Freeman, Jade [2]
Wang, Jianwu [1]
Affiliations
[1] Univ Maryland, Dept Informat Syst, Baltimore, MD 21250 USA
[2] DEVCOM Army Res Lab, Adelphi, MD 20783 USA
Funding
U.S. National Science Foundation; U.S. National Aeronautics and Space Administration (NASA)
Keywords
Big data analytics; cloud computing; portability; reproducibility; serverless
DOI
10.1109/TCC.2023.3245081
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing has become a major approach to reproducing computational experiments. Yet two main difficulties remain in reproducing batch-based Big Data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics, including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another cloud, a.k.a. the vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and utilize the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We conducted extensive experiments on both AWS and Azure using four Big Data analytics applications that run on virtual CPU/GPU clusters. The experiments show that our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based Big Data analytics.
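The abstract names the adapter design pattern as the mechanism for cross-cloud portability. The following is only a minimal sketch of that general pattern, not the toolkit's actual code: the two vendor clients are in-memory stand-ins with deliberately different call shapes, and every class, method, and function name here (`VendorAClient`, `CloudAdapter`, `execute_pipeline`, `reproduce`) is a hypothetical name chosen for illustration. The command string is a placeholder that is never executed.

```python
from abc import ABC, abstractmethod


class VendorAClient:
    """Hypothetical stand-in for one vendor's SDK (in-memory only)."""

    def __init__(self):
        self.running = set()

    def start_instances(self, count):
        cluster = f"a-{count}"
        self.running.add(cluster)
        return cluster

    def submit(self, cluster, cmd):
        return f"[vendor-a:{cluster}] {cmd}"

    def stop_instances(self, cluster):
        self.running.discard(cluster)


class VendorBClient:
    """Hypothetical second vendor whose calls are shaped differently."""

    def __init__(self):
        self.groups = {}

    def create_vm_group(self, size):
        group = {"id": f"b-{size}", "size": size}
        self.groups[group["id"]] = group
        return group

    def execute(self, group_id, script):
        return {"group": group_id, "stdout": f"[vendor-b:{group_id}] {script}"}

    def delete_vm_group(self, group_id):
        del self.groups[group_id]


class CloudAdapter(ABC):
    """Vendor-neutral interface that the pipeline code depends on."""

    @abstractmethod
    def provision(self, nodes: int) -> str: ...

    @abstractmethod
    def run(self, cluster_id: str, command: str) -> str: ...

    @abstractmethod
    def terminate(self, cluster_id: str) -> None: ...


class VendorAAdapter(CloudAdapter):
    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.start_instances(nodes)

    def run(self, cluster_id, command):
        return self.client.submit(cluster_id, command)

    def terminate(self, cluster_id):
        self.client.stop_instances(cluster_id)


class VendorBAdapter(CloudAdapter):
    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.create_vm_group(size=nodes)["id"]

    def run(self, cluster_id, command):
        return self.client.execute(cluster_id, command)["stdout"]

    def terminate(self, cluster_id):
        self.client.delete_vm_group(cluster_id)


def execute_pipeline(adapter, command, nodes, history):
    """Provision, run, always terminate, then record the configuration."""
    cluster_id = adapter.provision(nodes)
    try:
        output = adapter.run(cluster_id, command)
    finally:
        adapter.terminate(cluster_id)
    record = {"command": command, "nodes": nodes, "output": output}
    history.append(record)
    return record


def reproduce(record, adapter, history):
    """Re-run a stored execution, possibly through a different adapter."""
    return execute_pipeline(adapter, record["command"], record["nodes"], history)
```

In this sketch, a run recorded through `VendorAAdapter` can be passed to `reproduce` with a `VendorBAdapter`, because the pipeline only ever touches the `CloudAdapter` interface; the stored record carries the command and node count, which loosely mirrors the abstract's point about saving configuration for each execution.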
Pages: 2966 - 2982
Page count: 17
Related papers
50 in total
  • [21] A Cloud Framework for Big Data Analytics Workflows on Azure
    Marozzo, Fabrizio
    Talia, Domenico
    Trunfio, Paolo
    CLOUD COMPUTING AND BIG DATA, 2013, 23 : 182 - 191
  • [22] Big Data with Integrated Cloud Computing For Healthcare Analytics
    Jangade, Rajesh
    Chauhan, Ritu
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016 : 4068 - 4071
  • [23] Big Data Analytics Labs in the Cloud Spaces for Teamwork
    Nunez-del-Prado, Miguel
    Rodriguez, Michelle
    2017 7TH WORLD ENGINEERING EDUCATION FORUM (WEEF), 2017 : 499 - 503
  • [24] A Cloud Based Environment for Big Data Analytics in Healthcare
    Chauhan, Ritu
    Jangade, Rajesh
    Mudunuru, Vimal K.
    PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR 2016), 2018, 614 : 315 - 321
  • [25] Editorial: Big scientific data analytics on HPC and cloud
    Wang, Jianwu
    Yin, Junqi
    Nguyen, Mai H.
    Wang, Jingbo
    Xu, Weijia
    FRONTIERS IN BIG DATA, 2024, 7
  • [26] Application of Big Data Analytics via Cloud Computing
    Yetis, Yunus
    Sara, Ruthvik Goud
    Erol, Berat A.
    Kaplan, Halid
    Akuzum, Abdurrahman
    Jamshidi, Mo
    2016 WORLD AUTOMATION CONGRESS (WAC), 2016
  • [27] Big Data Analytics for Higher Education in The Cloud Era
    Al Hadwer, Ali
    Gillis, Dan
    Rezania, Davar
    2019 4TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (ICBDA 2019), 2019 : 203 - 207
  • [28] Decision Framework for Engaging Cloud-Based Big Data Analytics Vendors
    Ayaburi, Emmanuel Wusuhon Yanibo
    Maasberg, Michele
    Lee, Jaeung
    JOURNAL OF CASES ON INFORMATION TECHNOLOGY, 2020, 22 (04) : 60 - 74
  • [29] Cloud-Based Software Platform for Big Data Analytics in Smart Grids
    Simmhan, Yogesh
    Aman, Saima
    Kumbhare, Aloe
    Liu, Rongyang
    Stevens, Sam
    Zhou, Qunzhi
    Prasanna, Viktor
    COMPUTING IN SCIENCE & ENGINEERING, 2013, 15 (04) : 38 - 47
  • [30] Systematic Survey: Secure and Privacy-Preserving Big Data Analytics in Cloud
    Amaithi Rajan, Arun
    Vetriselvi, V.
    JOURNAL OF COMPUTER INFORMATION SYSTEMS, 2024, 64 (01) : 136 - 156