Reproducible and Portable Big Data Analytics in the Cloud

Cited by: 3
Authors
Wang, Xin [1]
Guo, Pei [1]
Li, Xingyan [1]
Gangopadhyay, Aryya [1]
Busart, Carl E. [2]
Freeman, Jade [2]
Wang, Jianwu [1]
Affiliations
[1] Univ Maryland, Dept Informat Syst, Baltimore, MD 21250 USA
[2] DEVCOM Army Res Lab, Adelphi, MD 20783 USA
Funding
National Science Foundation (NSF); National Aeronautics and Space Administration (NASA)
Keywords
Big data analytics; cloud computing; portability; reproducibility; serverless
DOI
10.1109/TCC.2023.3245081
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing has become a major approach to help reproduce computational experiments. Yet two main difficulties remain in reproducing batch-based Big Data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics, including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another cloud, which is known as the vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and use the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We conducted extensive experiments on both AWS and Azure using four Big Data analytics applications that run on virtual CPU/GPU clusters. The experiments show that our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based Big Data analytics.
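The adapter design pattern named in the abstract can be illustrated with a minimal Python sketch. This is not the toolkit's actual code or API: every class and method name below (`StubAwsClient`, `StubAzureClient`, `CloudAdapter`, `execute_pipeline`, and so on) is a hypothetical stand-in, and the two "vendor clients" are in-memory stubs rather than real AWS or Azure SDK calls. The sketch only shows the general idea of one pipeline driver running unchanged against vendors whose interfaces differ.

```python
from abc import ABC, abstractmethod


# Stand-in vendor clients, invented for illustration only.
# They are NOT real AWS or Azure SDK classes; they just expose
# deliberately different method names and return types.
class StubAwsClient:
    def start_cluster(self, instance_count):
        return f"aws-cluster-{instance_count}"

    def submit_step(self, cluster, script):
        return f"aws:{cluster}:{script}:done"

    def stop_cluster(self, cluster):
        return True


class StubAzureClient:
    def create_pool(self, size):
        return {"pool": f"azure-pool-{size}"}

    def run_task(self, pool, cmd):
        return {"status": "done", "pool": pool["pool"], "cmd": cmd}

    def delete_pool(self, pool):
        return None


class CloudAdapter(ABC):
    """Hypothetical common interface that the pipeline driver codes against."""

    @abstractmethod
    def provision(self, nodes): ...

    @abstractmethod
    def run(self, handle, command): ...

    @abstractmethod
    def terminate(self, handle): ...


class AwsAdapter(CloudAdapter):
    """Adapts the stub AWS-style client to the common interface."""

    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.start_cluster(nodes)

    def run(self, handle, command):
        # The stub reports success as a string suffix; normalize to bool.
        return self.client.submit_step(handle, command).endswith(":done")

    def terminate(self, handle):
        self.client.stop_cluster(handle)


class AzureAdapter(CloudAdapter):
    """Adapts the stub Azure-style client to the common interface."""

    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.create_pool(nodes)

    def run(self, handle, command):
        # The stub reports success in a dict; normalize to bool.
        return self.client.run_task(handle, command)["status"] == "done"

    def terminate(self, handle):
        self.client.delete_pool(handle)


def execute_pipeline(adapter, command, nodes=2):
    """Provision, run, and always terminate, independent of the vendor."""
    handle = adapter.provision(nodes)
    try:
        return adapter.run(handle, command)
    finally:
        adapter.terminate(handle)
```

Under these assumptions, `execute_pipeline(AwsAdapter(StubAwsClient()), "analytics.py")` and `execute_pipeline(AzureAdapter(StubAzureClient()), "analytics.py")` follow the same provision, run, terminate sequence; only the adapter passed in changes.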
Pages: 2966-2982
Number of pages: 17