Reproducible and Portable Big Data Analytics in the Cloud

Cited by: 3
Authors
Wang, Xin [1]
Guo, Pei [1]
Li, Xingyan [1]
Gangopadhyay, Aryya [1]
Busart, Carl E. [2]
Freeman, Jade [2]
Wang, Jianwu [1]
Affiliations
[1] Univ Maryland, Dept Informat Syst, Baltimore, MD 21250 USA
[2] DEVCOM Army Res Lab, Adelphi, MD 20783 USA
Funding
National Science Foundation (USA); National Aeronautics and Space Administration (NASA)
Keywords
Big data analytics; cloud computing; portability; reproducibility; serverless;
DOI
10.1109/TCC.2023.3245081
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cloud computing has become a major approach to reproducing computational experiments. Yet two main difficulties remain in reproducing batch-based Big Data analytics (including descriptive and predictive analytics) in the cloud. The first is how to automate end-to-end scalable execution of analytics, including distributed environment provisioning, analytics pipeline description, parallel execution, and resource termination. The second is that an application developed for one cloud is difficult to reproduce in another cloud, which is known as the vendor lock-in problem. To tackle these problems, we leverage serverless computing and containerization techniques for automated scalable execution and reproducibility, and use the adapter design pattern to enable application portability and reproducibility across different clouds. We propose and develop an open-source toolkit that supports 1) fully automated end-to-end execution and reproduction via a single command, 2) automated data and configuration storage for each execution, 3) flexible client modes based on user preferences, 4) execution history query, and 5) simple reproduction of existing executions in the same environment or a different environment. We conducted extensive experiments on both AWS and Azure using four Big Data analytics applications that run on virtual CPU/GPU clusters. The experiments show that our toolkit can achieve good execution performance, scalability, and efficient reproducibility for cloud-based Big Data analytics.
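The abstract's use of the adapter design pattern for cross-cloud portability can be illustrated with a minimal sketch. This is not the toolkit's actual API: every class, method, and file name below (`FakeAwsClient`, `FakeAzureClient`, `CloudAdapter`, `run_pipeline`, `analytics.py`) is a hypothetical stand-in, and the vendor clients only record calls in memory rather than contacting any cloud. The sketch assumes the pipeline needs three steps (provision, execute, terminate) and that each execution's configuration is stored so it can be replayed elsewhere.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins for two vendors with incompatible interfaces.
# They only log calls; no real cloud SDK is involved.
class FakeAwsClient:
    def __init__(self):
        self.log = []

    def create_cluster(self, nodes):
        self.log.append(f"aws:create:{nodes}")
        return "aws-cluster-1"

    def submit_job(self, cluster_id, script):
        self.log.append(f"aws:submit:{script}")
        return f"result of {script} on {cluster_id}"

    def delete_cluster(self, cluster_id):
        self.log.append(f"aws:delete:{cluster_id}")


class FakeAzureClient:
    def __init__(self):
        self.log = []

    def begin_deployment(self, vm_count):
        self.log.append(f"azure:deploy:{vm_count}")
        return {"name": "azure-group-1"}

    def run_command(self, group, command):
        self.log.append(f"azure:run:{command}")
        return f"result of {command} on {group['name']}"

    def remove_group(self, group):
        self.log.append(f"azure:remove:{group['name']}")


class CloudAdapter(ABC):
    """Common interface that the pipeline code depends on."""

    @abstractmethod
    def provision(self, nodes): ...

    @abstractmethod
    def execute(self, handle, script): ...

    @abstractmethod
    def terminate(self, handle): ...


class AwsAdapter(CloudAdapter):
    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.create_cluster(nodes)

    def execute(self, handle, script):
        return self.client.submit_job(handle, script)

    def terminate(self, handle):
        self.client.delete_cluster(handle)


class AzureAdapter(CloudAdapter):
    def __init__(self, client):
        self.client = client

    def provision(self, nodes):
        return self.client.begin_deployment(nodes)

    def execute(self, handle, script):
        return self.client.run_command(handle, script)

    def terminate(self, handle):
        self.client.remove_group(handle)


@dataclass
class ExecutionRecord:
    """Stored configuration and result of one execution."""
    cloud: str
    nodes: int
    script: str
    result: str


def run_pipeline(adapter, cloud, nodes, script, history):
    """Provision, execute, always terminate, then record the run."""
    handle = adapter.provision(nodes)
    try:
        result = adapter.execute(handle, script)
    finally:
        adapter.terminate(handle)
    record = ExecutionRecord(cloud, nodes, script, result)
    history.append(record)
    return record


history = []
aws_client = FakeAwsClient()
first = run_pipeline(AwsAdapter(aws_client), "aws", 2, "analytics.py", history)
# Reproduce the stored execution on a different cloud from its record.
second = run_pipeline(AzureAdapter(FakeAzureClient()), "azure",
                      first.nodes, first.script, history)
```

Because `run_pipeline` depends only on `CloudAdapter`, moving an execution to another cloud means swapping the adapter while reusing the stored configuration, which is the portability idea the abstract describes.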
Pages: 2966-2982
Number of pages: 17
Related papers
50 in total
  • [31] Heuristic Based Resource Provisioning Approach for Big Data Analytics in Cloud Environment
    Wu Y.-W.
    Wu H.
    Ren J.
    Zhang W.-B.
    Wei J.
    Wang T.
    Zhong H.
    Ruan Jian Xue Bao/Journal of Software, 2020, 31 (06): 1860 - 1874
  • [32] A Scalable and Productive Workflow-based Cloud Platform for Big Data Analytics
    Chen, Chao
    Yan, Yuzhong
    Huang, Lei
    Dong, Xishuang
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2016, : 104 - 108
  • [33] The Knowledge Management Context of Cloud Based big Data Analytics
    Neaga, Irina
    Liu, Shaofeng
    PROCEEDINGS OF THE 15TH EUROPEAN CONFERENCE ON KNOWLEDGE MANAGEMENT (ECKM 2014), VOLS 1-3, 2014, : 1339 - 1343
  • [34] Cloud Big Data Lake for Advanced Analytics in Semiconductor Manufacturing
    Sun, Susan
    Ye, Jeff
    Schwarthoff, Hubert
    Rosin, Jon
    Vakkalagadda, Varalakshmi
    Chang, Jimmy
    Ubbara, Sesidhar Reddy
    Chinthakindi, Anil
    2024 35TH ANNUAL SEMI ADVANCED SEMICONDUCTOR MANUFACTURING CONFERENCE, ASMC, 2024,
  • [35] Integration of Cloud and Big Data Analytics for Future Smart Cities
    Kang, Jungho
    Park, Jong Hyuk
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2019, 15 (06): : 1259 - 1264
  • [36] A Productive Cloud Computing Platform Research for Big Data Analytics
    Yan, Yuzhong
    Chen, Chao
    Huang, Lei
    2015 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2015, : 499 - 502
  • [37] Big Data Analytics from the Rich Cloud to the Frugal Edge
    Awaysheh, Feras M.
    Tommasini, Riccardo
    Awad, Ahmed
    2023 IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND COMMUNICATIONS, EDGE, 2023, : 319 - 329
  • [38] A Comparative Investigation on the Use of Cloud Computing for Big Data Analytics
    Lew, Wei Chun
    Rana, Muhammad Ehsan
    Hameed, Vazeerudeen Abdul
    PROCEEDINGS OF THE 2022 16TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2022), 2022,
  • [39] Detection of SLA Violation for Big Data Analytics Applications in Cloud
    Zeng, Xuezhi
    Garg, Saurabh
    Barika, Mutaz
    Bista, Sanat
    Puthal, Deepak
    Zomaya, Albert Y.
    Ranjan, Rajiv
    IEEE TRANSACTIONS ON COMPUTERS, 2021, 70 (05) : 746 - 758
  • [40] A Critical Review of Cloud Computing Environment for Big Data Analytics
    Dzulhikam, Dzulaisar
    Rana, Muhammad Ehsan
    2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022, : 76 - 81