Accelerating Big Data Applications Using Lightweight Virtualization Framework on Enterprise Cloud

Cited by: 0
Authors
Bhimani, Janki [1]
Yang, Zhengyu [1]
Leeser, Miriam [1]
Mi, Ningfang [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, 360 Huntington Ave, Boston, MA 02115 USA
Funding
U.S. National Science Foundation
Keywords
Virtual Machine (VM); Container; Docker; Apache Spark; Big Data; Cloud Computing; Resource Management; Task Assignment; Workload Evaluation & Estimation; MapReduce
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hypervisor-based virtualization has been used successfully to deploy high-performance, scalable infrastructure for Hadoop and, more recently, Spark applications. Container-based virtualization is becoming an important alternative and is increasingly adopted because containers are lightweight and scale better than virtual machines (VMs). As containerization technologies such as Docker mature and promise better performance, Docker can be used to speed up big data applications. However, because applications differ in behavior and resource requirements, it is important to analyze and compare the performance of applications running in the cloud on VMs and on Docker containers before replacing traditional hypervisor-based VMs with Docker. The VM approach provides distributed resource management, with each virtual machine running on its own allocated resources, whereas Docker relies on a pool of resources shared among all containers. Here, we investigate the performance of different Apache Spark applications using both VMs and Docker containers. While others have examined Docker's performance, this is the first study to compare these virtualization frameworks for a big data enterprise cloud environment using Apache Spark. In addition to makespan and execution time, we analyze the utilization of different resources (CPU, disk, memory, etc.) by Spark applications. Our results show that Spark on Docker can achieve a speed-up of more than 10 times compared with Spark on VMs. However, we observe that this may not hold for all applications, because workload patterns differ and VMs and containers apply different resource management schemes. Our work can guide application developers, system administrators, and researchers in designing and deploying big data applications on their platforms to improve overall performance.
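The abstract reports makespan and speed-up without defining them. The sketch below shows one common way these two metrics are computed, assuming makespan is the time from the earliest task start to the latest task finish and speed-up is the ratio of baseline makespan to candidate makespan. The names TaskRun, makespan, and speedup, and all timing values, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskRun:
    """Start and end time of one task, in seconds since job submission."""
    start: float
    end: float


def makespan(runs: Iterable[TaskRun]) -> float:
    """Elapsed time from the earliest task start to the latest task finish."""
    runs = list(runs)
    return max(r.end for r in runs) - min(r.start for r in runs)


def speedup(baseline_makespan: float, candidate_makespan: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_makespan / candidate_makespan


# Hypothetical timings for illustration only; these are NOT measurements
# from the paper.
vm_runs = [TaskRun(0.0, 120.0), TaskRun(5.0, 200.0)]
docker_runs = [TaskRun(0.0, 30.0), TaskRun(2.0, 50.0)]

vm_makespan = makespan(vm_runs)
docker_makespan = makespan(docker_runs)
print(f"VM makespan: {vm_makespan:.1f} s")
print(f"Docker makespan: {docker_makespan:.1f} s")
print(f"Speed-up: {speedup(vm_makespan, docker_makespan):.1f}x")
```

A ratio above 1 means the candidate (here, the hypothetical Docker run) finished sooner than the baseline. As the abstract notes, whether that holds depends on the workload.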
Pages: 7