Accelerating Big Data Applications Using Lightweight Virtualization Framework on Enterprise Cloud

Cited by: 0
Authors
Bhimani, Janki [1]
Yang, Zhengyu [1]
Leeser, Miriam [1]
Mi, Ningfang [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, 360 Huntington Ave, Boston, MA 02115 USA
Funding
National Science Foundation (USA)
Keywords
Virtual Machine (VM); Container; Docker; Apache Spark; Big Data; Cloud Computing; Resource Management; Task Assignment; Workload Evaluation & Estimation; MapReduce
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hypervisor-based virtualization technology has been successfully used to deploy high-performance and scalable infrastructure for Hadoop and, more recently, Spark applications. Container-based virtualization is becoming an important alternative and is increasingly used because containers are lightweight and scale better than Virtual Machines (VMs). With containerization techniques such as Docker maturing and promising better performance, Docker can be used to speed up big data applications. However, because applications differ in behavior and resource requirements, it is important to analyze and compare the performance of applications running in the cloud on VMs and on Docker containers before replacing traditional hypervisor-based virtual machines with Docker. VMs provide distributed resource management, with each virtual machine running on its own allocated resources, while Docker relies on a pool of resources shared among all containers. Here, we investigate the performance of different Apache Spark applications using both VMs and Docker containers. While others have looked at Docker's performance, this is the first study that compares these virtualization frameworks for a big data enterprise cloud environment using Apache Spark. In addition to makespan and execution time, we also analyze the utilization of different resources (CPU, disk, memory, etc.) by Spark applications. Our results show that Spark using Docker can obtain a speed-up of over 10 times compared to using VMs. However, we observe that this may not apply to all applications because of different workload patterns and the different resource management schemes used by virtual machines and containers. Our work can guide application developers, system administrators, and researchers in designing and deploying big data applications on their platforms to improve overall performance.
Pages: 7