Cost-Effective Cloud Server Provisioning for Predictable Performance of Big Data Analytics

Cited by: 32
Authors
Xu, Fei [1]
Zheng, Haoyue [1]
Jiang, Huan [1]
Shao, Wujie [1]
Liu, Haikun [2]
Zhou, Zhi [3]
Affiliations
[1] East China Normal Univ, Dept Comp Sci & Technol, Shanghai Key Lab Multidimens Informat Proc, 3663 N Zhongshan Rd, Shanghai 200062, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Cluster & Grid Comp Lab, Serv Comp Technol & Syst Lab, 1037 Luoyu Rd, Wuhan 430074, Hubei, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangdong Key Lab Big Data Anal & Proc, 132 E Waihuan Rd, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Predictable performance; big data analytics; cloud computing; transient server provisioning; data checkpointing
DOI
10.1109/TPDS.2018.2873397
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Cloud datacenters are underutilized due to server over-provisioning. To increase datacenter utilization, cloud providers let users run workloads such as big data analytics on the underutilized resources, in the form of cheap yet revocable transient servers (e.g., EC2 spot instances, GCE preemptible instances). Although transient servers are offered at highly reduced prices, deploying big data analytics on them can severely degrade job performance because of instance revocations. To tackle this issue, this paper proposes iSpot, a cost-effective transient server provisioning framework for achieving predictable performance in the cloud, focusing on Spark as a representative Directed Acyclic Graph (DAG)-style big data analytics workload. iSpot first identifies the cloud transient servers that remain stable during job execution by devising an accurate Long Short-Term Memory (LSTM)-based price prediction method. Leveraging automatic job profiling and the acquired DAG information of stages, we further build an analytical performance model and present a lightweight critical data checkpointing mechanism for Spark; together these enable the iSpot provisioning strategy, which guarantees job performance on stable transient servers. Extensive prototype experiments on both EC2 spot instances and GCE preemptible instances demonstrate that iSpot is able to guarantee the performance of big data analytics running on cloud transient servers while reducing the job budget by up to 83.8 percent in comparison to state-of-the-art server provisioning strategies, with acceptable runtime overhead.
Pages: 1036-1051
Page count: 16