IPSO: A Scaling Model for Data-Intensive Applications

Cited by: 1
Authors
Li, Zhongwei [1]
Duan, Feng [1]
Minh Nguyen [1]
Che, Hao [1]
Lei, Yu [1]
Jiang, Hong [1]
Affiliations
[1] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76019 USA
Keywords
scale-out workload; cloud computing; speedup; performance evaluation; Amdahl's Law; Gustafson's Law
DOI
10.1109/ICDCS.2019.00032
CLC number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Today's data center applications are predominantly data-intensive, calling for scaling out the workload to a large number of servers for parallel processing. Unfortunately, the existing scaling laws, notably Amdahl's and Gustafson's laws, are inadequate to characterize the scaling properties of data-intensive workloads. To fill this void, this paper puts forward a new scaling model, called the In-Proportion and Scale-Out-induced scaling model (IPSO). IPSO generalizes the existing scaling models in two important respects. First, it accounts for possible in-proportion scaling, i.e., the scaling of the serial portion of the workload in proportion to the scaling of the parallelizable portion of the workload. Second, it takes into account possible scale-out-induced scaling, i.e., the scaling of the collective overhead or workload induced by scaling out. IPSO exposes scaling properties of data-intensive workloads, rendering the existing scaling laws its special cases. In particular, IPSO reveals two new pathological scaling properties: the speedup may level off even in the case of the fixed-time workload underlying Gustafson's law, and it may peak and then fall as the system scales out. Extensive MapReduce and Spark-based case studies demonstrate that IPSO successfully captures diverse scaling properties of data-intensive applications. As a result, it can serve as a diagnostic tool to gain insights into, or even uncover counter-intuitive root causes of, observed scaling behaviors, especially pathological ones, for data-intensive applications. Finally, preliminary results also demonstrate the promising prospects of IPSO for facilitating effective resource provisioning to achieve the best speedup-versus-cost tradeoffs for data-intensive applications.
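As a rough illustration of the two classical laws named in the abstract, the sketch below computes the textbook Amdahl and Gustafson speedups for n servers. The third function, `toy_overhead_speedup`, is a hypothetical toy model (Amdahl's law plus an overhead term that grows linearly with n) included only to show how a scale-out-induced cost can make speedup peak and then fall; it is not the IPSO formulation, which this record does not give. All function and parameter names are assumptions made for this example.

```python
def amdahl_speedup(n, serial_fraction):
    """Amdahl's law: fixed-size workload split across n servers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(n, serial_fraction):
    """Gustafson's law: fixed-time workload; the parallel part grows with n."""
    return serial_fraction + (1.0 - serial_fraction) * n


def toy_overhead_speedup(n, serial_fraction, overhead_per_node):
    """Hypothetical toy model, NOT the IPSO formula.

    Adds a collective overhead that grows linearly with the number of
    extra servers to the normalized Amdahl execution time.
    """
    time = (serial_fraction
            + (1.0 - serial_fraction) / n
            + overhead_per_node * (n - 1))
    return 1.0 / time


if __name__ == "__main__":
    for n in (1, 8, 31, 256):
        print(n,
              round(amdahl_speedup(n, 0.05), 2),
              round(gustafson_speedup(n, 0.05), 2),
              round(toy_overhead_speedup(n, 0.05, 0.001), 2))
```

With a 5% serial fraction, Amdahl's speedup saturates below 20 and Gustafson's grows linearly, while the toy model with the assumed overhead coefficient of 0.001 rises to a maximum near n = 31 and then declines, which is the qualitative "peak and then fall" shape the abstract describes.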
Pages: 238-248
Page count: 11
Related papers
50 in total
  • [41] A framework for data partitioning for C++ data-intensive applications
    Milidonis, A
    Dimitroulakos, G
    Galanis, MD
    Kakarountas, AP
    Theodoridis, G
    Goutis, C
    Catthoor, F
    DESIGN AUTOMATION FOR EMBEDDED SYSTEMS, 2004, 9 (02) : 101 - 121
  • [42] Improvement Of Data Throughput In Data-Intensive Cloud Computing Applications
    Ibrahim, Ibrahim Adel
    Bassiouni, Mostafa
    2019 IEEE FIFTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2019), 2019, : 49 - 54
  • [43] Decoupling computation and data scheduling in distributed data-intensive applications
    Ranganathan, K
    Foster, I
    11TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, PROCEEDINGS, 2002, : 352 - 358
  • [44] Heuristic Data Placement for Data-Intensive Applications in Heterogeneous Cloud
    Zhao, Qing
    Xiong, Congcong
    Wang, Peng
    JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2016, 2016
  • [45] Testing Data Consistency of Data-Intensive Applications Using QuickCheck
    Castro, Laura M.
    Arts, Thomas
    ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2011, 271 : 41 - 62
  • [46] Deadline based scheduling for data-intensive applications in clouds
    Fu Xiong
    Cang Yeliang
    Zhu Lipeng
    Hu Bin
    Deng Song
    Wang Dong
    The Journal of China Universities of Posts and Telecommunications, 2016, 23 (06) : 8 - 15
  • [47] Level of detail concepts in data-intensive Web applications
    Comai, S
    WEB ENGINEERING, PROCEEDINGS, 2005, 3579 : 209 - 220
  • [48] An adaptive meta-scheduler for data-intensive applications
    Shi, XH
    Jin, H
    Qiang, WZ
    Zou, DQ
    GRID AND COOPERATIVE COMPUTING, PT 2, 2004, 3033 : 830 - 837
  • [49] NSM: A distributed storage architecture for data-intensive applications
    Ali, Z
    Malluhi, Q
    20TH IEEE/11TH NASA GODDARD CONFERENCE ON MASS STORAGE AND TECHNOLOGIES (MSST 2003), PROCEEDINGS, 2003, : 87 - 91
  • [50] A Model and Survey of Distributed Data-Intensive Systems
    Margara, Alessandro
    Cugola, Gianpaolo
    Felicioni, Nicolo
    Cilloni, Stefano
    ACM COMPUTING SURVEYS, 2024, 56 (01)