New Performance Modeling Methods for Parallel Data Processing Applications

Cited by: 11
Authors
Bhimani, Janki [1]
Mi, Ningfang [1]
Leeser, Miriam [1]
Yang, Zhengyu [1]
Affiliation
[1] Northeastern Univ, 360 Huntington Ave, Boston, MA 02115 USA
Source
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2019 / Vol. 29 / Issue 03
Funding
National Science Foundation (USA);
Keywords
Performance modeling; queuing theory; Markov model; distributed systems; execution time; parallel calculation; communication network; prediction;
DOI
10.1145/3309684
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Predicting the performance of an application running on parallel computing platforms is becoming increasingly important because of its influence on development time and resource management. However, predicting performance with respect to parallel processes is complex for iterative and multi-stage applications. This research proposes a performance approximation approach, FiM, that predicts the calculation time (with FiM-Cal) and the communication time (with FiM-Com) of an application running on a distributed framework. FiM-Cal consists of two coupled components: (1) a stochastic Markov model that captures non-deterministic runtime, which often depends on parallel resources, e.g., the number of processes, and (2) a machine-learning model that extrapolates the parameters for calibrating our Markov model when application parameters, such as the dataset, change. In addition to parallel calculation time, parallel computing platforms spend time transferring data among nodes. FiM-Com consists of a simulation queuing model that quickly estimates this communication time. Our new modeling approach considers design choices along multiple dimensions, namely (i) process-level parallelism, (ii) distribution of cores on a multi-processor platform, (iii) application-related parameters, and (iv) characteristics of datasets. The major contribution of our prediction approach is that FiM can accurately predict parallel processing time for datasets that are much larger than the training datasets. We evaluate our approach with the NAS Parallel Benchmarks and real iterative data processing applications. We compare the predicted results (e.g., end-to-end execution time) with actual experimental measurements on a real distributed platform. We also compare our work with an existing prediction technique based on machine learning. We rank the number of processes according to the actual and predicted results from FiM and calculate the correlation between the actual and predicted rankings. Our results show that FiM obtains a high correlation in the range of 0.80 to 0.99, which indicates considerable accuracy of our technique. Such prediction gives data analysts useful insight into the optimal configuration of parallel resources (e.g., number of processes and number of cores) and also helps system designers investigate the impact of changes in application parameters on system performance.
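The ranking comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the correlation measure, so the example below assumes Spearman's rank correlation; the function names (`ranks`, `pearson`, `spearman`) and the timing values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks (smallest value gets rank 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values so that ties share one rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(actual, predicted):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(actual), ranks(predicted))


# Hypothetical execution times (seconds) for 1, 2, 4, 8, and 16 processes.
# These numbers are illustrative only, not measurements from the paper.
actual_times = [120.0, 65.0, 38.0, 30.0, 34.0]
predicted_times = [118.0, 70.0, 41.0, 35.0, 33.0]

print(round(spearman(actual_times, predicted_times), 2))  # prints 0.9
```

In this made-up data, the prediction swaps the order of the two fastest configurations (8 and 16 processes), which lowers the correlation from 1.0 to 0.9. A value near 1 means the predicted ranking would lead an analyst to nearly the same choice of process count as the measured ranking.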
Pages: 24