MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

Cited by: 21
Authors
Czarnul, Pawel [1]
Kuchta, Jaroslaw [1]
Matuszek, Mariusz [1]
Proficz, Jerzy [2]
Rosciszewski, Pawel [1]
Wojcik, Michal [1]
Szymanski, Julian [1]
Affiliations
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Dept Comp Architecture, Narutowicza 11-12, PL-80233 Gdansk, Poland
[2] Acad Comp Ctr, Narutowicza 11-12, PL-80233 Gdansk, Poland
Keywords
Parallel computing; Performance simulation; Simulation environment; Cluster systems; CLUSTERS; TOOLKIT
DOI
10.1016/j.simpat.2017.05.009
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In this paper we present MERPSYS, a new environment for simulating the execution time of parallel applications on cluster-based systems. The environment allows an application to be modeled in the Java language extended with methods representing message-passing communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs and interconnects, and makes it easy to specify formulas that model the execution and communication times of particular blocks of code. A simulator engine within the MERPSYS environment simulates execution of an application that consists of processes running various codes, to which distinct labels are assigned. The simulator runs one Java thread per label and scales computation and communication times accordingly. This approach allows fast coarse-grained simulation of large applications on large-scale systems. We tested and verified the simulator's results against three real parallel applications implemented in C/MPI and run on real HPC clusters: a master-slave code computing similarity measures of points in a multidimensional space, a geometric single program multiple data (SPMD) parallel application for heat distribution, and a divide-and-conquer application performing merge sort. In all cases the simulator gave results very similar to the real ones on the tested configurations of up to 1000 processes. Furthermore, it allowed us to predict execution times on configurations beyond the hardware resources available to us. © 2017 Elsevier B.V. All rights reserved.
Pages: 124-140
Number of pages: 17