Optimization of Parallel Discrete Event Simulator for Multi-core Systems

Cited by: 24
Authors
Jagtap, Deepak [1]
Abu-Ghazaleh, Nael [1]
Ponomarev, Dmitry [1]
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13901 USA
Source
2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS) | 2012
Keywords
CHIP;
DOI
10.1109/IPDPS.2012.55
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Parallel Discrete Event Simulation (PDES) can substantially improve the performance and capacity of simulation, allowing the study of larger, more detailed models in less time. PDES is a fine-grained parallel application whose performance and scalability are limited by communication latencies. Traditionally, PDES simulation kernels use processes that communicate by message passing; shared memory is used to optimize message passing between processes running on the same machine. We report on our experience implementing a thread-based version of the ROSS simulator. The multithreaded implementation eliminates repeated message copying and significantly reduces synchronization delays. We study the performance of the simulator on two hardware platforms: a Core i7 machine and a 48-core AMD Opteron Magny-Cours system. We identify performance bottlenecks, then propose and evaluate mechanisms to overcome them. Results show that the multithreaded implementation improves performance over the MPI version by up to a factor of 3 on the Core i7 machine and 1.2 on Magny-Cours for 48-way simulation.
Pages: 520-531
Page count: 12