Communication-aware thread mapping using the translation lookaside buffer

Cited by: 3
Authors
Cruz, Eduardo H. M. [1]
Diener, Matthias [1]
Navaux, Philippe O. A. [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
Keywords
thread mapping; shared memory; translation lookaside buffer
DOI
10.1002/cpe.3487
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Threads of parallel applications need to communicate in order to fulfill their tasks. The communication performance between the cores in modern multi-core architectures differs because of the memory and interconnection hierarchies. In these architectures, it is important to map the threads of parallel applications by taking into account the communication between them, to improve their performance and energy consumption. In parallel applications based on shared memory, communication is implicit, which makes it difficult to detect the communication pattern between the threads. In this paper, we introduce a new lightweight mechanism to detect the communication pattern between threads of shared memory applications using the translation lookaside buffer. Our mechanism relies on hardware features, which make it transparent to the programmer and allow the detection to be performed by the operating system during the execution of the application. We also developed a heuristic mapping algorithm that uses the detected pattern to dynamically map the threads to cores. Experiments were performed with applications from the NAS-OMP and PARSEC parallel benchmark suites in a simulated machine as well as a real machine. Results show that our mechanism can substantially improve parallel application performance, as well as processor and DRAM energy consumption. Copyright (c) 2015 John Wiley & Sons, Ltd.
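To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' mechanism. It assumes that the page numbers held in each thread's TLB can be sampled as plain Python sets, whereas the paper relies on hardware features and the operating system. It also swaps in a simple greedy pairing for the paper's heuristic mapping algorithm, which the abstract does not describe. The names `communication_matrix` and `greedy_pairs` are hypothetical.

```python
from itertools import combinations


def communication_matrix(tlb_snapshots):
    """Estimate communication from sampled TLB contents.

    tlb_snapshots is a list of samples; each sample is a list with one
    set of page numbers per thread (an assumed, simplified input format).
    Two threads are counted as communicating once for every page that
    appears in both of their sets in the same sample.
    """
    n = len(tlb_snapshots[0])
    matrix = [[0] * n for _ in range(n)]
    for snapshot in tlb_snapshots:
        for a, b in combinations(range(n), 2):
            shared = len(snapshot[a] & snapshot[b])
            matrix[a][b] += shared
            matrix[b][a] += shared
    return matrix


def greedy_pairs(matrix):
    """Pair threads greedily, highest estimated communication first.

    Each pair is meant to be placed on cores that share a cache.
    This is an illustrative stand-in, not the paper's heuristic.
    """
    n = len(matrix)
    candidates = sorted(
        combinations(range(n), 2),
        key=lambda pair: matrix[pair[0]][pair[1]],
        reverse=True,
    )
    used, pairs = set(), []
    for a, b in candidates:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


# Two samples of four threads: threads 0/1 and 2/3 touch the same pages.
samples = [
    [{1, 2}, {2, 3}, {10}, {10, 11}],
    [{1}, {1}, {12}, {12}],
]
matrix = communication_matrix(samples)
pairs = greedy_pairs(matrix)
```

In this toy input, threads 0 and 1 share a page in both samples, as do threads 2 and 3, so the sketch groups them into the pairs (0, 1) and (2, 3). On a real system the resulting pairs would still have to be applied through an operating system affinity interface, which is outside this sketch.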
Pages: 4970-4992
Number of pages: 23