Communication-aware thread mapping using the translation lookaside buffer

Cited by: 3
Authors
Cruz, Eduardo H. M. [1]
Diener, Matthias [1]
Navaux, Philippe O. A. [1]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
Keywords
thread mapping; shared memory; translation lookaside buffer
DOI
10.1002/cpe.3487
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Threads of parallel applications need to communicate in order to fulfill their tasks. The communication performance between the cores in modern multi-core architectures differs because of the memory and interconnection hierarchies. In these architectures, it is important to map the threads of parallel applications by taking into account the communication between them, to improve their performance and energy consumption. In parallel applications based on shared memory, communication is implicit, which makes it difficult to detect the communication pattern between the threads. In this paper, we introduce a new lightweight mechanism to detect the communication pattern between threads of shared memory applications using the translation lookaside buffer. Our mechanism relies on hardware features, which make it transparent to the programmer and allow the detection to be performed by the operating system during the execution of the application. We also developed a heuristic mapping algorithm that uses the detected pattern to dynamically map the threads to cores. Experiments were performed with applications from the NAS-OMP and PARSEC parallel benchmark suites in a simulated machine as well as a real machine. Results show that our mechanism can substantially improve parallel application performance, as well as processor and DRAM energy consumption. Copyright (c) 2015 John Wiley & Sons, Ltd.
Pages: 4970-4992
Number of pages: 23