Exploiting Hierarchical Locality in Deep Parallel Architectures

Times cited: 8

Authors:

Anbar, Ahmad [1]

Serres, Olivier [1]

Kayraklioglu, Engin [1]

Badawy, Abdel-Hameed A. [1,2]

El-Ghazawi, Tarek [1]

Affiliations:

[1] George Washington Univ, 20101 Acad Way, Suite 333, Washington, DC 20037 USA

[2] Arkansas Tech Univ, Russellville, AR 72801 USA

Source:

ACM Transactions on Architecture and Code Optimization | 2016, Vol. 13, No. 2

Keywords:

Design; Algorithms; Performance; PGAS; hierarchical locality exploitation; productivity; PHLAME; PHAST

DOI:

10.1145/2897783

CLC number:

TP3 [Computing Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Parallel computers are becoming deeply hierarchical. Locality-aware programming models allow programmers to control locality at one level by establishing affinity between data and executing activities. This, however, does not enable locality exploitation at other levels. Therefore, we must conceive an efficient abstraction of hierarchical locality and develop techniques to exploit it. Techniques applied directly by programmers beyond the first level burden the programmer and hinder productivity. In this article, we propose the Parallel Hierarchical Locality Abstraction Model for Execution (PHLAME). PHLAME is an execution model that abstracts and exploits hierarchical machine properties through locality-aware programming and a runtime that takes into account machine characteristics as well as the data-sharing and communication profile of the underlying application. This article presents, and experiments with, concepts and techniques that can drive such a runtime system in support of PHLAME. Our experiments show that our techniques scale up and achieve performance gains of up to 88%.
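To make the idea of a hierarchy-aware runtime a little more concrete, the Python sketch below shows one simple way a runtime could combine a machine description with an application's communication profile: it groups the bytes exchanged between threads by the deepest hierarchy level the two threads share, then weights each level by a per-byte cost. This is not PHLAME's implementation or API. The (node, socket, core) placement tuples, the functions `shared_level`, `traffic_by_level`, and `placement_cost`, the communication matrix, and the cost weights are all hypothetical illustrations.

```python
from collections import Counter

# Hypothetical machine model: each thread's placement is a
# (node, socket, core) tuple, ordered from the outermost level inward.
LEVELS = ("remote", "same node", "same socket", "same core")

# Hypothetical relative per-byte costs for each level (illustrative values,
# not measurements from the article).
COST = {"remote": 100, "same node": 10, "same socket": 2, "same core": 1}


def shared_level(a, b):
    """Count how many leading hierarchy levels two placements share."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def traffic_by_level(comm, placement):
    """Group the bytes in a thread-to-thread communication profile by the
    deepest hierarchy level that sender and receiver share."""
    totals = Counter()
    for i, row in enumerate(comm):
        for j, nbytes in enumerate(row):
            if i != j and nbytes:
                totals[LEVELS[shared_level(placement[i], placement[j])]] += nbytes
    return dict(totals)


def placement_cost(comm, placement):
    """Weight each level's traffic by its hypothetical per-byte cost."""
    return sum(COST[level] * nbytes
               for level, nbytes in traffic_by_level(comm, placement).items())


# comm[i][j] = bytes thread i sends to thread j (made-up profile).
comm = [
    [0, 10, 5, 1],
    [10, 0, 0, 2],
    [5, 0, 0, 0],
    [1, 2, 0, 0],
]

# Placement A puts thread 3 on a second node; placement B keeps all
# four threads on node 0, spread over two sockets.
placement_a = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
placement_b = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]

print(traffic_by_level(comm, placement_a))
print(placement_cost(comm, placement_a), placement_cost(comm, placement_b))
```

Under these made-up weights, placement A costs 740 and placement B costs 200, so a runtime following this simple model would prefer keeping thread 3 on node 0. A real system would derive the costs from the target machine and the profile from the application.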

References

Pages: 25

31 references in total

[1] PHLAME: Hierarchical Locality Exploitation using the PGAS Model. Anbar, Ahmad; Serres, Olivier; Kayraklioglu, Engin; Badawy, Abdel-Hameed A.; El-Ghazawi, Tarek. 2015 9th International Conference on Partitioned Global Address Space Programming Models (PGAS), 2015: 82-89.

[2] Anbar Ahmad, 2014, P 20 IEEE INT C PAR

[3] [Anonymous], 1991, NAS PARALLEL BENCHMA, DOI 10.1145/125826.125925

[4] Bonachea D., 2002, TECHNICAL REPORT

[5] hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications. Broquedis, Francois; Clet-Ortega, Jerome; Moreaud, Stephanie; Furmento, Nathalie; Goglin, Brice; Mercier, Guillaume; Thibault, Samuel; Namyst, Raymond. Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2010: 180-186.

[6] Efficient algorithms for all-to-all communications in multiport message-passing systems. Bruck, J; Ho, CT; Kipnis, S; Upfal, E; Weathersby, D. IEEE Transactions on Parallel and Distributed Systems, 1997, 8(11): 1143-1156.

[7] Parallel programmability and the Chapel language. Chamberlain, B. L.; Callahan, D.; Zima, H. P. International Journal of High Performance Computing Applications, 2007, 21(3): 291-312.

[8] Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor. Conway, Pat; Kalyanasundharam, Nathan; Donley, Gregg; Lepak, Kevin; Hughes, Bill. IEEE Micro, 2010, 30(2): 16-29.

[9] Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory. Cruz, Eduardo H. M.; Diener, Matthias; Navaux, Philippe O. A. 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 2012: 532-543.

[10] OpenMP: An industry standard API for shared-memory programming. Dagum, L; Menon, R. IEEE Computational Science & Engineering, 1998, 5(1): 46-55.
