XPlacer: Automatic Analysis of Data Access Patterns on Heterogeneous CPU/GPU Systems

Cited by: 2

Authors:

Pirkelbauer, Peter [1, 2]

Lin, Pei-Hung [1]

Vanderbruggen, Tristan [1]

Liao, Chunhua [1]

Affiliations:

[1] Lawrence Livermore Natl Lab, Ctr Appl Sci Comp, Livermore, CA 94550 USA

[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA

Source:

2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2020) | 2020

Keywords:

GPGPU; heterogeneous systems; high-performance computing; code instrumentation

DOI:

10.1109/IPDPS47924.2020.00106

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper presents XPlacer, a framework to automatically analyze problematic data access patterns in C++ and CUDA code. XPlacer records heap memory operations in both host and device code for later analysis. To this end, XPlacer instruments read and write operations, function calls, and kernel launches. Programmers mark points in the program execution where the recorded data is analyzed and anomalies are diagnosed. XPlacer reports data access anti-patterns, including alternating CPU/GPU accesses to the same memory, memory with low access density, and unnecessary data transfers. The diagnostic also produces summary information about the recorded accesses, which helps users identify code that could degrade performance. The paper evaluates XPlacer using LULESH, a Lawrence Livermore proxy application, the Rodinia benchmarks, and an implementation of the Smith-Waterman algorithm. XPlacer diagnosed several performance issues in these codes. The elimination of a performance problem in LULESH resulted in a 3x speedup on a heterogeneous platform combining Intel CPUs and Nvidia GPUs.
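The abstract's idea of recording accesses and diagnosing them at a marked program point can be illustrated with a toy sketch in plain C++. This is not XPlacer's API or implementation: every name here (`AllocationTrace`, `record`, `diagnose`) and both thresholds are hypothetical, and the sketch counts device switches per access, whereas a real tool might track them at kernel-launch granularity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is an illustration, not XPlacer's API.
enum class Device { CPU, GPU };

// Shadow record for one tracked heap allocation holding `words` elements.
struct AllocationTrace {
    explicit AllocationTrace(std::size_t words) : touched(words, false) {}

    // Would be called by (assumed) instrumentation on every read or write.
    void record(Device dev, std::size_t index) {
        assert(index < touched.size());
        touched[index] = true;
        // Count each change of accessing device (CPU -> GPU or GPU -> CPU).
        if (has_last && dev != last) ++device_switches;
        last = dev;
        has_last = true;
    }

    // Fraction of elements accessed at least once since tracking began.
    double access_density() const {
        if (touched.empty()) return 0.0;
        std::size_t n = 0;
        for (bool t : touched) {
            if (t) ++n;
        }
        return static_cast<double>(n) / static_cast<double>(touched.size());
    }

    std::vector<bool> touched;
    std::size_t device_switches = 0;
    Device last = Device::CPU;
    bool has_last = false;
};

struct Diagnosis {
    bool alternating_access;  // CPU and GPU took turns on this memory
    bool low_density;         // only a small part of the allocation was used
};

// Analysis step that a programmer would trigger at a marked program point.
// Both default thresholds are illustrative assumptions.
inline Diagnosis diagnose(const AllocationTrace& t,
                          std::size_t max_switches = 1,
                          double min_density = 0.5) {
    return Diagnosis{t.device_switches > max_switches,
                     t.access_density() < min_density};
}
```

In this sketch, a trace over 8 elements that is touched by the CPU, then the GPU, then the CPU again, using only two distinct elements, would be flagged for both alternating access and low access density under the default thresholds.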


Pages: 997-1007

Page count: 11

