Gallatin: A General-Purpose GPU Memory Manager

Cited by: 1

Authors:

McCoy, Hunter [1]

Pandey, Prashant [1]

Affiliations:

[1] Univ Utah, Salt Lake City, UT 84112 USA

Source:

PROCEEDINGS OF THE 29TH ACM SIGPLAN ANNUAL SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING, PPOPP 2024 | 2024

Keywords:

GPU; Memory allocation; Concurrent data structures; High performance computing

DOI:

10.1145/3627535.3638499

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory objects. They build specialized pipelines to achieve performance for a fixed set of allocation sizes and fall back to the CUDA allocator for allocating large sizes. In the process, they lose general-purpose usability and fail to support critical applications such as streaming graph processing. In this paper, we introduce Gallatin, a general-purpose and high-performance GPU memory manager. Gallatin uses the van Emde Boas (vEB) tree data structure to manage memory objects efficiently and supports allocations of any size. Furthermore, we develop a highly concurrent GPU implementation of the vEB tree which can be broadly used in other GPU applications. It supports constant time insertions, deletions, and successor operations for a given memory size. In our evaluation, we compare Gallatin with state-of-the-art specialized allocator variants. Gallatin is up to 374x faster on single-sized allocations and up to 264x faster on mixed-size allocations than the next-best allocator. In scalability benchmarks, Gallatin is up to 254x faster than the next-best allocator as the number of threads increases. For the graph benchmarks, Gallatin is 1.5x faster than the state-of-the-art for bulk insertions, slightly faster for bulk deletions, and is 3x faster than the next-best allocator for all graph expansion tests.
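
Illustrative note (not part of the published abstract): the insert, delete, and successor operations named above can be sketched with a two-level bitmap, which is a simplified, fixed-depth relative of the vEB tree. This is a single-threaded sketch under stated assumptions, not Gallatin's implementation: the class name `TwoLevelBitmapSet`, the 4096-element universe, and the idea of treating set members as free block indices are all hypothetical choices made for illustration, and no concurrency or GPU atomics are modeled.

```python
class TwoLevelBitmapSet:
    """Hypothetical sketch: a set of integers in [0, 4096).

    One 64-bit summary word records which of the 64 leaf words are
    non-empty, so every operation touches at most three words.
    """
    WORD = 64

    def __init__(self):
        self.summary = 0                  # bit i set <=> leaves[i] != 0
        self.leaves = [0] * self.WORD

    def insert(self, x):
        hi, lo = divmod(x, self.WORD)
        self.leaves[hi] |= 1 << lo
        self.summary |= 1 << hi

    def delete(self, x):
        hi, lo = divmod(x, self.WORD)
        self.leaves[hi] &= ~(1 << lo)
        if self.leaves[hi] == 0:          # leaf became empty: clear summary bit
            self.summary &= ~(1 << hi)

    @staticmethod
    def _lowest_set_bit_at_or_after(word, start):
        masked = (word >> start) << start  # clear all bits below `start`
        if masked == 0:
            return None
        return (masked & -masked).bit_length() - 1

    def successor(self, x):
        """Return the smallest member >= x, or None if there is none."""
        hi, lo = divmod(x, self.WORD)
        bit = self._lowest_set_bit_at_or_after(self.leaves[hi], lo)
        if bit is not None:
            return hi * self.WORD + bit
        # Nothing left in this leaf: ask the summary for the next non-empty leaf.
        nxt = self._lowest_set_bit_at_or_after(self.summary, hi + 1)
        if nxt is None:
            return None
        return nxt * self.WORD + self._lowest_set_bit_at_or_after(self.leaves[nxt], 0)


# Usage: treat members as indices of free blocks (an assumption of this sketch).
free_blocks = TwoLevelBitmapSet()
for block in (3, 70, 700):
    free_blocks.insert(block)
first = free_blocks.successor(4)    # first free block at or after index 4
free_blocks.delete(first)           # mark it as allocated
```

Because the depth is fixed and each level is a single machine word, the work per operation does not grow with the number of stored elements. A full vEB tree applies the same summary idea recursively over larger universes.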

Citation

Pages: 364-376

Page count: 13
