Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows

Cited by: 1
Authors
Kandiah, Vijay [1]
Lustig, Daniel [2]
Villa, Oreste [2]
Nellans, David [2]
Hardavellas, Nikos [1]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] NVIDIA, San Jose, CA USA
Source
PROCEEDINGS OF THE 21ST ACM/IEEE INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, CGO 2023 | 2023
Keywords
Parallel Computing; Vectorization; Code Translation; Single-instruction Multiple-data; Compiler Design;
DOI
10.1145/3579990.3580019
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Achieving peak throughput on modern CPUs requires maximizing the use of single-instruction, multiple-data (SIMD) or vector compute units. Single-program, multiple-data (SPMD) programming models are an effective way to use high-level programming languages to target these ISAs. Unfortunately, many SPMD frameworks have evolved to have either overly restrictive language specifications or under-specified programming models, and this has slowed the widescale adoption of SPMD-style programming. This paper introduces Parsimony (PARallel SIMd), an SPMD programming approach built with semantics designed to be compatible with multiple languages and to cleanly integrate into the standard optimizing compiler toolchains for those languages. We first explain the Parsimony programming model semantics and how they enable a standalone compiler IR-to-IR pass that can perform vectorization independently of other passes, improving the language and toolchain compatibility of SPMD programming. We then demonstrate an LLVM prototype of the Parsimony approach that matches the performance of ispc, a popular but more restrictive SPMD approach, and achieves 97% of the performance of hand-written AVX-512 SIMD intrinsics on over 70 benchmarks ported from the Simd Library. We finally discuss where Parsimony has exposed parts of existing language and compiler flows where slight improvements could further improve SPMD program vectorization.
Pages: 186-198
Page count: 13