Substitution of kernel functions based on pattern matching on schedule trees

Cited by: 0
Authors
Chen, Zi-Xuan [1]
Yang, Wuu [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
Source
53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024 | 2024
Keywords
pattern matching; polyhedral compilation; x86-64; RISC-V; GPU; TRANSFORMATIONS
DOI
10.1145/3677333.3678152
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise of AI, computing hardware with varying architectures has emerged. For some frequently used AI kernels, this hardware provides special accelerators and related instructions. For example, since the Volta architecture, Nvidia GPUs have provided tensor cores to optimize operations related to matrix multiplication. The vector extension of the RISC-V architecture provides instruction-level parallelism for kernels. We design and implement a pattern-matching language with which a user can define patterns for kernels. We identify segments of the schedule trees that match the defined patterns and replace those segments with calls to kernel functions (in libraries) or intrinsics that are optimized for the specific accelerators. In the experiments, the PolyBench benchmarks are optimized for (and hence linked with) the following libraries: CBLAS on the x64 platform, cuBLAS with tensor-core instructions on GPU, and OpenBLAS containing vector instructions on the RISC-V platform (software emulation, using the vector-instruction emulation ability provided by the Ara vector unit). The average (geomean) performance improvements on selected BLAS benchmarks are: (1) a run-time speedup of 1.38x for CBLAS on the x64 platform; (2) a run-time speedup of 5.27x for cuBLAS with tensor-core instructions on GPU; (3) a cycle-count speedup of 5.78x for OpenBLAS containing vector instructions on the RISC-V platform.
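The substitution step described in the abstract can be illustrated with a small sketch. This is not the authors' pattern language or the isl schedule-tree API: the node classes `Stmt`, `Band`, `Sequence`, and `Call`, and the functions `match_gemm` and `substitute`, are hypothetical stand-ins. The sketch assumes a matrix-multiplication kernel appears as a single three-iterator band whose iterators are ordered (i, j, k) around one accumulate statement, and it records the replacement only as a symbolic call named after `cblas_dgemm`, without the layout, dimension, and scaling arguments a real call needs.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Toy stand-ins for schedule-tree nodes (hypothetical, not the isl API).
Access = Tuple[str, Tuple[str, ...]]  # (array name, index iterators)

@dataclass(frozen=True)
class Stmt:
    target: Access                  # written element, e.g. ("C", ("i", "j"))
    operands: Tuple[Access, ...]    # reads that are multiplied together
    accumulate: bool                # True for "target += product(operands)"

@dataclass(frozen=True)
class Band:
    iterators: Tuple[str, ...]      # loop iterators, outermost first
    child: "Node"

@dataclass(frozen=True)
class Sequence:
    children: Tuple["Node", ...]

@dataclass(frozen=True)
class Call:
    name: str                       # symbolic library routine name
    args: Tuple[str, ...]           # array names only (assumption)

Node = Union[Stmt, Band, Sequence, Call]

def match_gemm(node):
    """Return (C, A, B) if node computes C[i][j] += A[i][k] * B[k][j]."""
    if not (isinstance(node, Band) and len(node.iterators) == 3):
        return None
    s = node.child
    if not (isinstance(s, Stmt) and s.accumulate and len(s.operands) == 2):
        return None
    i, j, k = node.iterators
    (c, c_idx), ((a, a_idx), (b, b_idx)) = s.target, s.operands
    if c_idx == (i, j) and a_idx == (i, k) and b_idx == (k, j):
        return c, a, b
    return None

def substitute(node):
    """Rebuild the tree, replacing every matched segment with a Call node."""
    m = match_gemm(node)
    if m is not None:
        c, a, b = m
        return Call("cblas_dgemm", (a, b, c))
    if isinstance(node, Band):
        return Band(node.iterators, substitute(node.child))
    if isinstance(node, Sequence):
        return Sequence(tuple(substitute(ch) for ch in node.children))
    return node  # statements and existing calls are left untouched

# Usage: an initialization loop followed by a matrix-multiplication loop nest.
init = Band(("i", "j"), Stmt(("C", ("i", "j")), (), False))
gemm = Band(("i", "j", "k"),
            Stmt(("C", ("i", "j")),
                 (("A", ("i", "k")), ("B", ("k", "j"))), True))
new_tree = substitute(Sequence((init, gemm)))
```

In this sketch only the second child of the sequence is rewritten; the initialization band does not match the pattern and is kept as it was. A practical matcher would also need to handle permuted iterators, tiled bands, and scaling factors, none of which are modeled here.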
Pages: 48-57
Number of pages: 10
Related papers
25 items in total
[11] Kong, Martin; Veras, Richard; Stock, Kevin; Franchetti, Franz; Pouchet, Louis-Noel; Sadayappan, P. When Polyhedral Transformations Meet SIMD Code Generation [J]. ACM SIGPLAN NOTICES, 2013, 48 (06): 127-138
[12] LLVM, 2020, LLVM Language Reference Manual-llvm 9 documentation
[13] LLVM, 2020, The LLVM Compiler Infrastructure
[14] Loechner Vincent, 1999, PolyLib: A library for manipulating parameterized polyhedra
[15] NVIDIA, Nvidia tensor cores
[16] OpenMP Architecture Review Board, 2008, OpenMP Application Program Interface Version 3.0
[17] Patwardhan Abhishek, 2016, Texturizing PPCG: Supporting Texture Memory in a Polyhedral Compiler
[18] Verdoolaege S, 2010, LECT NOTES COMPUT SC, V6327, P299, DOI 10.1007/978-3-642-15582-6_49
[19] Verdoolaege Sven, 2013, ACM T ARCHIT CODE OP, V9, p54:1, DOI 10.1145/2400682.2400713
[20] Verdoolaege Sven, 2016, Presburger Formulas and Polyhedral Compilation, DOI 10.13140/RG.2.1.1174.6323