Substitution of kernel functions based on pattern matching on schedule trees

Times cited: 0
Authors
Chen, Zi-Xuan [1]
Yang, Wuu [1]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
Source
53RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2024 | 2024
Keywords
pattern matching; polyhedral compilation; x86-64; RISC-V; GPU; TRANSFORMATIONS
DOI
10.1145/3677333.3678152
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rise of AI, computing hardware with a variety of architectures has emerged. For some frequently used AI kernels, such hardware provides special accelerators and related instructions. For example, since the Volta architecture, Nvidia GPUs have provided tensor cores to accelerate operations related to matrix multiplication. The vector extension of the RISC-V architecture provides instruction-level parallelism for kernels. We design and implement a pattern-matching language in which a user can define patterns for kernels. We identify segments of the schedule trees that match the defined patterns and replace those segments with calls to kernel functions (in libraries) or intrinsics that are optimized for the specific accelerators. In the experiments, the Polybench benchmarks are optimized for (and hence linked with) the following libraries: CBLAS on the x64 platform, CuBLAS with tensor-core instructions on GPU, and OpenBLAS containing vector instructions on the RISC-V platform (software emulation, using the vector-instruction emulation ability provided by the Ara vector unit). The average (geomean) performance improvements on selected BLAS benchmarks are: (1) a run-time speedup of 1.38x for CBLAS on the x64 platform; (2) a run-time improvement of 5.27x for CuBLAS with tensor-core instructions on GPU; and (3) a cycle-count speedup of 5.78x for OpenBLAS containing vector instructions on the RISC-V platform.
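The match-and-replace idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pattern language or implementation: the `Band`, `Stmt`, and `Call` classes, the `match_gemm` and `substitute` helpers, and the kernel name `"gemm"` are all hypothetical, and the tree is a heavily simplified stand-in for a real polyhedral schedule tree. It only shows the general shape of recognizing a matrix-multiplication loop nest and swapping it for a library call node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Band:
    """One loop dimension in a toy schedule tree (hypothetical, not a real library API)."""
    iterator: str
    child: "Node"


@dataclass
class Stmt:
    """Leaf statement of the form target += lhs * rhs; each operand is (array, indices)."""
    target: tuple
    lhs: tuple
    rhs: tuple


@dataclass
class Call:
    """Replacement node standing for a call to an optimized kernel function."""
    name: str
    args: tuple


Node = Union[Band, Stmt, Call]


def match_gemm(node):
    """Return (C, A, B) if node is exactly a 3-deep nest doing C[i][j] += A[i][k] * B[k][j]."""
    iters = []
    while isinstance(node, Band):
        iters.append(node.iterator)
        node = node.child
    if len(iters) != 3 or not isinstance(node, Stmt):
        return None
    i, j, k = iters
    (c, c_idx), (a, a_idx), (b, b_idx) = node.target, node.lhs, node.rhs
    if c_idx == (i, j) and a_idx == (i, k) and b_idx == (k, j):
        return c, a, b
    return None


def substitute(node):
    """Replace every matching subtree with a Call node; otherwise recurse into bands."""
    hit = match_gemm(node)
    if hit is not None:
        return Call("gemm", hit)  # "gemm" is a placeholder kernel name
    if isinstance(node, Band):
        return Band(node.iterator, substitute(node.child))
    return node


# Usage: an outer time loop "t" wrapping a matrix-multiplication nest.
gemm_stmt = Stmt(("C", ("i", "j")), ("A", ("i", "k")), ("B", ("k", "j")))
tree = Band("t", Band("i", Band("j", Band("k", gemm_stmt))))
print(substitute(tree))
```

In this sketch the outer `t` band is preserved and only the inner three-band nest is replaced. A real system would also have to check loop bounds, access strides, and dependences before substituting, which the toy matcher ignores.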
Pages: 48-57
Page count: 10
Related papers
25 in total
[1]  
Baghdadi R, 2018, Arxiv, DOI arXiv:1804.10694
[2]   Code generation in the polyhedral model is easier than you think [J].
Bastoul, C .
13TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES, PROCEEDINGS, 2004, :7-16
[3]  
Bastoul Cedric, 2008, Clan-A polyhedral representation extractor for high level programs
[4]  
Bondhugula U, 2008, LECT NOTES COMPUT SC, V4959, P132
[5]  
Bondhugula Uday, 2008, ACM SIGPLAN C PROGRA
[6]  
Cavalcante M, 2019, Arxiv, DOI arXiv:1906.00478
[7]   Declarative Loop Tactics for Domain-specific Optimization [J].
Chelini, Lorenzo ;
Zinenko, Oleksandr ;
Grosser, Tobias ;
Corporaal, Henk .
ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2019, 16 (04)
[8]  
Feautrier P., 2002, Technical Report
[9]  
Bhaskaracharya SG, 2020, Arxiv, DOI arXiv:2006.12645
[10]  
Grosser Tobias, 2012, Polly-Polyhedral optimization in LLVM