LIBSHALOM: Optimizing Small and Irregular-Shaped Matrix Multiplications on ARMv8 Multi-Cores

Cited by: 21
Authors
Yang, Weiling [1]
Fang, Jianbin [1]
Dong, Dezun [1]
Su, Xing [1]
Wang, Zheng [2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp Sci, Beijing, Peoples R China
[2] Univ Leeds, Sch Comp, Leeds, W Yorkshire, England
Source
SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS | 2021
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Matrix Multiplication; Small and Irregular-Shaped; ARMv8; MultiCore; Performance Optimization; PERFORMANCE;
DOI
10.1145/3458817.3476217
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream linear algebra libraries can deliver high performance on large and regular-shaped GEMM, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in new HPC applications. Some of the recent works in this direction have made promising progress on x86 architectures and GPUs but still leave much room for improvement on emerging HPC hardware built upon the ARMv8 architecture. We present LIBSHALOM, an open-source library for optimizing small and irregular-shaped GEMMs, explicitly targeting the ARMv8 architecture. LIBSHALOM builds upon the classical Goto algorithm but tailors it to minimize the expensive memory accessing overhead for data packing and processing small matrices. It uses analytic methods to determine GEMM kernel optimization parameters, enhancing the computation and parallelization efficiency of the GEMM kernels. We evaluate LIBSHALOM by applying it to three ARMv8 multi-core architectures and comparing it against five mainstream linear algebra libraries. Experimental results show that LIBSHALOM can consistently outperform existing solutions across GEMM workloads and hardware architectures.
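The packing trade-off mentioned in the abstract can be illustrated with a minimal Python sketch of a Goto-style blocked GEMM. This is not LIBSHALOM's implementation. The function names, the block sizes `mc`, `kc`, `nc`, and the `pack_threshold` heuristic are all hypothetical, chosen only to show why skipping the packing copy may pay off when matrices are small.

```python
def pack_panel(B, k0, kb, j0, nb):
    """Copy a kb x nb block of B into a fresh contiguous buffer (the 'packing' step)."""
    return [B[k][j0:j0 + nb] for k in range(k0, k0 + kb)]


def gemm_blocked(A, B, C, mc=4, kc=4, nc=4, pack_threshold=8):
    """Compute C += A @ B with Goto-style loop ordering (n, then k, then m).

    A is m x k, B is k x n, C is m x n, all non-empty lists of lists.
    pack_threshold is a hypothetical heuristic: when every dimension is at
    most this value, the packing copy is skipped and B is read in place.
    """
    m, k, n = len(A), len(B), len(B[0])
    use_packing = max(m, k, n) > pack_threshold  # assumption, not from the paper
    for j0 in range(0, n, nc):              # column panels of B and C
        nb = min(nc, n - j0)
        for k0 in range(0, k, kc):          # rank-kc updates
            kb = min(kc, k - k0)
            Bp = pack_panel(B, k0, kb, j0, nb) if use_packing else None
            for i0 in range(0, m, mc):      # row blocks of A and C
                mb = min(mc, m - i0)
                for i in range(i0, i0 + mb):
                    for p in range(kb):
                        a = A[i][k0 + p]
                        # Packed rows start at column 0; unpacked rows at j0.
                        row = Bp[p] if use_packing else B[k0 + p]
                        off = 0 if use_packing else j0
                        for j in range(nb):
                            C[i][j0 + j] += a * row[off + j]
    return C
```

For a 2 x 3 by 3 x 2 product the threshold is not exceeded, so the sketch reads `B` directly. For larger or more elongated shapes it copies each panel of `B` once and reuses it across all row blocks of `A`.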
Pages: 15