A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication

Cited by: 3

Authors:

Xia, Yufan [1]

De La Pierre, Marco [2]

Barnard, Amanda S. [1]

Barca, Giuseppe Maria Junior [1]

Affiliations:

[1] Australian Natl Univ, Canberra, Australia

[2] Pawsey Supercomp Res Ctr, Perth, Australia

Source:

2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS | 2023

Keywords:

GEMM; BLAS; Machine learning; BLIS; MKL; Linear Algebra; Multiple threads; ALGEBRA; ALGORITHMS

DOI:

10.1109/IPDPS54959.2023.00059

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well-optimised with techniques like blocking and autotuning. However, due to the complexity of modern multi-core shared memory systems, it is challenging to determine the number of threads that minimises the multi-thread GEMM runtime. We present a proof-of-concept approach to building an Architecture and Data-Structure Aware Linear Algebra (ADSALA) software library that uses machine learning to optimise the runtime performance of BLAS routines. More specifically, our method uses a machine learning model on-the-fly to automatically select the optimal number of threads for a given GEMM task based on the collected training data. Test results on two different HPC node architectures, one based on a two-socket Intel Cascade Lake and the other on a two-socket AMD® Zen 3, revealed a 25 to 40 per cent speedup compared to traditional GEMM implementations in BLAS for GEMM tasks with memory usage within 100 MB.
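The thread-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which model, features, or training data ADSALA uses, so everything below is an assumption made for illustration: the training data comes from a toy cost model (`synthetic_runtime`, not real measurements), the predictor is a plain nearest-neighbour regressor, and all function names are hypothetical. The only part taken from the abstract is the overall pattern: predict the runtime for each candidate thread count, then pick the count with the smallest prediction.

```python
import math

def synthetic_runtime(m, k, n, threads):
    """Toy cost model (an assumption, not measured data): compute time
    shrinks with thread count while a per-thread overhead grows."""
    flops = 2.0 * m * k * n
    compute = flops / (1e9 * threads)   # idealised parallel compute time (s)
    overhead = 2e-5 * threads           # synchronisation cost per thread (s)
    return compute + overhead

def features(m, k, n, threads):
    # Log-scale features so small and large problems are comparable.
    return (math.log2(m), math.log2(k), math.log2(n), math.log2(threads))

def build_training_set(sizes, thread_counts):
    """Collect (features, runtime) samples over a grid of GEMM shapes."""
    return [
        (features(m, k, n, t), synthetic_runtime(m, k, n, t))
        for m in sizes for k in sizes for n in sizes for t in thread_counts
    ]

def predict_runtime(train, x, neighbours=1):
    """Nearest-neighbour regression: mean runtime of the closest samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:neighbours]
    return sum(runtime for _, runtime in nearest) / len(nearest)

def select_threads(train, m, k, n, thread_counts):
    """Return the candidate thread count with the lowest predicted runtime."""
    return min(
        thread_counts,
        key=lambda t: predict_runtime(train, features(m, k, n, t)),
    )

THREADS = [1, 2, 4, 8, 16, 32]
SIZES = [32, 64, 128, 256, 512, 1024]
TRAIN = build_training_set(SIZES, THREADS)

small = select_threads(TRAIN, 64, 64, 64, THREADS)
large = select_threads(TRAIN, 1024, 1024, 1024, THREADS)
print("threads for 64^3 GEMM:", small)
print("threads for 1024^3 GEMM:", large)
```

Under this toy cost model the small problem is assigned fewer threads than the large one, because per-thread overhead dominates when there is little work to share. In a real setting the training samples would be timed GEMM calls on the target machine, and the chosen thread count would be passed to the BLAS library before the call.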


Pages: 524-534

Page count: 11


