Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs

Cited by: 0
Authors
Wei, Bingxin [1 ]
Wang, Yizhuo [1 ]
Chang, Fangli [1 ]
Gao, Jianhua [1 ]
Ji, Weixing [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, 5 South St, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sparse general matrix-matrix multiplication; graphics processing unit; machine learning; gradient boosting decision tree;
DOI
10.1177/10943420241231928
CLC classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Sparse General Matrix-Matrix Multiplication (SpGEMM) plays an important role in a number of applications, and many efficient algorithms have been proposed to improve its performance on GPUs. However, the performance of each algorithm varies considerably across matrices with different structures, and no single algorithm achieves optimal SpGEMM performance on all matrices. In this article, we design a machine learning based approach for predicting the optimal SpGEMM algorithm for a given input matrix. By extracting features from input matrices, we use LightGBM and XGBoost to train lightweight models that predict the best-performing algorithm with low inference overhead and high accuracy. We also investigate the impact of tree depth on model accuracy and inference overhead. Our evaluation shows that increasing the tree depth increases prediction accuracy, reaching a maximum of approximately 85%, while incurring an additional inference overhead of approximately 10 μs. Compared with state-of-the-art algorithms on three GPU platforms, our method achieves better overall performance.
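The pipeline described in the abstract (extract structural features from the input sparse matrix, then use a gradient-boosted tree model to pick the fastest SpGEMM algorithm) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the feature set, the candidate-algorithm labels (cuSPARSE, nsparse, spECK), and all hyperparameters are assumptions; only the use of LightGBM/XGBoost-style gradient boosting comes from the abstract.

# Minimal sketch (assumed details, not the paper's code): featurize CSR matrices
# and train a LightGBM classifier that predicts the best SpGEMM algorithm.
import numpy as np
import scipy.sparse as sp
from lightgbm import LGBMClassifier

def matrix_features(A: sp.csr_matrix) -> np.ndarray:
    """Cheap structural features of a CSR matrix (illustrative choice)."""
    nnz_per_row = np.diff(A.indptr)          # nonzeros in each row
    return np.array([
        A.shape[0],                          # number of rows
        A.shape[1],                          # number of columns
        A.nnz,                               # total nonzeros
        nnz_per_row.mean(),                  # average row length
        nnz_per_row.std(),                   # row-length irregularity
        nnz_per_row.max(),                   # longest row
    ], dtype=np.float64)

# Hypothetical training set: benchmark matrices plus the label of the algorithm
# measured to be fastest on each (labels 0/1/2 are placeholders).
rng = np.random.default_rng(0)
densities = [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1]
matrices = [sp.random(500, 500, density=d, format="csr", random_state=rng)
            for d in densities]
X = np.stack([matrix_features(A) for A in matrices])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])       # 0=cuSPARSE, 1=nsparse, 2=spECK (assumed)

# Shallow trees trade some accuracy for lower inference overhead, which is the
# depth/overhead trade-off studied in the article.
model = LGBMClassifier(n_estimators=50, max_depth=4, min_child_samples=1)
model.fit(X, y)

# At run time: featurize the input matrix and dispatch to the predicted algorithm.
A_new = sp.random(500, 500, density=0.008, format="csr", random_state=rng)
best_algo = model.predict(matrix_features(A_new).reshape(1, -1))[0]
print("predicted best SpGEMM algorithm id:", best_algo)

In this kind of setup the feature extraction must stay cheap relative to the SpGEMM itself, which is why only aggregate row statistics (rather than anything requiring a symbolic multiplication) are used in the sketch above.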
Pages: 245-259
Number of pages: 15