40 entries in total
[1]
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
[J].
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE,
2016, 42 (03)
[2]
Akiva-Hochman R., 2022, Lecture Notes in Computer Science, P130
[4]
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
[J].
SC23:INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS,
2023
[5]
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
[J].
PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022,
2022: 135-147
[6]
Chen A.-T., 2021, P 9 INT S COMP NETW, P1
[8]
DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training
[J].
51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022,
2022
[9]
Chetlur S, 2014, Arxiv, DOI arXiv:1410.0759
[10]
Chitty-Venkata K. T., 2022, P 31 INT S HIGH PERF