38 entries in total
[1]
Brown TB, 2020, ADV NEUR IN, V33
[2]
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs [J]. PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022: 135-147
[3]
Chen Z., 2023, P 28 ACM SIGPLAN ANN, P369
[4]
Efficient Tensor Core-Based GPU Kernels for Structured Sparsity under Reduced Precision [J]. SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021
[5]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[6]
Frantar E, 2023, Arxiv, DOI [arXiv:2301.00774, DOI 10.48550/ARXIV.2301.00774]
[7]
Frantar E, 2021, Arxiv, DOI arXiv:2107.03356
[8]
Gale T, 2019, Arxiv, DOI [arXiv:1902.09574, 10.48550/arXiv.1902.09574]
[9]
Sparse GPU Kernels for Deep Learning [J]. PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020
[10]
Google Research, 2020, Deep Learning Matrix Collection