31 entries in total
[1]
High-Performance Tensor Contractions for GPUs [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE 2016 (ICCS 2016), 2016, 80: 108-118
[2]
Performance, Design, and Autotuning of Batched GEMM for GPUs [J]. HIGH PERFORMANCE COMPUTING, 2016, 9697: 21-38
[3]
Agullo Emmanuel, 2010, GPU COMPUTING GEMS, V2
[4]
[Anonymous], PARALLEL COMPUTING
[5]
[Anonymous], 2019, IEEE Std 754-2019 (Revision of IEEE 754-2008), P1, DOI [10.1109/IEEESTD.2008.4610935, 10.1109/IEEESTD.2017.8091139, 10.1109/IEEESTD.2019.8766229]
[6]
[Anonymous], 2015, FULL WALK SGEMM IMPL
[7]
[Anonymous], 2013, CORR
[8]
Chellapilla K., 2006, 10 INT WORKSH FRONT
[9]
Chetlur S., 2014, cuDNN: Efficient primitives for deep learning
[10]
Gupta S, 2015, PR MACH LEARN RES, V37, P1737