35 entries in total
[1]
Fast Batched Matrix Multiplication for Small Sizes using Half-Precision Arithmetic on GPUs [J]. 2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019), 2019: 111-122
[3]
[Anonymous], 2018, NVIDIA CUBLAS LIB
[4]
Ballard G., 2012, P 24 ANN ACM S PAR A, P193, DOI 10.1145/2312005.2312044
[5]
Barrachina S, 2008, 2008 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-8, P3103
[6]
Chtchelkanova A, 1997, CONCURRENCY-PRACT EX, V9, P837, DOI 10.1002/(SICI)1096-9128(199709)9:9<837::AID-CPE267>3.0.CO;2-2
[9]
OpenMP: An industry standard API for shared-memory programming [J]. IEEE COMPUTATIONAL SCIENCE & ENGINEERING, 1998, 5 (01): 46-55
[10]
Fatahalian K., 2004, PROC ACM SIGGRAPHEUR, P133, DOI 10.1145/1058129.1058148