48 references in total
[1]
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale [C]. SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022.
[2]
[Anonymous], About us
[3]
NVIDIA, GPUDirect Storage
[4]
Bai ZH, 2020, PROCEEDINGS OF THE 14TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '20), P499
[5]
Bao H, 2022, ICLR
[6]
Ben Noach M, 2020, 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), P884
[7]
devblog.pytorchlightning, About us
[8]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[9]
Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels [J]. Performance Evaluation Review, 2021, 48(3): 81-88.
[10]
Gujarati A, 2020, PROCEEDINGS OF THE 14TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '20), P443