179 entries in total
[1]
Agrawal A, 2023, Arxiv, DOI [arXiv:2308.16369, 10.48550/ARXIV.2308.16369]
[2]
Topology-Aware GPU Scheduling for Learning Workloads in Cloud Environments [J]. SC'17: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2017.
[3]
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale [J]. SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022.
[4]
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models [J]. PROCEEDINGS OF THE SEVENTEENTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS '22), 2022: 472-487.
[5]
Bai J., 2023, PREPRINT, DOI [arXiv:2309.16609, 10.48550/arXiv.2309.16609]
[6]
Beltagy I, 2020, Arxiv, DOI arXiv:2004.05150
[7]
Bi Jun, 2023, ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, P314, DOI 10.1145/3582016.3582061
[8]
Bian Z, 2024, arXiv
[9]
Brown TB, 2020, ADV NEUR IN, V33
[10]
Cambier L., 2020, Proc. of the 8th International Conference on Learning Representations