32 entries in total
- [21] Paszke A, 2019, ADV NEUR IN, V32
- [22] Capuchin: Tensor-based GPU Memory Management for Deep Learning [J]. TWENTY-FIFTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXV), 2020: 891-905
- [23] Peters M. E., 2018, P 2018 C N AM CHAPT, P2227
- [24] ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning [J]. SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021
- [25] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models [J]. PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020
- [26] DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 3505-3506
- [27] Ren J, 2021, PROCEEDINGS OF THE 2021 USENIX ANNUAL TECHNICAL CONFERENCE, P551
- [28] Vaswani A, 2017, ADV NEUR IN, V30