Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms

Cited by: 23
Authors
Berger, Daniel S. [1]
Ernst, Daniel [2]
Li, Huaicheng [3]
Zardoshti, Pantea [4]
Shah, Monish [5]
Rajadnya, Samir [5]
Lee, Scott [6]
Hsu, Lisa [7]
Agarwal, Ishwar [8]
Hill, Mark D. [5, 9]
Bianchini, Ricardo [7]
Affiliations
[1] Microsoft Azure, Azure Syst Res Grp, Redmond, WA 98052 USA
[2] Microsoft Azure, Leading Edge Architecture Pathfinding LEAP, Redmond, WA 98052 USA
[3] Virginia Tech, Blacksburg, VA 24061 USA
[4] Microsoft Azure, AzSR Grp, Redmond, WA 98052 USA
[5] Microsoft Azure, LEAP Grp, Redmond, WA 98052 USA
[6] Microsoft, Redmond, WA 98052 USA
[7] Microsoft Azure, Redmond, WA 98052 USA
[8] Intel Corp, Santa Clara, CA 95054 USA
[9] Univ Wisconsin Madison, Madison, WI 53715 USA
Keywords
Servers; Memory management; Cloud computing; Bandwidth; Random access memory; Costs; Hardware
DOI
10.1109/MM.2023.3241586
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Dynamic random-access memory (DRAM) is a key driver of performance and cost in public cloud servers. At the same time, a significant amount of DRAM is underutilized due to fragmented use across servers. Emerging interconnects such as Compute Express Link (CXL) offer a path toward improving utilization through memory pooling. However, the design space of CXL-based memory systems is large, with key questions around the size, reach, and topology of the memory pool. In addition, using pools requires navigating complex design constraints around performance, virtualization, and management. This article discusses why cloud providers should deploy CXL memory pools, the key design constraints, and observations from designing toward practical deployment. We identify configuration examples with a significant positive return on investment.
Pages: 30-38
Number of pages: 9