A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Times cited: 30
Authors
Putnam, Andrew [1]
Caulfield, Adrian M. [1]
Chung, Eric S. [1]
Chiou, Derek [1, 2]
Constantinides, Kypros [3]
Demme, John [4]
Esmaeilzadeh, Hadi [5]
Fowers, Jeremy [1]
Gopal, Gopi Prashanth [1]
Gray, Jan [1]
Haselman, Michael [1]
Hauck, Scott [1, 6]
Heil, Stephen [1]
Hormati, Amir [7]
Kim, Joo-Young [1]
Lanka, Sitaram [1]
Larus, James [8]
Peterson, Eric [1]
Pope, Simon [1]
Smith, Aaron [1]
Thong, Jason [1]
Xiao, Phillip Yi [1]
Burger, Doug [1]
Affiliations
[1] Microsoft, Redmond, WA 98052 USA
[2] University of Texas at Austin, Austin, TX 78712 USA
[3] Amazon Web Services, Boston, MA USA
[4] Columbia University, New York, NY USA
[5] Georgia Institute of Technology, Atlanta, GA 30332 USA
[6] University of Washington, Seattle, WA 98195 USA
[7] Google Inc., Mountain View, CA USA
[8] École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
DOI
10.1145/2996868
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field-programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.
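The closing "half the servers" claim is simple arithmetic on the figures above: a 95% per-server throughput gain means each server does 1.95 times the baseline work, so matching a fixed throughput takes roughly 1/1.95, or about 51%, of the servers. The Python sketch below works this through for the 1632-server test bed. It assumes, as a simplification of ours rather than a statement from the paper, that aggregate throughput scales linearly with server count and that every 48-server rack is fully populated; the helper `servers_needed` and the constant names are hypothetical.

```python
import math


def servers_needed(baseline_servers: int, speedup: float) -> int:
    """Servers required to match the throughput of `baseline_servers`
    unaccelerated servers when each server is `speedup` times faster.

    Assumes (illustrative simplification) that throughput scales linearly
    with server count; rounds up because a fractional server cannot be deployed.
    """
    return math.ceil(baseline_servers / speedup)


SERVERS = 1632          # size of the deployment described in the abstract
SERVERS_PER_RACK = 48   # servers (one FPGA each) per rack
THROUGHPUT_GAIN = 0.95  # +95% ranking throughput per server under high load

racks = SERVERS // SERVERS_PER_RACK                 # assumes fully populated racks
needed = servers_needed(SERVERS, 1 + THROUGHPUT_GAIN)

print(racks)                      # 34
print(needed)                     # 837
print(round(needed / SERVERS, 3)) # 0.513
```

Under these assumptions, 837 accelerated servers match the baseline throughput of 1632, about 51% of the original count, which is consistent with the abstract's "only half the number of servers."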
Pages: 114-122
Number of pages: 9
Related papers (50 in total)
  • [1] A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
    Putnam, Andrew
    Caulfield, Adrian M.
    Chung, Eric S.
    Chiou, Derek
    Constantinides, Kypros
    Demme, John
    Esmaeilzadeh, Hadi
    Fowers, Jeremy
    Gopal, Gopi Prashanth
    Gray, Jan
    Haselman, Michael
    Hauck, Scott
    Heil, Stephen
    Hormati, Amir
    Kim, Joo-Young
    Lanka, Sitaram
    Larus, James
    Peterson, Eric
    Pope, Simon
    Smith, Aaron
    Thong, Jason
    Xiao, Phillip Yi
    Burger, Doug
    IEEE MICRO, 2015, 35(03): 10-22
  • [2] A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
    Putnam, Andrew
    Caulfield, Adrian M.
    Chung, Eric S.
    Chiou, Derek
    Constantinides, Kypros
    Demme, John
    Esmaeilzadeh, Hadi
    Fowers, Jeremy
    Gopal, Gopi Prashanth
    Gray, Jan
    Haselman, Michael
    Hauck, Scott
    Heil, Stephen
    Hormati, Amir
    Kim, Joo-Young
    Lanka, Sitaram
    Larus, James
    Peterson, Eric
    Pope, Simon
    Smith, Aaron
    Thong, Jason
    Xiao, Phillip Yi
    Burger, Doug
    2014 ACM/IEEE 41ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA), 2014: 13-24
  • [3] Large-Scale Reconfigurable Computing in a Microsoft Datacenter
    Putnam, Andrew
    2014 IEEE HOT CHIPS 26 SYMPOSIUM (HCS), 2014
  • [4] DxPU: Large-scale Disaggregated GPU Pools in the Datacenter
    He, Bowen
    Zheng, Xiao
    Chen, Yuan
    Li, Weinan
    Zhou, Yajin
    Long, Xin
    Zhang, Pengcheng
    Lu, Xiaowei
    Jiang, Linquan
    Liu, Qiang
    Cai, Dennis
    Zhang, Xiantao
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2023, 20(04)
  • [5] A Hybrid Testbed for Performance Evaluation of Large-Scale Datacenter Networks
    Pilimon, Artur
    Ruepp, Sarah
    2018 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC), 2018: 409-413
  • [6] Accelerating Large-Scale Genomic Analysis with Spark
    Li, Xueqi
    Tan, Guangming
    Zhang, Chunming
    Li, Xu
    Zhang, Zhonghai
    Sun, Ninghui
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016: 747-751
  • [7] A Large-Scale Reconfigurable Smart Sensory Chip
    Peng, Sheng-Yu
    Gurun, Gokce
    Twigg, Christopher M.
    Qureshi, Muhammad S.
    Basu, Arindam
    Brink, Stephen
    Hasler, Paul E.
    Degertekin, F. L.
    ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, 2009: 2145-+
  • [8] The Configurable Cloud - Accelerating Hyperscale Datacenter Services with FPGA
    Putnam, Andrew
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017: 1587
  • [9] Automated Dynamic Resource Provisioning and Monitoring in Virtualized Large-scale Datacenter
    Abar, Sameera
    Lemarinier, Pierre
    Theodoropoulos, Georgios K.
    O'Hare, Gregory M. P.
    2014 IEEE 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA), 2014: 961-970
  • [10] Large-Scale Models and Large-Scale Thinking - The Case of the Health-Services
    Smith, P
    OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 1995, 23(02): 145-157