Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Cited by: 208
Authors
Chung, Eric [1]
Fowers, Jeremy [1]
Ovtcharov, Kalin [1]
Papamichael, Michael [1]
Caulfield, Adrian [1]
Massengill, Todd [1]
Liu, Ming [1]
Lo, Daniel [1]
Alkalay, Shlomi [1]
Haselman, Michael [1]
Abeydeera, Maleen [1]
Adams, Logan [1]
Angepat, Hari [1]
Boehn, Christian [1]
Chiou, Derek [1]
Firestein, Oren [1]
Forin, Alessandro [1]
Gatlin, Kang Su [1]
Ghandi, Mahdi [1]
Heil, Stephen [1]
Holohan, Kyle [1]
El Husseini, Ahmad [1]
Juhasz, Tamas [1]
Kagi, Kara [1]
Kovvuri, Ratna K. [1]
Lanka, Sitaram [1]
van Megen, Friedel [1]
Mukhortov, Dima [1]
Patel, Prerak [1]
Perez, Brandon [1]
Rapsang, Amanda Grace [1]
Reinhardt, Steven K. [1]
Rouhani, Bita Darvish [1]
Sapek, Adam [1]
Seera, Raja [1]
Shekar, Sangeetha [1]
Sridharan, Balaji [1]
Weisz, Gabriel [1]
Woods, Lisa [1]
Xiao, Phillip Yi [1]
Zhang, Dan [1]
Zhao, Ritchie [1]
Burger, Doug [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Deep learning; FPGA; Hardware; Inference; Quantization
DOI
10.1109/MM.2018.022071131
CLC classification
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
To meet the computational demands required of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. Project Brainwave, Microsoft's principal infrastructure for AI serving in real time, accelerates deep neural network (DNN) inferencing in major services such as Bing's intelligent search features and Azure. Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiencies at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops (Tflops) of effective performance at Batch 1 on a state-of-the-art Intel Stratix 10 FPGA.
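The "precision-adaptable" soft processor and the Quantization keyword refer to Brainwave's use of narrow-precision number formats in which a block of values shares a single exponent while each value keeps only a few mantissa bits. The exact formats are Microsoft-internal, so the following is only an illustrative sketch of shared-exponent (block floating-point) quantization; the parameter names and choices (`mantissa_bits`, `block_size`) are assumptions for the example, not taken from the paper.

```python
import math

def block_fp_quantize(values, mantissa_bits=5, block_size=16):
    """Quantize a flat list of floats using one shared exponent per block.

    Illustrative sketch only: each block of `block_size` values is scaled
    by a power-of-two step derived from the block's largest magnitude,
    then each value is rounded onto that shared grid. This mimics the
    general idea of block floating point, not Brainwave's actual format.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_mag = max(abs(v) for v in block)
        if max_mag == 0.0:
            # An all-zero block quantizes to zeros exactly.
            out.extend(0.0 for _ in block)
            continue
        # Shared exponent chosen so the largest value uses the full mantissa.
        exp = math.floor(math.log2(max_mag))
        step = 2.0 ** (exp - (mantissa_bits - 1))
        # Round every value in the block onto the shared power-of-two grid.
        out.extend(round(v / step) * step for v in block)
    return out
```

The appeal of this style of format in hardware is that multipliers only need to handle the narrow mantissas, while the shared exponent is applied once per block, which is one way to raise effective Tflops on an FPGA at batch size 1.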
Pages: 8-20
Page count: 13