Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

Cited by: 400
Authors
Hazelwood, Kim [1]
Bird, Sarah [1]
Brooks, David [1]
Chintala, Soumith [1]
Diril, Utku [1]
Dzhulgakov, Dmytro [1]
Fawzy, Mohamed [1]
Jia, Bill [1]
Jia, Yangqing [1]
Kalro, Aditya [1]
Law, James [1]
Lee, Kevin [1]
Lu, Jason [1]
Noordhuis, Pieter [1]
Smelyanskiy, Misha [1]
Xiong, Liang [1]
Wang, Xiaodong [1]
Affiliations
[1] Facebook Inc., Menlo Park, CA 94025, USA
Source
2018 24th IEEE International Symposium on High Performance Computer Architecture (HPCA) | 2018
DOI
10.1109/HPCA.2018.00059
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook's machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.
Pages: 620-629
Page count: 10