DaDianNao: A Neural Network Supercomputer

Cited by: 140
Authors
Luo, Tao [1 ]
Liu, Shaoli [1 ]
Li, Ling [2 ]
Wang, Yuqing [1 ]
Zhang, Shijin [1 ]
Chen, Tianshi [1 ]
Xu, Zhiwei [1 ]
Temam, Olivier [3 ]
Chen, Yunji [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Inria Saclay, F-91120 Palaiseau, France
Keywords
Machine learning; neural network; supercomputer; multi-chip; interconnect; CNN; DNN; SILICON PHOTONICS; LOW-COST; DESIGN; POWER;
DOI
10.1109/TC.2016.2574353
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
Many companies are deploying services largely based on machine-learning algorithms for sophisticated processing of large amounts of data, either for consumers or industry. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be computationally and memory intensive. A number of neural network accelerators have recently been proposed which can offer a high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the memory footprint of CNNs and DNNs, while large, is not beyond the on-chip storage capacity of a multi-chip system. This property, combined with the algorithmic characteristics of CNNs/DNNs, can lead to high internal bandwidth and low external communication, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines, and evaluate its performance with electrical and optical inter-chip interconnects separately. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 656.63x over a GPU and reduce energy by 184.05x on average for a 64-chip system. We implement the node down to place-and-route at 28 nm; it contains a combination of custom storage and computational units, with electrical inter-chip interconnects.
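The abstract's key observation, that a large DNN's memory footprint fits in the aggregate on-chip storage of a multi-chip system, can be checked with back-of-envelope arithmetic. A minimal sketch follows; the 36 MB per-chip eDRAM figure, the 4096x4096 layer size, and 16-bit weights are illustrative assumptions, not values taken from this record:

```python
def footprint_mb(num_synapses, bytes_per_weight=2):
    """Memory footprint of a layer's synaptic weights, in MB (16-bit weights by default)."""
    return num_synapses * bytes_per_weight / (1024 ** 2)

# A large fully-connected layer, e.g. 4096 x 4096 neurons:
fc_synapses = 4096 * 4096
fc_mb = footprint_mb(fc_synapses)  # 32.0 MB at 16-bit weights

# Hypothetical 64-chip system with 36 MB of on-chip eDRAM per chip:
chips, edram_per_chip_mb = 64, 36
total_on_chip_mb = chips * edram_per_chip_mb  # 2304 MB aggregate

print(f"layer footprint: {fc_mb:.0f} MB, on-chip capacity: {total_on_chip_mb} MB")
```

Under these assumptions, even a layer whose weights overflow a single chip's storage fits comfortably across the system, so weights can stay resident on-chip and only neuron activations cross the inter-chip interconnect.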
Pages: 73-88
Page count: 16