DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning

Cited by: 1040
Authors
Chen, Tianshi [1]
Du, Zidong [1]
Sun, Ninghui [1]
Wang, Jia [1]
Wu, Chengyong [1]
Chen, Yunji [1]
Temam, Olivier [2]
Affiliations
[1] ICT, SKLCA, Beijing, People's Republic of China
[2] Inria, Le Chesnay, France
Keywords
NEURAL-NETWORKS; RECOGNITION
DOI
10.1145/2541940.2541967
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
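The headline figures in the abstract (452 GOP/s, 3.02 mm², 485 mW) can be combined into efficiency ratios. The sketch below is only back-of-the-envelope arithmetic on those three numbers; the function name `derived_metrics` is hypothetical, and it assumes the reported throughput and power refer to the same operating point, which the abstract does not state explicitly.

```python
def derived_metrics(throughput_gops: float, area_mm2: float, power_mw: float) -> dict:
    """Derive efficiency ratios from throughput (GOP/s), area (mm^2) and power (mW).

    Assumption: throughput and power describe the same operating point.
    """
    power_w = power_mw / 1000.0
    ops_per_second = throughput_gops * 1e9
    return {
        # Throughput per watt of power.
        "gops_per_watt": throughput_gops / power_w,
        # Throughput per square millimetre of silicon.
        "gops_per_mm2": throughput_gops / area_mm2,
        # Energy per operation in picojoules (J/op scaled by 1e12).
        "pj_per_op": power_w / ops_per_second * 1e12,
    }


# Figures quoted in the abstract.
metrics = derived_metrics(throughput_gops=452.0, area_mm2=3.02, power_mw=485.0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, the figures work out to roughly 932 GOP/s per watt, about 150 GOP/s per mm², and close to 1.07 pJ per operation.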
Pages: 269-283
Page count: 15