Accelerating Inference on Binary Neural Networks with Digital RRAM Processing

Cited by: 0
Authors
Vieira, Joao [1 ]
Giacomin, Edouard [2 ]
Qureshi, Yasir [3 ]
Zapater, Marina [3 ]
Tang, Xifan [2 ]
Kvatinsky, Shahar [4 ]
Atienza, David [3 ]
Gaillardon, Pierre-Emmanuel [2 ]
Affiliations
[1] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
[2] Univ Utah, LNIS, Salt Lake City, UT USA
[3] Swiss Fed Inst Technol Lausanne EPFL, ESL, Lausanne, Switzerland
[4] Technion Israel Inst Technol, Andrew & Erna Viterbi Fac Elect Engn, Haifa, Israel
Source
VLSI-SoC: New Technology Enabler, VLSI-SoC 2019 | 2020, Vol. 586
Funding
European Research Council
Keywords
Machine Learning; Embedded systems; Binary Neural Networks; RRAM-based Binary Dot Product Engine;
DOI
10.1007/978-3-030-53273-4_12
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
The need for efficient Convolutional Neural Networks (CNNs) targeting embedded systems led to the popularization of Binary Neural Networks (BNNs), which significantly reduce execution time and memory requirements by representing the operands using only one bit. Moreover, since roughly 90% of the operations executed by CNNs and BNNs are convolutions, a quest for custom accelerators that optimize the convolution operation and reduce data movement has started, in which Resistive Random Access Memory (RRAM)-based accelerators have proven to be of interest. This work presents a custom Binary Dot Product Engine (BDPE) for BNNs that exploits the low-level compute capabilities enabled by RRAMs. This new engine accelerates the inference phase of BNNs by locally storing the most frequently used kernels and performing the binary convolutions using RRAM devices and optimized custom circuitry. Results show that the novel BDPE improves performance by 11.3% and energy efficiency by 7.4%, and reduces the number of memory accesses by 10.7%, at a cost of less than 0.3% additional die area.
Pages: 257-278
Page count: 22
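The binary convolution the abstract describes reduces, at the bit level, to an XNOR followed by a population count. Below is a minimal software sketch of that operation (not the paper's RRAM circuitry), assuming the common {-1, +1} encoding in which bit value 1 represents +1 and bit value 0 represents -1; the function name and encoding are illustrative assumptions, not from the source.

```python
def binary_dot_product(a: int, b: int, width: int) -> int:
    """Dot product of two {-1, +1} vectors packed into `width`-bit
    integers (bit 1 encodes +1, bit 0 encodes -1).

    Illustrative sketch of the XNOR-popcount formulation that
    BDPE-style hardware implements with RRAM devices.
    """
    mask = (1 << width) - 1
    xnor = ~(a ^ b) & mask           # bit is 1 where the operands agree
    matches = bin(xnor).count("1")   # popcount of agreeing positions
    # agreements contribute +1, disagreements -1:
    return 2 * matches - width

# a = (+1, -1, +1, +1) -> 0b1011; b = (+1, +1, -1, +1) -> 0b1101
print(binary_dot_product(0b1011, 0b1101, 4))  # prints 0
```

In hardware, the XNOR is performed by the RRAM array and custom circuitry, and the popcount by dedicated logic; the software form above is only the functional equivalent.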