RRAM-Based Analog Approximate Computing

Cited by: 99
Authors
Li, Boxun [1]
Gu, Peng [1]
Shan, Yi [2]
Wang, Yu [1]
Chen, Yiran [3]
Yang, Huazhong [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Baidu Inc, Baidu Res Inst Deep Learning, Beijing 100085, Peoples R China
[3] Univ Pittsburgh, Dept Elect & Comp Engn, Pittsburgh, PA 15261 USA
Funding
US National Science Foundation; National Natural Science Foundation of China
Keywords
Approximate computing; neural network; power efficiency; resistive random-access memory (RRAM); NEURAL-NETWORKS; DEVICE; DESIGN; MEMORY;
DOI
10.1109/TCAD.2015.2445741
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Approximate computing is a promising design paradigm for improving performance and power efficiency. In this paper, we propose a power-efficient framework for analog approximate computing with emerging metal-oxide resistive switching random-access memory (RRAM) devices. We first introduce a programmable RRAM-based approximate computing unit (RRAM-ACU) to accelerate approximated computation, and then build a scalable approximate computing framework on top of it. To program the RRAM-ACU efficiently, we also present a detailed configuration flow comprising a customized approximator training scheme, an approximator-parameter-to-RRAM-state mapping algorithm, and an RRAM state tuning scheme. Finally, the proposed RRAM-based computing framework is modeled at the system level. A predictive compact model is developed to estimate the configuration overhead of the RRAM-ACU and to help explore the application scenarios of RRAM-based analog approximate computing. Simulation results on a set of diverse benchmarks demonstrate that, compared with an x86-64 CPU at 2 GHz, the RRAM-ACU achieves a speedup of 4.06x to 196.41x and a power efficiency of 24.59 to 567.98 GFLOPS/W with an average quality loss of 8.72%. In addition, an implementation of the hierarchical model and X application demonstrates that the proposed RRAM-based approximate computing framework achieves more than 12.8x the power efficiency of its purely digital counterparts (CPU, graphics processing unit, and field-programmable gate array).
Pages: 1905-1917
Number of pages: 13