aCortex: An Energy-Efficient Multipurpose Mixed-Signal Inference Accelerator

Cited by: 10
Authors
Bavandpour, Mohammad [1 ]
Mahmoodi, Mohammad R. [1 ]
Strukov, Dmitri B. [1 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93117 USA
Source
IEEE JOURNAL ON EXPLORATORY SOLID-STATE COMPUTATIONAL DEVICES AND CIRCUITS | 2020, Vol. 6, No. 1
Funding
U.S. National Science Foundation;
Keywords
Artificial neural networks; floating-gate memory; machine learning; mixed-signal circuits; neuromorphic inference accelerator; nonvolatile memory (NVM); ANALOG;
DOI
10.1109/JXCDC.2020.2999581
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
We introduce "aCortex," an extremely energy-efficient, fast, compact, and versatile neuromorphic processor architecture suitable for the acceleration of a wide range of neural network inference models. The most important feature of our processor is a configurable mixed-signal computing array of vector-by-matrix multiplier (VMM) blocks utilizing embedded nonvolatile memory arrays for storing weight matrices. Analog peripheral circuitry for data conversion and high-voltage programming is shared among a large array of VMM blocks to facilitate compact and energy-efficient analog-domain VMM operation across different types of neural network layers. Other unique features of aCortex include a configurable chain of buffers and data buses, a simple and efficient instruction set architecture with its corresponding multiagent controller, a programmable quantization range, and a customized refresh-free embedded dynamic random access memory. The energy-optimal aCortex with 4-bit analog computing precision was designed in a 55-nm process with embedded NOR flash memory. Its physical performance was evaluated using experimental data from testing individual circuit elements and the physical layout of key components for several common benchmarks, namely, Inception-v1 and ResNet-152, two state-of-the-art deep feedforward networks for image classification, and GNMT, Google's deep recurrent network for language translation. The system-level simulation results for these benchmarks show an energy efficiency of 97, 106, and 336 TOp/J, respectively, combined with up to 15-TOp/s computing throughput and 0.27-MB/mm² storage efficiency. These estimated performance results compare favorably with those of previously reported mixed-signal accelerators based on much less mature, aggressively scaled resistive switching memories.
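To make the core computing primitive concrete, the following is a minimal behavioral sketch (not the authors' hardware or code) of a vector-by-matrix multiply in which inputs and weights are uniformly quantized to 4 bits over a configurable range, mirroring the role of the mixed-signal VMM blocks and the programmable quantization range described in the abstract; the function names and the symmetric quantization scheme are assumptions made for illustration.

```python
# Behavioral sketch of a 4-bit quantized vector-by-matrix multiply (VMM).
# Illustrative only: the actual aCortex performs the multiply in the analog
# domain, with weight matrices stored in embedded NOR flash cells.
import numpy as np

def quantize(x, n_bits=4, x_max=1.0):
    """Uniformly quantize x to signed n_bits levels over [-x_max, x_max].
    The 'programmable quantization range' is modeled by x_max (assumption)."""
    levels = 2 ** (n_bits - 1) - 1            # 7 positive levels for 4 bits
    step = x_max / levels
    return np.clip(np.round(x / step), -levels, levels) * step

def vmm_4bit(inputs, weights, in_range=1.0, w_range=1.0):
    """VMM with 4-bit quantized inputs and weights; ideal (noise-free) sum."""
    return quantize(inputs, 4, in_range) @ quantize(weights, 4, w_range)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=64)           # input activation vector
    w = rng.uniform(-1, 1, size=(64, 32))     # weight matrix (on-chip NVM)
    err = np.abs(x @ w - vmm_4bit(x, w))
    print("max abs error from 4-bit quantization:", err.max())
```

This sketch only captures the numerical effect of limited computing precision; analog nonidealities such as device variability, noise, and data-converter resolution are outside its scope.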
Pages: 98-106
Number of pages: 9