Hadamard product-based in-memory computing design for floating point neural network training

Times cited: 1
Authors
Fan, Anjunyi [1 ,2 ]
Fu, Yihan [1 ,2 ]
Tao, Yaoyu [1 ,2 ]
Jin, Zhonghua [3 ]
Han, Haiyue [3 ]
Liu, Huiyu [3 ]
Zhang, Yaojun [3 ]
Yan, Bonan [1 ,2 ]
Yang, Yuchao [1 ,2 ,4 ,5 ]
Huang, Ru [1 ,2 ]
Affiliations
[1] Peking Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[2] Peking Univ, Beijing Adv Innovat Ctr Integrated Circuits, Sch Integrated Circuits, Beijing, Peoples R China
[3] Pimchip Technol Co Ltd, Beijing, Peoples R China
[4] Peking Univ, Sch Elect & Comp Engn, Shenzhen, Peoples R China
[5] Chinese Inst Brain Res CIBR, Ctr Brain Inspired Intelligence, Beijing, Peoples R China
Source
NEUROMORPHIC COMPUTING AND ENGINEERING | 2023, Vol. 3, No. 1
Funding
National Natural Science Foundation of China;
Keywords
in-memory computing; SRAM; floating point; DNN training;
DOI
10.1088/2634-4386/acbab9
Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronic Technology and Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
Deep neural networks (DNNs) are a key field of machine learning, and they require considerable computational resources for cognitive tasks. In-memory computing (IMC), a novel technology that performs computation inside or near memory units, significantly improves computing efficiency by reducing repetitive data transfer between processing and memory units. However, prior IMC designs mainly focus on accelerating DNN inference; DNN training on IMC hardware has rarely been proposed. The challenges lie in DNN training's requirements for high precision (e.g. floating point (FP)) and for diverse tensor operations (e.g. inner and outer products), which call for IMC designs with new features. This paper proposes a novel Hadamard product-based IMC design for FP DNN training. Our design consists of multiple compartments, which are the basic units for element-wise matrix processing. We also develop BFloat16 post-processing circuits and fused adder trees, laying the foundation for IMC FP processing. Based on the proposed circuit scheme, we reformulate the back-propagation training algorithm for convenient and efficient IMC execution. The proposed design is implemented with commercial 28 nm process design kits and benchmarked with widely used neural networks. We model the influence of the circuit structural design parameters and provide an analysis framework for design space exploration. Our simulations validate that MobileNet training with the proposed IMC scheme saves 91.2% in energy and 13.9% in time versus the same task on an NVIDIA RTX 3060 GPU. The proposed IMC design achieves a data density of 769.2 Kb mm⁻² with the FP processing circuits included, a 3.5× improvement over prior FP IMC designs.
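The abstract's point that training, unlike inference, needs both matrix products and element-wise (Hadamard) tensor operations can be seen in the standard back-propagation equations. The following minimal NumPy sketch is illustrative only and is not taken from the paper; the layer shapes and function names are assumptions. It marks where each operation type appears for one fully connected ReLU layer.

    import numpy as np

    def backprop_fc_relu(w, x, grad_out):
        """One fully connected ReLU layer (illustrative, not the paper's scheme).
        w: (out, in) weights, x: (in,) input, grad_out: (out,) gradient w.r.t. layer output."""
        z = w @ x                               # forward pre-activation: matrix-vector product
        relu_mask = (z > 0).astype(w.dtype)     # derivative of ReLU
        grad_z = grad_out * relu_mask           # Hadamard (element-wise) product
        grad_w = np.outer(grad_z, x)            # outer product: per-weight gradient
        grad_x = w.T @ grad_z                   # matrix-vector product: gradient to previous layer
        return grad_w, grad_x

    # Example usage with random float32 data
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 3)).astype(np.float32)
    x = rng.standard_normal(3).astype(np.float32)
    grad_out = rng.standard_normal(4).astype(np.float32)
    grad_w, grad_x = backprop_fc_relu(w, x, grad_out)
    print(grad_w.shape, grad_x.shape)           # (4, 3) (3,)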
Pages: 18