Reconfigurable Bit-Serial Operation Using Toggle SOT-MRAM for High-Performance Computing in Memory Architecture

Cited by: 19
Authors
Wang, Jinkai [1, 2]
Bai, Yining [3 ]
Wang, Hongyu [3 ]
Hao, Zuolei [3 ]
Wang, Guanda [3 ]
Zhang, Kun [3 ]
Zhang, Youguang [3 ]
Lv, Weifeng [4, 5]
Zhang, Yue [3, 6]
Affiliations
[1] Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
[3] Beihang Univ, Fert Beijing Inst, Sch Integrated Circuit Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
[4] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[5] Beihang Univ, Res Inst, Shenzhen Key Lab Data Vitalizat Smart City, Shenzhen 518057, Peoples R China
[6] Beihang Univ, Hefei Innovat Res Inst, Nanoelect Sci & Technol Ctr, Hefei 230013, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computing in memory; bit-serial operation; toggle spin-orbit torque MRAM; convolution operation; digital CIM architectures; UNIT-MACRO; SRAM; EFFICIENT; ENERGY; COMPUTATION; ENGINE;
DOI
10.1109/TCSI.2022.3192165
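Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract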
Computing in memory (CIM) is a promising candidate for high-throughput, energy-efficient data-driven applications because it mitigates the well-known memory bottleneck of the von Neumann architecture. In this paper, we present a reconfigurable bit-serial operation using toggle spin-orbit torque magnetic random access memory (TSOT-MRAM) that performs the computation entirely in the bit-cell array instead of in a peripheral circuit. This bit-serial CIM (BSCIM) scheme achieves higher throughput and energy efficiency in CIM. First, basic Boolean logic operations are realized by utilizing the features of the TSOT device. A bit-cell array that implements the bit-serial operation is then built to provide the communication between columns and rows necessary for arithmetic operations, such as the carry propagation of addition and multiplication. Finally, we analyze the reliability of the BSCIM scheme and demonstrate its performance advantage by performing convolution operations on 28 × 28 handwritten digit images in a BSCIM architecture. The results show that the delay and energy of the BSCIM architecture are reduced by factors of 1.16-5.49 and 1.12-1.43, respectively, compared with existing digital CIM architectures. In addition, its throughput and energy efficiency reach 51.2 GOPS and 9.9 TOPS/W, respectively.
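The bit-serial arithmetic mentioned in the abstract can be illustrated with a minimal software sketch. It is not the authors' circuit and does not model the TSOT-MRAM cell. It only shows how addition and multiplication reduce to Boolean operations (XOR, AND, OR) applied to one bit position per cycle, with a carry passed between cycles. All function names (`to_bits`, `from_bits`, `bit_serial_add`, `bit_serial_multiply`) are hypothetical, and the shift-and-add multiplication is an assumed textbook scheme, not one confirmed by the record.

```python
def to_bits(value, width):
    """Return `value` as a list of `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length LSB-first bit lists, one bit position per cycle.

    Only Boolean operations are used; the carry is the single value that
    propagates from one cycle to the next.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same bit width")
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))      # carry into the next cycle
    out.append(carry)                            # final carry-out bit
    return out


def bit_serial_multiply(x, y, width):
    """Multiply two unsigned `width`-bit integers by shift-and-add.

    Each set bit of `y` contributes a shifted copy of `x`, accumulated with
    the bit-serial adder above (assumed scheme, for illustration only).
    """
    acc = to_bits(0, 2 * width)
    for i, y_bit in enumerate(to_bits(y, width)):
        partial = to_bits((x << i) if y_bit else 0, 2 * width)
        # The product fits in 2*width bits, so the carry-out is always 0 here.
        acc = bit_serial_add(acc, partial)[: 2 * width]
    return from_bits(acc)


if __name__ == "__main__":
    total = from_bits(bit_serial_add(to_bits(5, 4), to_bits(11, 4)))
    print(total)
    print(bit_serial_multiply(13, 11, 4))
```

A convolution such as the one evaluated on 28 × 28 images is a sum of such products, so in this simplified view it reduces to repeated calls to the same two routines.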
Pages: 4535-4545
Page count: 11