A Dual-Split 6T SRAM-Based Computing-in-Memory Unit-Macro With Fully Parallel Product-Sum Operation for Binarized DNN Edge Processors

Cited by: 133
Authors
Si, Xin [1, 2]
Khwa, Win-San [2, 3]
Chen, Jia-Jing [2 ]
Li, Jia-Fang [2 ]
Sun, Xiaoyu [4 ]
Liu, Rui [5 ]
Yu, Shimeng [4 ]
Yamauchi, Hiroyuki [6 ]
Li, Qiang [1 ]
Chang, Meng-Fan [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Inst Integrated Circuits & Syst, Chengdu 610054, Sichuan, Peoples R China
[2] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu 30013, Taiwan
[3] TSMC, Hsinchu 30078, Taiwan
[4] Georgia Inst Technol, Atlanta, GA 30332 USA
[5] Synopsys, San Francisco, CA 94107 USA
[6] Fukuoka Inst Technol, Fukuoka, Fukuoka 8110295, Japan
Keywords
Computer architecture; Biological neural networks; Microprocessors; SRAM cells; Program processors; Random access memory; computing-in-memory; binarized DNN edge processors; artificial intelligence
DOI
10.1109/TCSI.2019.2928043
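Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract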
Computing-in-memory (CIM) is a promising approach to reduce the latency and improve the energy efficiency of deep neural network (DNN) artificial intelligence (AI) edge processors. However, SRAM-based CIM (SRAM-CIM) faces practical challenges in terms of area overhead, performance, energy efficiency, and yield against variations in data patterns and transistor performance. This paper employs a circuit-system co-design methodology to develop an SRAM-CIM unit-macro for a binary-based fully connected neural network (FCNN) layer of DNN AI edge processors. The proposed SRAM-CIM unit-macro supports two binarized neural network models: an XNOR neural network (XNORNN) and a modified binary neural network (MBNN). To achieve compact area, fast access time, robust operation, and high energy efficiency, the proposed SRAM-CIM uses a split-wordline compact-rule 6T SRAM and circuit techniques including a dynamic input-aware reference generation (DIARG) scheme, an algorithm-dependent asymmetric control (ADAC) scheme, a write disturb-free (WDF) scheme, and a common-mode-insensitive small-offset voltage-mode sensing amplifier (CMI-VSA). A fabricated 65-nm 4-Kb SRAM-CIM unit-macro achieved product-sum access times of 2.4 ns and 2.3 ns for an FCNN layer using XNORNN and MBNN, respectively. The measured maximum energy efficiency reached 30.49 TOPS/W in XNORNN mode and 55.8 TOPS/W in MBNN mode.
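
For readers unfamiliar with the product-sum operation of an XNOR neural network, the following is a minimal software sketch of the arithmetic only. It assumes the common textbook encoding (+1 as bit 1, -1 as bit 0) and says nothing about how the macro computes the result in the analog domain; the function names `xnor_product_sum` and `binarize` are hypothetical and do not come from the paper. The MBNN mode is not modeled here.

```python
def xnor_product_sum(inputs, weights):
    """Product-sum of two +1/-1 vectors using XNOR and a match count.

    Illustrative assumption: +1 is encoded as bit 1 and -1 as bit 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Encode each +1/-1 value as a single bit.
    in_bits = [1 if x > 0 else 0 for x in inputs]
    w_bits = [1 if w > 0 else 0 for w in weights]
    # XNOR is 1 exactly when the two bits match, i.e. when the product is +1.
    matches = sum(1 for a, b in zip(in_bits, w_bits) if a == b)
    # Each match contributes +1 and each mismatch contributes -1.
    return 2 * matches - len(inputs)


def binarize(value):
    """Sign activation: map a product-sum to +1 or -1 (0 maps to +1 here)."""
    return 1 if value >= 0 else -1


inputs = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
total = xnor_product_sum(inputs, weights)
# The XNOR form agrees with the ordinary dot product of the +1/-1 vectors.
assert total == sum(x * w for x, w in zip(inputs, weights))
print(total, binarize(total))
```

In a fully connected layer, one such product-sum is evaluated per output neuron, which is the operation the abstract describes as being performed in parallel inside the memory array.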
Pages: 4172-4185
Page count: 14