A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors

Cited by: 68
Authors
Chiu, Yen-Cheng [1]
Zhang, Zhixiao [2, 3]
Chen, Jia-Jing [1]
Si, Xin [2, 4]
Liu, Ruhui [1]
Tu, Yung-Ning [1]
Su, Jian-Wei [5]
Huang, Wei-Hsing [1]
Wang, Jing-Hong [1]
Wei, Wei-Chen [1]
Hung, Je-Min [1]
Sheu, Shyh-Shyuan [5]
Li, Sih-Han [5]
Wu, Chih-I [5]
Liu, Ren-Shuo [1]
Hsieh, Chih-Cheng [1]
Tang, Kea-Tiong [1]
Chang, Meng-Fan [1]
Affiliations
[1] Natl Tsing Hua Univ, Inst Elect Engn, Hsinchu 30013, Taiwan
[2] Natl Tsing Hua Univ, Hsinchu 30013, Taiwan
[3] Fuzhou Univ, Microelect & Solid State Elect Dept, Fuzhou 350108, Peoples R China
[4] Univ Elect Sci & Technol China, Integrated Circuit Design & Integrat Syst Dept, Chengdu 611731, Peoples R China
[5] Ind Technol Res Inst, Hsinchu 31040, Taiwan
Keywords
AI edge processor; CNN; computing-in-memory (CIM); SRAM
DOI
10.1109/JSSC.2020.3005754
CLC classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract
Previous SRAM-based computing-in-memory (SRAM-CIM) macros suffer from small read margins for high-precision operations, large cell-array area overhead, and limited compatibility with many input and weight configurations. This work presents a 1-to-8-bit configurable SRAM-CIM unit-macro using: 1) a hybrid structure combining 6T-SRAM-based in-memory binary product-sum (PS) operations with digital near-memory-computing multibit PS accumulation to increase read accuracy and reduce area overhead; 2) column-based place-value-grouped weight mapping and a serial-bit input (SBIN) mapping scheme to facilitate reconfiguration and increase array efficiency under various input and weight configurations; 3) a self-reference multilevel reader (SRMLR) to reduce read-out energy and achieve a sensing margin 2× that of the midpoint reference scheme; and 4) an input-aware bitline voltage compensation scheme to ensure successful read operations across various input-weight patterns. A 4-Kb configurable 6T-SRAM CIM unit-macro was fabricated using a 55-nm CMOS process with foundry 6T-SRAM cells. The resulting macro achieved an access time of 3.5 ns per cycle (pipelined) and an energy efficiency of 0.6-40.2 TOPS/W under binary to 8-b input/8-b weight precision.
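The hybrid scheme in item 1), combined with the serial-bit input and place-value-grouped weights of item 2), can be illustrated arithmetically. The sketch below is only a software analogy, not the paper's circuit: it assumes unsigned operands, and the function names (`to_bit_planes`, `binary_product_sum`, `bit_serial_product_sum`) are hypothetical. It shows how a multibit product-sum can be assembled from 1-bit product-sums followed by digital shift-and-add.

```python
def to_bit_planes(values, n_bits):
    """Split unsigned integers into bit planes, least significant bit first."""
    return [[(v >> b) & 1 for v in values] for b in range(n_bits)]


def binary_product_sum(input_bits, weight_bits):
    """1-bit x 1-bit product-sum: count positions where both bits are 1.

    This stands in for the binary PS operation done inside the array.
    """
    return sum(i & w for i, w in zip(input_bits, weight_bits))


def bit_serial_product_sum(inputs, weights, in_bits, w_bits):
    """Assemble a multibit product-sum from binary product-sums.

    Assumption: unsigned inputs and weights. Signed handling in the
    actual macro is not described in the abstract and is not modeled.
    """
    input_planes = to_bit_planes(inputs, in_bits)
    weight_planes = to_bit_planes(weights, w_bits)
    total = 0
    for i, in_plane in enumerate(input_planes):      # one input bit per step
        for j, w_plane in enumerate(weight_planes):  # one weight place value per group
            partial = binary_product_sum(in_plane, w_plane)
            total += partial << (i + j)              # digital shift-and-add
    return total


if __name__ == "__main__":
    inputs = [3, 5, 7, 200]
    weights = [2, 255, 0, 17]
    result = bit_serial_product_sum(inputs, weights, in_bits=8, w_bits=8)
    # The bit-serial result matches the direct dot product.
    assert result == sum(x * w for x, w in zip(inputs, weights))
    print(result)
```

Changing `in_bits` and `w_bits` between 1 and 8 mimics the configurable precision: fewer bit planes mean fewer binary product-sum steps, at the cost of operand range.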
Pages: 2790-2801
Page count: 12
Related papers
30 in total
[1]   X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories [J].
Agrawal, Amogh ;
Jaiswal, Akhilesh ;
Lee, Chankyu ;
Roy, Kaushik .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2018, 65 (12) :4219-4232
[2]   BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W [J].
Ando, Kota ;
Ueyoshi, Kodai ;
Orimo, Kentaro ;
Yonekawa, Haruyoshi ;
Sato, Shimpei ;
Nakahara, Hiroki ;
Takamaeda-Yamazaki, Shinya ;
Ikebe, Masayuki ;
Asai, Tetsuya ;
Kuroda, Tadahiro ;
Motomura, Masato .
IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2018, 53 (04) :983-994
[3]  
[Anonymous], 2019, S VLSI TECH
[4]  
[Anonymous], 2009, Tech. Rep. TR-2009
[5]  
Bankman D, 2018, ISSCC DIG TECH PAP I, P222, DOI 10.1109/ISSCC.2018.8310264
[6]  
Biswas A, 2018, ISSCC DIG TECH PAP I, P488, DOI 10.1109/ISSCC.2018.8310397
[7]  
Chang J, 2013, ISSCC DIG TECH PAP I, V56, P316, DOI 10.1109/ISSCC.2013.6487750
[8]  
Chang MF, 2015, ISSCC DIG TECH PAP I, V58, P314
[9]   CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors [J].
Chen, Wei-Hao ;
Dou, Chunmeng ;
Li, Kai-Xiang ;
Lin, Wei-Yu ;
Li, Pin-Yi ;
Huang, Jian-Hao ;
Wang, Jing-Hong ;
Wei, Wei-Chen ;
Xue, Cheng-Xin ;
Chiu, Yen-Cheng ;
King, Ya-Chin ;
Lin, Chorng-Jung ;
Liu, Ren-Shuo ;
Hsieh, Chih-Cheng ;
Tang, Kea-Tiong ;
Yang, J. Joshua ;
Ho, Mon-Shu ;
Chang, Meng-Fan .
NATURE ELECTRONICS, 2019, 2 (09) :420-428
[10]  
Chen WH, 2018, ISSCC DIG TECH PAP I, P494, DOI 10.1109/ISSCC.2018.8310400