A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors

Cited by: 68

Authors:

Chiu, Yen-Cheng [1]

Zhang, Zhixiao [2,3]

Chen, Jia-Jing [1]

Si, Xin [2,4]

Liu, Ruhui [1]

Tu, Yung-Ning [1]

Su, Jian-Wei [5]

Huang, Wei-Hsing [1]

Wang, Jing-Hong [1]

Wei, Wei-Chen [1]

Hung, Je-Min [1]

Sheu, Shyh-Shyuan [5]

Li, Sih-Han [5]

Wu, Chih-I [5]

Liu, Ren-Shuo [1]

Hsieh, Chih-Cheng [1]

Tang, Kea-Tiong [1]

Chang, Meng-Fan [1]

Affiliations:

[1] Natl Tsing Hua Univ, Inst Elect Engn, Hsinchu 30013, Taiwan

[2] Natl Tsing Hua Univ, Hsinchu 30013, Taiwan

[3] Fuzhou Univ, Microelect & Solid State Elect Dept, Fuzhou 350108, Peoples R China

[4] Univ Elect Sci & Technol China, Integrated Circuit Design & Integrat Syst Dept, Chengdu 611731, Peoples R China

[5] Ind Technol Res Inst, Hsinchu 31040, Taiwan

Source:

IEEE JOURNAL OF SOLID-STATE CIRCUITS | 2020 / Vol. 55 / No. 10

Keywords:

AI edge processor; CNN; computing-in-memory (CIM); SRAM

DOI:

10.1109/JSSC.2020.3005754

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Previous SRAM-based computing-in-memory (SRAM-CIM) macros suffer from small read margins for high-precision operations, large cell-array area overhead, and limited compatibility with many input and weight configurations. This work presents a 1-to-8-bit configurable SRAM-CIM unit-macro using: 1) a hybrid structure combining 6T-SRAM-based in-memory binary product-sum (PS) operations with digital near-memory-computing multibit PS accumulation to increase read accuracy and reduce area overhead; 2) column-based place-value-grouped weight mapping and a serial-bit input (SBIN) mapping scheme to facilitate reconfiguration and increase array efficiency under various input and weight configurations; 3) a self-reference multilevel reader (SRMLR) to reduce read-out energy and achieve a sensing margin 2x that of the midpoint reference scheme; and 4) an input-aware bitline voltage compensation scheme to ensure successful read operations across various input-weight patterns. A 4-Kb configurable 6T-SRAM CIM unit-macro was fabricated using a 55-nm CMOS process with foundry 6T-SRAM cells. The resulting macro achieved an access time of 3.5 ns per cycle (pipelined) and an energy efficiency of 0.6-40.2 TOPS/W across binary to 8-b input/8-b weight precision.
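
The hybrid scheme in items 1) and 2) of the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit described in the paper: inputs and weights are treated as unsigned integers, the analog in-array read is replaced by an exact integer sum, and the function names (`to_bits`, `binary_product_sum`, `hybrid_mac`) are hypothetical. It shows how a multibit dot product can be assembled from 1-bit product-sums when weight bits are grouped into columns by place value and input bits are applied serially, with a digital shift-and-add step doing the multibit accumulation.

```python
def to_bits(value, n_bits):
    """Return the bits of an unsigned integer, least significant bit first."""
    return [(value >> k) & 1 for k in range(n_bits)]


def binary_product_sum(input_bits, weight_bits):
    """1-bit x 1-bit product-sum down one column (stands in for the in-array step)."""
    return sum(i & w for i, w in zip(input_bits, weight_bits))


def hybrid_mac(inputs, weights, in_bits, w_bits):
    """Multibit dot product built from binary product-sums plus shift-and-add."""
    # Assumption: column j holds bit j of every weight (place-value grouping).
    weight_columns = [[to_bits(w, w_bits)[j] for w in weights] for j in range(w_bits)]
    total = 0
    for i in range(in_bits):  # one input bit plane per step (serial-bit input)
        input_plane = [to_bits(x, in_bits)[i] for x in inputs]
        for j, column in enumerate(weight_columns):
            partial = binary_product_sum(input_plane, column)
            total += partial << (i + j)  # digital accumulation by place value
    return total


inputs = [3, 0, 7, 5]    # fit in 3 bits
weights = [2, 9, 4, 1]   # fit in 4 bits
result = hybrid_mac(inputs, weights, in_bits=3, w_bits=4)
reference = sum(x * w for x, w in zip(inputs, weights))
print(result, reference)
```

Because each in-array step only resolves a binary product-sum, the number of levels a reader must distinguish depends on the number of rows accumulated rather than on the input or weight precision; changing `in_bits` or `w_bits` in this model only changes how many binary steps are combined.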


Pages: 2790-2801

Page count: 12
