Material to system-level benchmarking of CMOS-integrated RRAM with ultra-fast switching for low power on-chip learning

Cited by: 21
Authors
Abedin, Minhaz [1 ,4 ]
Gong, Nanbo [2 ]
Beckmann, Karsten [1 ,3 ]
Liehr, Maximilian [1 ]
Saraf, Iqbal [4 ]
Van der Straten, Oscar [4 ]
Ando, Takashi [2 ]
Cady, Nathaniel [1 ]
Affiliations
[1] SUNY Albany, Coll Nanotechnol Sci & Engn, Albany, NY 12203 USA
[2] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[3] NY CREATES, Albany, NY 12203 USA
[4] IBM Res, Albany, NY 12203 USA
Keywords
MEMORY;
DOI
10.1038/s41598-023-42214-x
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Codes
07; 0710; 09;
Abstract
Analog hardware-based training offers a promising route to reducing the power demands of training state-of-the-art artificial intelligence models. Non-volatile memory hardware such as resistive random access memory (RRAM) has the potential to provide a low-power alternative. The training accuracy of analog hardware depends on RRAM switching properties, including the number of discrete conductance states and conductance variability. Furthermore, the overall power consumption of the system scales with the conductance of the RRAM devices. To study the material dependence of these properties, TaOx and HfOx RRAM devices in a one-transistor one-RRAM (1T1R) configuration were fabricated using a custom 65 nm CMOS fabrication process. Analog switching performance was studied over a range of initial forming compliance currents (200–500 μA), and analog switching tests with an ultra-short pulse width (300 ps) were carried out. We report that by utilizing a low current during electroforming and a high compliance current during analog switching, a large number of RRAM conductance states can be achieved while maintaining a low conductance state. While both TaOx and HfOx could be switched to more than 20 distinct states, TaOx devices exhibited 10× lower conductance, which reduces total power consumption for array-level operations. Furthermore, we adopted an analog, fully in-memory training algorithm for system-level training-accuracy benchmarking and showed that implementing TaOx 1T1R cells could yield an accuracy of up to 96.4%, compared to 97% for the floating-point arithmetic baseline, while implementing HfOx devices would yield a maximum accuracy of 90.5%. Our experimental work and benchmarking approach pave the way for future materials engineering of analog-AI hardware for low-power on-chip training.
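The abstract's power argument (static read power of an array scales with total device conductance, so the ~10× lower conductance of TaOx translates into ~10× lower array read power) can be sketched numerically. This is a minimal illustration, not the paper's method: the read voltage, array size, and per-cell conductances below are hypothetical placeholders, not measured values from the study.

```python
# Sketch (assumed values, not data from the paper): read power of a crossbar
# array scales with total conductance, P = V_read^2 * sum(G).

V_READ = 0.2  # read voltage in volts (hypothetical)

def array_read_power(conductances_siemens, v_read=V_READ):
    """Static read power dissipated across all cells: P = V^2 * sum(G)."""
    return v_read ** 2 * sum(conductances_siemens)

# Illustrative 10x conductance gap between materials (per-cell values assumed)
g_hfox = [100e-6] * 1024  # 1024 cells at 100 uS each (hypothetical)
g_taox = [10e-6] * 1024   # 1024 cells at 10 uS each (hypothetical)

ratio = array_read_power(g_hfox) / array_read_power(g_taox)
print(f"HfOx/TaOx array read-power ratio: {ratio:.1f}")  # 10x lower G -> 10x lower power
```

Because the power is linear in conductance, the ratio is independent of the read voltage and array size chosen above.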
Pages: 10
References
33 entries in total
[1] Agarwal S, et al. 2016 IEEE IJCNN, p. 929. DOI 10.1109/IJCNN.2016.7727298
[2] Azzaz M. Endurance/Retention Trade Off in HfOx and TaOx Based RRAM. 2016, p. 1.
[3] Beckmann K, Olin-Ammentorp W, Chakma G, Amer S, Rose GS, Hobbs C, Van Nostrand J, Rodgers M, Cady NC. Towards Synaptic Behavior of Nanoscale ReRAM Devices for Neuromorphic Computing Applications. ACM Journal on Emerging Technologies in Computing Systems, 2020, 16(02).
[4] Bengio Y, LeCun Y, Hinton G. Deep Learning for AI. Communications of the ACM, 2021, 64(07): 58-65.
[5] Biewald L. Experiment Tracking with Weights and Biases. 2020.
[6] Chang HY, Narayanan P, Lewis SC, Farinha NCP, Hosokawa K, Mackin C, Tsai H, Ambrogio S, Chen A, Burr GW. AI Hardware Acceleration with Analog Memory: Microarchitectures for Low Energy at High Speed. IBM Journal of Research and Development, 2019, 63(06).
[7] Chen PY, et al. 2015 ICCAD-IEEE ACM Int., p. 194. DOI 10.1109/ICCAD.2015.7372570
[8] Chua LO. Memristor - The Missing Circuit Element. IEEE Transactions on Circuit Theory, 1971, CT-18(05): 507+.
[9] Gokmen T. Enabling Training of Neural Networks on Noisy Hardware. Frontiers in Artificial Intelligence, 2021, 4.
[10] Gokmen T, Haensch W. Algorithm for Training Neural Networks on Resistive Device Arrays. Frontiers in Neuroscience, 2020, 14.