An analog-AI chip for energy-efficient speech recognition and transcription

Cited by: 107
Authors
Ambrogio, S. [1 ]
Narayanan, P. [1 ]
Okazaki, A. [2 ]
Fasoli, A. [1 ]
Mackin, C. [1 ]
Hosokawa, K. [2 ]
Nomura, A. [2 ]
Yasuda, T. [2 ]
Chen, A. [1 ]
Friz, A. [1 ]
Ishii, M. [2 ]
Luquin, J. [1 ]
Kohda, Y. [2 ]
Saulnier, N. [3 ]
Brew, K. [3 ]
Choi, S. [3 ]
Ok, I. [3 ]
Philip, T. [3 ]
Chan, V. [3 ]
Silvestre, C. [3 ]
Ahsan, I. [3 ]
Narayanan, V. [4 ]
Tsai, H. [1 ]
Burr, G. W. [1 ]
Affiliations
[1] IBM Research - Almaden, San Jose, CA 95120, USA
[2] IBM Research - Tokyo, Kawasaki, Japan
[3] IBM Research Albany NanoTech Center, Albany, NY, USA
[4] IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
DOI: 10.1038/s41586-023-06337-5
Chinese Library Classification: O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes: 07; 0710; 09
Abstract
Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks [1,2], but they exacerbate the poor energy efficiency of conventional general-purpose processors, such as graphics processing units or central processing units. Analog in-memory computing (analog-AI) [3-7] can provide better energy efficiency by performing matrix-vector multiplications in parallel on 'memory tiles'. However, analog-AI has yet to demonstrate software-equivalent (SWeq) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles. Here we present an analog-AI chip that combines 35 million phase-change memory devices across 34 tiles, massively parallel inter-tile communication and analog, low-power peripheral circuitry that can achieve up to 12.4 tera-operations per second per watt (TOPS/W) chip-sustained performance. We demonstrate fully end-to-end SWeq accuracy for a small keyword-spotting network and near-SWeq accuracy on the much larger MLPerf [8] recurrent neural-network transducer (RNNT), with more than 45 million weights mapped onto more than 140 million phase-change memory devices across five chips. A low-power chip that runs AI models using analog rather than digital computation shows comparable accuracy on speech-recognition tasks but is more than 14 times as energy efficient.
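To make the 'memory tile' operation described in the abstract concrete, below is a minimal NumPy sketch of an analog in-memory matrix-vector multiply. The differential-pair encoding (each weight stored as the difference of two non-negative conductances), the G_MAX full-scale bound and the Gaussian programming-noise model are common analog-AI conventions assumed here for illustration only; program_tile and analog_mvm are hypothetical names, not functions or parameters taken from the paper or its hardware.

```python
import numpy as np

rng = np.random.default_rng(0)

G_MAX = 25e-6     # assumed full-scale device conductance, in siemens (illustrative)
NOISE_STD = 0.02  # assumed programming-noise level, relative to G_MAX (illustrative)

def program_tile(W):
    """Map a weight matrix onto a differential pair of conductance arrays.

    Positive weights are programmed into G+, negative weights into G-,
    so that W is proportional to (G+ - G-). Additive Gaussian noise
    stands in for imperfect conductance programming.
    """
    scale = G_MAX / np.max(np.abs(W))      # largest |weight| maps to G_MAX
    g_pos = np.clip(W, 0, None) * scale    # positive part of W
    g_neg = np.clip(-W, 0, None) * scale   # negative part of W
    noise = rng.normal(0.0, NOISE_STD * G_MAX, size=(2,) + W.shape)
    return g_pos + noise[0], g_neg + noise[1], scale

def analog_mvm(g_pos, g_neg, scale, x):
    """One tile operation: with inputs applied as voltages, Ohm's and
    Kirchhoff's laws sum the per-device currents, so every
    multiply-accumulate in (G+ - G-) @ x happens in parallel."""
    return (g_pos - g_neg) @ x / scale

# Compare the noisy analog result against the exact digital product.
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)
g_pos, g_neg, scale = program_tile(W)
print("max abs deviation:", np.max(np.abs(analog_mvm(g_pos, g_neg, scale, x) - W @ x)))
```

The sketch illustrates both why the approach is attractive and why SWeq accuracy is the hard part: the multiply-accumulates come essentially for free in the conductance array, but programming noise perturbs every weight, and those perturbations compound across the many tiles that a 45-million-weight model such as the RNNT occupies.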
Pages: 768+
Page count: 23
References
41 entries in total
[1] Ambrogio, S.; Narayanan, P.; Tsai, H.; Shelby, R. M.; Boybat, I.; di Nolfo, C.; Sidler, S.; Giordano, M.; Bodini, M.; Farinha, N. C. P.; Killeen, B.; Cheng, C.; Jaoudi, Y.; Burr, G. W. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature, 2018, 558(7708): 60+.
[2] Anonymous. BETT MACH LEARN EV, 2023.
[3] Bahdanau, D. et al. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2016.
[4] Biswas, A. 2018 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, 2018: 488. DOI: 10.1109/ISSCC.2018.8310397.
[5] Chan, W. arXiv, 2021. DOI: 10.48550/arXiv.2104.02133.
[6] Chang, H.-Y.; Narayanan, P.; Lewis, S. C.; Farinha, N. C. P.; Hosokawa, K.; Mackin, C.; Tsai, H.; Ambrogio, S.; Chen, A.; Burr, G. W. AI hardware acceleration with analog memory: microarchitectures for low energy at high speed. IBM Journal of Research and Development, 2019, 63(6).
[7] Chen, G. G. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[8] Xue, C.-X.; Hung, J.-M.; Kao, H.-Y.; Huang, Y.-H.; Huang, S.-P.; Chang, F.-C.; Chen, P.; Liu, T.-W.; Jhang, C.-J.; Su, C.-I; Khwa, W.-S.; Lo, C.-C.; Liu, R.-S.; Hsieh, C.-C.; Tang, K.-T.; Chih, Y.-D.; Chang, T.-Y. J.; Chang, M.-F. A 22nm 4Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7TOPS/W for tiny AI edge devices. 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, 64: 246+.
[9] Chih, Y.-D.; Lee, P.-H.; Fujiwara, H.; Shih, Y.-C.; Lee, C.-F.; Naous, R.; Chen, Y.-L.; Lo, C.-P.; Lu, C.-H.; Mori, H.; Zhao, W.-C.; Sun, D.; Sinangil, M. E.; Chen, Y.-H.; Chou, T.-L.; Akarvardar, K.; Liao, H.-J.; Wang, Y.; Chang, M.-F.; Chang, T.-Y. J. An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in-memory macro in 22nm for machine-learning edge applications. 2021 IEEE International Solid-State Circuits Conference (ISSCC), 2021, 64: 252+.
[10] Dahl, G. E.; Yu, D.; Deng, L.; Acero, A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.