Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition

Cited by: 79

Authors:

Wu, Jibin [1]

Yilmaz, Emre [1]

Zhang, Malu [1]

Li, Haizhou [1,2]

Tan, Kay Chen [3]

Affiliations:

[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore

[2] Univ Bremen, Fac Comp Sci & Math, Bremen, Germany

[3] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China

Source:

FRONTIERS IN NEUROSCIENCE | 2020 / Vol. 14

Keywords:

deep spiking neural networks; automatic speech recognition; tandem learning; neuromorphic computing; acoustic modeling; HIDDEN MARKOV-MODELS; COMMUNICATION; ARCHITECTURE;

DOI:

10.3389/fnins.2020.00199

Chinese Library Classification number:

Q189 [Neuroscience];

Discipline classification code:

071006;

Abstract:

Artificial neural networks (ANNs) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. Brain-inspired spiking neural networks (SNNs) closely mimic biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energy efficiency and rapid information processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate ASR accuracies competitive with those of their ANN counterparts, while requiring only 10 algorithmic time steps and as few as 0.68 times the total synaptic operations to classify each audio frame. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware therefore offers an attractive solution for ASR applications running locally on mobile and embedded devices.

Pages: 14

50 items in total

[31] INVESTIGATION OF DEEP NEURAL NETWORKS (DNN) FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION: WHY DNN SURPASSES GMMS IN ACOUSTIC MODELING
Pan, Jia
Liu, Cong
Wang, Zhiguo
Hu, Yu
Jiang, Hui
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 301 - 305
[32] PHONEME BASED NEURAL TRANSDUCER FOR LARGE VOCABULARY SPEECH RECOGNITION
Zhou, Wei
Berger, Simon
Schlueter, Ralf
Ney, Hermann
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5644 - 5648
[33] MULTI-SCALE FEATURE BASED CONVOLUTIONAL NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION
Fu, Tong
Wu, Xihong
2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 1093 - 1098
[34] Deep Neural Networks in Russian Speech Recognition
Markovnikov, Nikita
Kipyatkova, Irina
Karpov, Alexey
Filchenkov, Andrey
ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 54 - 67
[35] DEEP MAXOUT NEURAL NETWORKS FOR SPEECH RECOGNITION
Cai, Meng
Shi, Yongzhe
Liu, Jia
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 291 - 296
[36] Deep Segmental Neural Networks for Speech Recognition
Abdel-Hamid, Ossama
Deng, Li
Yu, Dong
Jiang, Hui
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1848 - 1852
[37] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
Graves, Alex
Mohamed, Abdel-rahman
Hinton, Geoffrey
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6645 - 6649
[38] Binary Deep Neural Networks for Speech Recognition
Xiang, Xu
Qian, Yanmin
Yu, Kai
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 533 - 537
[39] Automatic language identification using large vocabulary continuous speech recognition
Mendoza, S
Gillick, L
Ito, Y
Lowe, S
Newmann, M
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 785 - 788
[40] IMPROVING ROBUSTNESS OF DEEP NEURAL NETWORKS VIA SPECTRAL MASKING FOR AUTOMATIC SPEECH RECOGNITION
Li, Bo
Sim, Khe Chai
2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 279 - 284
