End-to-End Spiking Neural Network for Speech Recognition Using Resonating Input Neurons

Cited by: 10

Authors:

Auge, Daniel [1]

Hille, Julian [1,2]

Kreutz, Felix [3]

Mueller, Etienne [1]

Knoll, Alois [1]

Affiliations:

[1] Tech Univ Munich, Dept Informat, Munich, Germany

[2] Infineon Technol AG, Munich, Germany

[3] Infineon Technol Dresden GmbH & Co KG, Dresden, Germany

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V | 2021 / Vol. 12895

Keywords:

Spiking neural networks; Speech processing; Keyword detection

DOI:

10.1007/978-3-030-86383-8_20

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The growing demand for complex computations in edge devices requires the development of algorithms and hardware accelerators that are powerful while remaining energy-efficient. Spiking neural networks are a possible solution, as they have been demonstrated to be energy-efficient in several data processing and classification tasks when executed on specialized neuromorphic hardware. In the field of speech processing, they are especially suited for the online classification of audio streams due to their strong temporal affinity. However, so far, there has been a lack of emphasis on small-scale networks that will ultimately fit into restricted neuromorphic implementations. We propose the use of resonating neurons as an input layer to spiking neural networks for online audio classification to enable an end-to-end solution. We compare different architectures to the established method of using mel-frequency-based spectral features. With our approach, spiking neural networks can be directly used without additional preprocessing, thereby making them suitable for simple continuous low-power analysis of audio streams. We compare the classification accuracy of different network architectures with ours in a keyword spotting benchmark to demonstrate the performance of our approach.

Pages: 245-256

Number of pages: 12

50 items in total

[1] End-to-End Speech Emotion Recognition Based on Neural Network
Zhu, Bing
Zhou, Wenkai
Wang, Yutian
Wang, Hui
Cai, Juan Juan
2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017, : 1634 - 1638
[2] Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search
Williams, Ian
Kannan, Anjuli
Aleksic, Petar
Rybach, David
Sainath, Tara N.
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2227 - 2231
[3] Hybrid Input-type Recurrent Neural Network Language Modeling for End-to-end Speech Recognition
Sertsi, Phuttapong
Lamsrichan, Poonlap
Chunwijitra, Vataya
Okumura, Manabu
2021 18TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE-2021), 2021,
[4] Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
Moritz, Niko
Hori, Takaaki
Le Roux, Jonathan
INTERSPEECH 2019, 2019, : 76 - 80
[5] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
Tzirakis, Panagiotis
Zhang, Jiehao
Schuller, Bjoern W.
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
[6] Insights on Neural Representations for End-to-End Speech Recognition
Ollerenshaw, Anna
Jalal, Asif
Hain, Thomas
INTERSPEECH 2021, 2021, : 4079 - 4083
[7] EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION
Battenberg, Eric
Chen, Jitong
Child, Rewon
Coates, Adam
Gaur, Yashesh
Li, Yi
Liu, Hairong
Satheesh, Sanjeev
Sriram, Anuroop
Zhu, Zhenyao
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 206 - 213
[8] End-to-End Neural Segmental Models for Speech Recognition
Tang, Hao
Lu, Liang
Kong, Lingpeng
Gimpel, Kevin
Livescu, Karen
Dyer, Chris
Smith, Noah A.
Renals, Steve
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1254 - 1264
[9] End-to-End Speech Command Recognition with Capsule Network
Bae, Jaesung
Kim, Dae-Shik
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 776 - 780
[10] Extract, Adapt and Recognize: an End-to-end Neural Network for Corrupted Monaural Speech Recognition
Lam, Max W. Y.
Wang, Jun
Liu, Xunying
Meng, Helen
Su, Dan
Yu, Dong
INTERSPEECH 2019, 2019, : 2778 - 2782