Convolutional Grid Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Citations: 4

Authors:

Xue, Jiabin [1]

Zheng, Tieran [1]

Han, Jiqing [1]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V | 2019 / Vol. 1143

Funding:

National Natural Science Foundation of China;

Keywords:

Automatic Speech Recognition; Grid-LSTM; Convolutional Neural Network;

DOI:

10.1007/978-3-030-36802-9_76

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Grid Long Short-Term Memory (Grid-LSTM), which consists of three steps, i.e., two-dimensional grid splitting, local feature projection, and grid sequence modeling, has been widely used in Automatic Speech Recognition (ASR) tasks because of its strong time-frequency modeling ability. However, the network suffers from a serious problem: it requires heavy computing time. An analysis of its process shows that the cause lies in the last step, where two cross-working LSTMs are employed to model the time-frequency features in the grid. Thus, we try to speed up the Grid-LSTM by using a smaller grid and propose two enhanced Grid-LSTM models, i.e., Convolutional Grid-LSTM (ConvGrid-LSTM) and Multichannel ConvGrid-LSTM (MCConvGrid-LSTM), which reduce the grid size along the two dimensions of the Grid-LSTM, respectively. Along the frequency axis, we use a large frequency stride and prevent the resulting performance loss by embedding a CNN in the Grid-LSTM. Along the time axis, we model several adjacent frames jointly through the multichannel processing ability of the CNN. Our method achieves a 54% relative reduction in training time and a 19% relative reduction in Word Error Rate (WER) on a character-level End-to-End ASR task.
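
The grid-size argument in the abstract can be illustrated with a small counting sketch. It is not the authors' implementation: the function names, the sliding-window formula for frequency blocks, and all numeric settings (100 frames, 80 frequency bins, a window of 8 bins) are assumptions chosen only for illustration. It shows how a larger frequency stride, and grouping several adjacent frames into one step, could shrink the number of grid cells the two LSTMs have to traverse.

```python
def frequency_blocks(num_bins: int, window: int, stride: int) -> int:
    """Count the blocks produced by sliding a window over the frequency bins."""
    if stride < 1 or window < 1 or window > num_bins:
        raise ValueError("need 1 <= window <= num_bins and stride >= 1")
    return (num_bins - window) // stride + 1


def grid_cells(num_frames: int, num_bins: int, window: int, stride: int,
               frames_per_step: int = 1) -> int:
    """Count grid cells: (time steps) x (frequency blocks).

    frames_per_step > 1 is a hypothetical stand-in for modeling several
    adjacent frames together, which shortens the time axis of the grid.
    """
    time_steps = -(-num_frames // frames_per_step)  # ceiling division
    return time_steps * frequency_blocks(num_bins, window, stride)


# Hypothetical settings, not taken from the paper.
baseline = grid_cells(100, 80, window=8, stride=1)
large_stride = grid_cells(100, 80, window=8, stride=8)
multi_frame = grid_cells(100, 80, window=8, stride=8, frames_per_step=4)

print(baseline, large_stride, multi_frame)
```

Under these assumed settings the cell count falls as the stride grows and again when frames are grouped. The training-time and WER figures reported in the abstract come from the authors' experiments and cannot be derived from this count.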

Pages: 718-726

Number of pages: 9
