Convolutional Grid Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition

Cited by: 4
Authors
Xue, Jiabin [1]
Zheng, Tieran [1]
Han, Jiqing [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Automatic Speech Recognition; Grid-LSTM; Convolutional Neural Network
DOI
10.1007/978-3-030-36802-9_76
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Grid Long Short-Term Memory (Grid-LSTM), which consists of three steps, i.e., two-dimensional grid splitting, local feature projection, and grid sequence modeling, has been widely used in Automatic Speech Recognition (ASR) tasks because of its strong time-frequency modeling ability. However, the network suffers from a serious problem: it always requires heavy computing time. An analysis of its process shows that the cause lies in the last step, where two cross-working LSTMs are employed to model the time-frequency features in the grid. Thus, we try to speed up the Grid-LSTM by using a smaller grid and propose two enhanced Grid-LSTM models, i.e., Convolutional Grid-LSTM (ConvGrid-LSTM) and Multichannel ConvGrid-LSTM (MCConvGrid-LSTM), which reduce the grid size along the two dimensions of the Grid-LSTM, respectively. Along the frequency axis, we use a large frequency stride and prevent the resulting performance loss by embedding a CNN in the Grid-LSTM. Along the time axis, we model several adjacent frames using the multichannel processing ability of the CNN. Our method achieves a 54% relative reduction in training time and a 19% relative reduction in Word Error Rate (WER) on a character-level End-to-End ASR task.
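The grid-size argument in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: the function names, the 40-bin features, the block size of 8, the strides, and the 4-frame grouping are all hypothetical values chosen for illustration. It only shows how a larger frequency stride and grouping adjacent frames into channels shrink the number of grid steps that the two LSTMs would have to traverse.

```python
def split_frequency_blocks(frame, block_size, stride):
    """Split one feature frame (list of frequency bins) into frequency blocks.

    Hypothetical helper: mirrors the idea of grid splitting along frequency.
    """
    starts = range(0, len(frame) - block_size + 1, stride)
    return [frame[s:s + block_size] for s in starts]


def stack_adjacent_frames(frames, n_channels):
    """Group consecutive frames so each time step holds n_channels frames.

    Hypothetical helper: mirrors the idea of treating adjacent frames as
    input channels; trailing frames that do not fill a group are dropped.
    """
    usable = (len(frames) // n_channels) * n_channels
    return [frames[i:i + n_channels] for i in range(0, usable, n_channels)]


# Assumed toy input: 100 frames of 40 frequency bins each.
N_FRAMES, N_BINS = 100, 40
frames = [[float(t * N_BINS + f) for f in range(N_BINS)] for t in range(N_FRAMES)]

# Frequency axis: a small stride gives many grid steps, a large stride gives few.
fine = split_frequency_blocks(frames[0], block_size=8, stride=2)
coarse = split_frequency_blocks(frames[0], block_size=8, stride=8)

# Time axis: grouping 4 adjacent frames cuts the number of time steps by 4.
grouped = stack_adjacent_frames(frames, n_channels=4)

print(len(fine), len(coarse), len(grouped))  # 17 5 25
```

With these assumed sizes, the grid shrinks from 17 x 100 steps to 5 x 25 steps. In the abstract's description, the embedded CNN is what compensates for the information a coarser frequency stride would otherwise lose; that part is not modeled here.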
Pages: 718-726
Number of pages: 9