Mandarin Recognition Based on Self-Attention Mechanism with Deep Convolutional Neural Network (DCNN)-Gated Recurrent Unit (GRU)

Cited by: 1
Authors
Chen, Xun [1]
Wang, Chengqi [1]
Hu, Chao [1]
Wang, Qin [1]
Affiliation
[1] Hainan Univ, Sch Informat & Commun Engn, Haikou 570228, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
self-attention mechanism; CTC; gated circulation units;
DOI
10.3390/bdcc8120195
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition is an important branch of artificial intelligence that aims to transform human speech into computer-readable text. It still faces many challenges, such as noise interference and differences in accent and speech rate. This paper explores a deep learning-based speech recognition method to improve recognition accuracy and robustness. It first introduces the basic principles of speech recognition and the existing mainstream technologies, and then focuses on deep learning-based methods. Comparative experiments show that the self-attention mechanism performs best in speech recognition tasks. To further improve performance, the paper proposes a deep learning model based on the self-attention mechanism with DCNN-GRU. The model attends dynamically to the input speech by introducing the self-attention mechanism into the neural network in place of an RNN, together with a deep convolutional neural network, which improves the robustness and recognition accuracy of the model. The experiments use the 170 h Chinese dataset AISHELL-1. Compared with the deep convolutional neural network, the proposed model achieves a reduction of at least 6% in character error rate (CER). Compared with a bidirectional gated recurrent neural network, it achieves a reduction of 0.7% in CER. Finally, the factors that influence the CER are analyzed on the test set. The experimental results show that the model performs well in various noise environments and accent conditions.
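The abstract reports its results as CER but does not define the metric or say whether the reductions are absolute or relative. As a minimal sketch, assuming the common definition (character-level Levenshtein distance divided by the reference length), CER could be computed as follows. The function names `edit_distance` and `cer` and the example strings are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical transcripts: one substituted and one deleted character
# against a six-character reference.
reference = "今天天气很好"
hypothesis = "今天天汽很"
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because Mandarin text is scored per character rather than per word, no word segmentation is needed before scoring under this assumed definition.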
Pages: 13