Mandarin Recognition Based on Self-Attention Mechanism with Deep Convolutional Neural Network (DCNN)-Gated Recurrent Unit (GRU)

Cited by: 1

Authors:

Chen, Xun [1]

Wang, Chengqi [1]

Hu, Chao [1]

Wang, Qin [1]

Affiliation:

[1] Hainan University, School of Information and Communication Engineering, Haikou 570228, People's Republic of China

Source:

BIG DATA AND COGNITIVE COMPUTING | 2024 / Vol. 8 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

self-attention mechanism; CTC; gated circulation units;

DOI:

10.3390/bdcc8120195

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech recognition technology is an important branch of artificial intelligence that aims to transform human speech into computer-readable text. However, it still faces many challenges, such as noise interference and differences in accent and speech rate. This paper explores a deep learning-based speech recognition method to improve the accuracy and robustness of speech recognition. It first introduces the basic principles of speech recognition and the existing mainstream technologies, and then focuses on deep learning-based speech recognition methods. Comparative experiments show that the self-attention mechanism performs best in speech recognition tasks. To further improve recognition performance, this paper proposes a deep learning model based on the self-attention mechanism with DCNN-GRU. The model attends dynamically to the input speech by introducing the self-attention mechanism into the neural network in place of an RNN and combining it with a deep convolutional neural network, which improves the robustness and recognition accuracy of the model. The experiments use AISHELL-1, a Chinese speech dataset of 170 h. Compared with the deep convolutional neural network, the proposed model achieves a reduction of at least 6% in character error rate (CER). Compared with a bidirectional gated recurrent neural network, it achieves a reduction of 0.7% in CER. Finally, the factors influencing the CER are analyzed on the test set. The experimental results show that the model performs well in various noise environments and accent conditions.
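
The CER figures quoted in the abstract refer to character error rate, which is conventionally defined as the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The following is a minimal Python sketch of that conventional definition only; the paper's own scoring procedure is not described in this record, and the function names `edit_distance` and `cer` are illustrative assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # characters and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six (assumed toy example).
    print(cer("今天天气很好", "今天天汽很好"))
```

Because Mandarin text is scored per character rather than per word, no word segmentation step is needed before computing this metric.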


Pages: 13
