English speech recognition based on deep learning with multiple features

Cited by: 2
Author: Zhaojuan Song
Affiliation: [1] School of Translation Studies, Qufu Normal University
Source: Computing | 2020, Vol. 102
Keywords: Deep neural network; Fusion; Speech recognition; Multiple features; 68T10; 68T35; 68T50
Abstract
English is one of the most widely used languages. As the global village shrinks, smart homes, in-vehicle voice systems, and voice-recognition software that use English as the recognition language have gradually entered people's field of vision and won over many users through their practical accuracy. Meanwhile, deep learning has outperformed shallow learning techniques on many tasks thanks to its hierarchical feature-learning and data-modeling capabilities. This paper therefore takes English speech as its research object and proposes a deep learning speech recognition algorithm that combines speech features and speech attributes. First, a deep neural network is trained with supervised learning to extract high-level speech features: the output of a fixed hidden layer is taken as the new speech feature, and a GMM–HMM acoustic model is trained on these new features. Second, deep-neural-network speech-attribute extractors are trained for multiple speech attributes, and the extracted attributes are classified into phonemes by a further deep neural network. Finally, the speech features and the speech-attribute features are merged into the same CNN framework by a neural network based on a linear feature-fusion algorithm. Experimental results show that the proposed multi-feature English speech recognition algorithm combines the speaker's speech features and speech attributes directly and effectively at the input layer of the deep neural network, and it significantly improves the performance of the English speech recognition system.
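The abstract's pipeline (take a fixed hidden layer's activations as a new "bottleneck" speech feature, then linearly fuse them with speech-attribute posteriors before a shared network input) can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the layer sizes, the 39-dim MFCC input, the 8 attribute classes, and the fusion weights `alpha`/`beta` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_features(x, weights):
    """Forward one acoustic frame through a small DNN and return the
    activations of the final (fixed) hidden layer as the new feature.
    Illustrative stand-in for the trained extractor in the paper."""
    h = x
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

# Hypothetical dimensions: 39-dim MFCC frame -> 64 hidden units
# -> 13-dim bottleneck layer (random untrained weights for the demo).
layers = [(rng.standard_normal((39, 64)) * 0.1, np.zeros(64)),
          (rng.standard_normal((64, 13)) * 0.1, np.zeros(13))]

frame = rng.standard_normal(39)            # one acoustic frame
attr = rng.random(8)                       # 8 speech-attribute scores
attr /= attr.sum()                         # normalize to posteriors

bnf = bottleneck_features(frame, layers)   # 13-dim bottleneck feature

# Linear feature fusion: weighted concatenation of the two streams,
# yielding the vector fed to the shared CNN input layer.
alpha, beta = 0.6, 0.4                     # assumed fusion weights
fused = np.concatenate([alpha * bnf, beta * attr])
print(fused.shape)  # -> (21,)
```

In a real system the bottleneck network would be trained with supervised targets and the fused vectors stacked over time before the CNN; here the forward pass and concatenation only show how the two feature streams meet in one input.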
Pages: 663–682 (19 pages)
Related papers (50 total)
  • [31] Indonesian speech recognition based on Deep Neural Network
    Yang, Ruolin
    Yang, Jian
    Lu, Yu
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 36 - 41
  • [32] Factors in Emotion Recognition With Deep Learning Models Using Speech and Text on Multiple Corpora
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 722 - 726
  • [33] Research on English pronunciation training based on intelligent speech recognition
    Cai J.
    Liu Y.
    International Journal of Speech Technology, 2018, 21 (3) : 633 - 640
  • [34] Speech Based Multiple Emotion Classification Model Using Deep Learning
    Patneedi, Shakti Swaroop
    Kumari, Nandini
    ADVANCES IN COMPUTING AND DATA SCIENCES, PT I, 2021, 1440 : 648 - 659
  • [35] English Speech Recognition Based on Artificial Intelligence
    Bai, Tana
    AGRO FOOD INDUSTRY HI-TECH, 2017, 28 (03): : 2259 - 2263
  • [36] Low-resource Sinhala Speech Recognition using Deep Learning
    Karunathilaka, Hirunika
    Welgama, Viraj
    Nadungodage, Thilini
    Weerasinghe, Ruvan
    2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 196 - 201
  • [37] Learning Speech Rate in Speech Recognition
    Zeng, Xiangyu
    Yin, Shi
    Wang, Dong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 528 - 532
  • [38] Sound signal analysis in Japanese speech recognition based on deep learning algorithm
    Yang, Xiaoxing
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2023,
  • [39] Research on Chinese Speech Emotion Recognition Based on Deep Neural Network and Acoustic Features
    Lee, Ming-Che
    Yeh, Sheng-Cheng
    Chang, Jia-Wei
    Chen, Zhen-Yi
    SENSORS, 2022, 22 (13)
  • [40] Deep and Wide: Multiple Layers in Automatic Speech Recognition
    Morgan, Nelson
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): : 7 - 13