English speech recognition based on deep learning with multiple features

Cited by: 2
Author: Zhaojuan Song
Affiliation: [1] School of Translation Studies, Qufu Normal University
Source: Computing | 2020, Vol. 102
Keywords: Deep neural network; Fusion; Speech recognition; Multiple features; 68T10; 68T35; 68T50
Abstract
English is one of the most widely used languages. As the global village shrinks, smart homes, in-vehicle voice systems, and voice-recognition software that use English as the recognition language have gradually entered people's field of vision and won over many users through their practical accuracy. Meanwhile, deep learning has outperformed shallow learning techniques on many tasks thanks to its hierarchical feature-learning and data-modeling capabilities. This paper therefore takes English speech as its research object and proposes a deep learning speech recognition algorithm that combines speech features and speech attributes. First, a deep neural network is trained with supervised learning to extract high-level speech features: the output of a fixed hidden layer is taken as the new speech feature, and a GMM–HMM acoustic model is trained on these new features. Second, deep-neural-network speech-attribute extractors are trained for multiple speech attributes, and the extracted attributes are classified into phonemes by a further deep neural network. Finally, the speech features and the speech-attribute features are merged into the same CNN framework by a neural network based on a linear feature-fusion algorithm. Experimental results show that the proposed multi-feature English speech recognition algorithm combines the speaker's speech features and speech attributes directly and effectively at the input layer of the deep neural network, and it significantly improves the performance of the English speech recognition system.
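The abstract's pipeline (take a fixed hidden layer's activations as a new "bottleneck" speech feature, then linearly fuse them with speech-attribute posteriors before a shared network input) can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the layer sizes, the 39-dim MFCC input, the 8 attribute classes, and the fusion weights `alpha`/`beta` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_features(x, weights):
    """Forward one acoustic frame through a small DNN and return the
    activations of the final (fixed) hidden layer as the new feature.
    Illustrative stand-in for the trained extractor in the paper."""
    h = x
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

# Hypothetical dimensions: 39-dim MFCC frame -> 64 hidden units
# -> 13-dim bottleneck layer (random untrained weights for the demo).
layers = [(rng.standard_normal((39, 64)) * 0.1, np.zeros(64)),
          (rng.standard_normal((64, 13)) * 0.1, np.zeros(13))]

frame = rng.standard_normal(39)            # one acoustic frame
attr = rng.random(8)                       # 8 speech-attribute scores
attr /= attr.sum()                         # normalize to posteriors

bnf = bottleneck_features(frame, layers)   # 13-dim bottleneck feature

# Linear feature fusion: weighted concatenation of the two streams,
# yielding the vector fed to the shared CNN input layer.
alpha, beta = 0.6, 0.4                     # assumed fusion weights
fused = np.concatenate([alpha * bnf, beta * attr])
print(fused.shape)  # -> (21,)
```

In a real system the bottleneck network would be trained with supervised targets and the fused vectors stacked over time before the CNN; here the forward pass and concatenation only show how the two feature streams meet in one input.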
Pages: 663–682 (19 pages)
Related papers (50 total)
  • [31] Indonesian speech recognition based on Deep Neural Network
    Yang, Ruolin
    Yang, Jian
    Lu, Yu
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 36 - 41
  • [32] Factors in Emotion Recognition With Deep Learning Models Using Speech and Text on Multiple Corpora
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 722 - 726
  • [33] Research on English pronunciation training based on intelligent speech recognition
    Cai J.
    Liu Y.
    International Journal of Speech Technology, 2018, 21 (3) : 633 - 640
  • [34] Speech Based Multiple Emotion Classification Model Using Deep Learning
    Patneedi, Shakti Swaroop
    Kumari, Nandini
    ADVANCES IN COMPUTING AND DATA SCIENCES, PT I, 2021, 1440 : 648 - 659
  • [35] English Speech Recognition Based on Artificial Intelligence
    Bai, Tana
    AGRO FOOD INDUSTRY HI-TECH, 2017, 28 (03): : 2259 - 2263
  • [36] Low-resource Sinhala Speech Recognition using Deep Learning
    Karunathilaka, Hirunika
    Welgama, Viraj
    Nadungodage, Thilini
    Weerasinghe, Ruvan
    2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 196 - 201
  • [37] Learning Speech Rate in Speech Recognition
    Zeng, Xiangyu
    Yin, Shi
    Wang, Dong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 528 - 532
  • [38] Sound signal analysis in Japanese speech recognition based on deep learning algorithm
    Yang, Xiaoxing
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2023,
  • [39] Research on Chinese Speech Emotion Recognition Based on Deep Neural Network and Acoustic Features
    Lee, Ming-Che
    Yeh, Sheng-Cheng
    Chang, Jia-Wei
    Chen, Zhen-Yi
    SENSORS, 2022, 22 (13)
  • [40] Deep and Wide: Multiple Layers in Automatic Speech Recognition
    Morgan, Nelson
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): : 7 - 13