Research on a Deep Learning Method for Speech Recognition

Authors
Xiao, Jia [1]
Xiaolin, Sun [1]
Affiliation
[1] Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
Keywords
Audition - Convolution - Deep neural networks - Speech enhancement - Speech recognition
DOI
Not available
Abstract
Deep convolutional neural networks (CNNs) are widely used in speech recognition, and models based on deep CNNs can effectively improve the quality of human-computer interaction. However, existing CNNs with a fixed convolutional kernel size are at a disadvantage when extracting data features. It is hard to determine whether the extracted features are sufficient. To address this problem, a self-tuning convolutional kernel (STCK) algorithm is proposed. First, the computational process of the STCK algorithm is derived, and the formula for calculating the convolutional kernel size is obtained. Next, the Bark spectrum is introduced to extract the spectrogram of the speech signal, which serves as the CNN input and is adapted to human hearing. In addition, two data enhancement strategies are proposed, namely frame channel shielding and Bark-band channel shielding, which further improve the generalization ability of the recognition model. Experimental results show that, compared with two other models (the CNN model without the STCK algorithm and the CNN model without the data enhancement strategies), the proposed method achieves the lowest training loss, and the recognition error rates on the test samples are reduced by 3.9% and 1%, respectively. © 2024, International Association of Engineers. All Rights Reserved.
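The two data enhancement strategies named in the abstract can be pictured as masking along the two axes of the input spectrogram. The following is a minimal sketch only, assuming that "shielding" means zeroing a contiguous run of frames (time axis) or of Bark bands (frequency axis). The abstract does not specify the mechanism, so the function names `shield_frames` and `shield_bands`, the mask widths, the 24 x 100 placeholder spectrogram, and the bands-by-frames layout are all hypothetical.

```python
import random


def shield_frames(spec, max_width, rng):
    """Return a copy of spec with a random contiguous run of frames zeroed.

    spec is a list of rows (one row per Bark band, one column per frame).
    Zeroing is an assumed interpretation of "frame channel shielding".
    """
    n_frames = len(spec[0])
    width = min(rng.randint(0, max_width), n_frames)  # randint is inclusive
    start = rng.randint(0, n_frames - width)
    return [
        [0.0 if start <= t < start + width else v for t, v in enumerate(row)]
        for row in spec
    ]


def shield_bands(spec, max_width, rng):
    """Return a copy of spec with a random contiguous run of bands zeroed.

    Zeroing is an assumed interpretation of "Bark-band channel shielding".
    """
    n_bands = len(spec)
    width = min(rng.randint(0, max_width), n_bands)
    start = rng.randint(0, n_bands - width)
    return [
        [0.0] * len(row) if start <= b < start + width else list(row)
        for b, row in enumerate(spec)
    ]


# Placeholder input: 24 bands x 100 frames, all ones (not real speech data).
rng = random.Random(0)
spec = [[1.0] * 100 for _ in range(24)]
augmented = shield_bands(shield_frames(spec, 10, rng), 4, rng)
```

Both functions return new lists and leave the input untouched, so the same spectrogram can be augmented differently on each training pass.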
Pages: 1272-1280