Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures

Cited by: 11
Authors
Sun, Xiusong [1 ]
Yang, Qun [1 ]
Liu, Shaohan [1 ]
Yuan, Xin [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 2116, Peoples R China
Keywords
Hidden Markov models; Task analysis; Speech recognition; Acoustics; Artificial neural networks; Computational modeling; Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations
DOI
10.1109/ACCESS.2020.2988365
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
The performance of automatic speech recognition (ASR) systems is unsatisfactory in low-resource environments. In this paper, we investigate the effectiveness of three approaches to improving acoustic models under low-resource conditions: Mono-and-triphone Learning, Soft One-hot Labels, and Feature Combinations. We applied these three methods to the network architecture and compared their results with baselines. Our proposal achieves remarkable improvement on Mandarin speech recognition with the hybrid hidden Markov model-neural network (NN-HMM) approach at the phoneme level. To verify the generalization ability of the proposed methods, we conducted extensive comparative experiments on DNN, RNN, LSTM, and other network structures; the results show that our methods are applicable to almost all widely used network structures. Compared to the baselines, our proposals achieve an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training data comprises roughly 10 hours of speech, and we did not use data augmentation or transfer learning, i.e., no additional data was used.
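A minimal sketch of the "Mono-and-triphone Learning" idea named in the abstract, assuming a PyTorch-style multitask acoustic model: a shared trunk feeds two classification heads, one over context-independent monophone states and one over context-dependent triphone states, trained with a weighted sum of cross-entropy losses. The layer sizes, state counts, and the weight `alpha` are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of multitask mono-and-triphone acoustic modeling for a
# hybrid NN-HMM system. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class MonoTriphoneAM(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=512,
                 num_monophone_states=144, num_triphone_states=3000):
        super().__init__()
        # Shared feed-forward trunk (a plain DNN; the paper also evaluates RNN/LSTM variants).
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Task-specific output heads.
        self.mono_head = nn.Linear(hidden_dim, num_monophone_states)
        self.tri_head = nn.Linear(hidden_dim, num_triphone_states)

    def forward(self, x):
        h = self.shared(x)
        return self.mono_head(h), self.tri_head(h)


def multitask_loss(mono_logits, tri_logits, mono_targets, tri_targets, alpha=0.3):
    """Weighted sum of the two frame-level cross-entropy losses; alpha is an assumed weight."""
    ce = nn.CrossEntropyLoss()
    return (1 - alpha) * ce(tri_logits, tri_targets) + alpha * ce(mono_logits, mono_targets)


if __name__ == "__main__":
    model = MonoTriphoneAM()
    feats = torch.randn(8, 40)             # a batch of acoustic feature frames
    mono_y = torch.randint(0, 144, (8,))   # frame-level monophone state labels
    tri_y = torch.randint(0, 3000, (8,))   # frame-level triphone state labels
    mono_logits, tri_logits = model(feats)
    loss = multitask_loss(mono_logits, tri_logits, mono_y, tri_y)
    loss.backward()
    print(float(loss))
```

In a typical setup of this kind, only the triphone posteriors would feed HMM decoding, with the monophone head acting as an auxiliary regularizer during training; the paper's exact configuration may differ.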
Pages: 73005-73014
Number of pages: 10
Related Papers (showing items 31-40 of 50)
  • [31] Improving Low-Resource CD-DNN-HMM using Dropout and Multilingual DNN Training
    Miao, Yajie
    Metze, Florian
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2236 - 2240
  • [32] Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
    Nespoli, Francesco
    Barreda, Daniel
    Naylor, Patrick A.
    [J]. FIFTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, IEEECONF, 2023, : 1080 - 1084
  • [33] End-to-End Low-Resource Speech Recognition with a Deep CNN-LSTM Encoder
    Wang, Weizhe
    Yang, Xiaodong
    Yang, Hongwu
    [J]. 2020 IEEE 3RD INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP 2020), 2020, : 158 - 162
  • [34] Transfer Ability of Monolingual Wav2vec2.0 for Low-resource Speech Recognition
    Yi, Cheng
    Wang, Jianzong
    Cheng, Ning
    Zhou, Shiyu
    Xu, Bo
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [35] Robust Speech Recognition using Meta-learning for Low-resource Accents
    Eledath, Dhanya
    Baby, Arun
    Singh, Shatrughan
    [J]. 2024 NATIONAL CONFERENCE ON COMMUNICATIONS, NCC, 2024,
  • [36] Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition
    Yi, Cheng
    Zhou, Shiyu
    Xu, Bo
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 (28) : 788 - 792
  • [37] Weighted Gradient Pretrain for Low-Resource Speech Emotion Recognition
    Xie, Yue
    Liang, Ruiyu
    Zhao, Xiaoyan
    Liang, Zhenlin
    Du, Jing
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (07) : 1352 - 1355
  • [38] Acoustic Modeling for Hindi Speech Recognition in Low-Resource Settings
    Dey, Anik
    Zhang, Weibin
    Fung, Pascale
    [J]. 2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 891 - 894
  • [39] MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition
    Xie, Jiamin
    Hansen, John H. L.
    [J]. INTERSPEECH 2023, 2023, : 1304 - 1308
  • [40] Low-resource automatic speech recognition and error analyses of oral cancer speech
    Halpern, Bence Mark
    Feng, Siyuan
    van Son, Rob
    van den Brekel, Michiel
    Scharenborg, Odette
    [J]. SPEECH COMMUNICATION, 2022, 141 : 14 - 27