Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures

Cited by: 11
Authors
Sun, Xiusong [1]
Yang, Qun [1]
Liu, Shaohan [1]
Yuan, Xin [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 2116, Peoples R China
Keywords
Hidden Markov models; Task analysis; Speech recognition; Acoustics; Artificial neural networks; Computational modeling; Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations
DOI
10.1109/ACCESS.2020.2988365
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The performance of automatic speech recognition (ASR) systems is unsatisfactory in low-resource environments. In this paper, we investigate three approaches to improving acoustic models in such environments: Mono-and-triphone Learning, Soft One-hot Label, and Feature Combinations. We applied these three methods to the network architecture and compared the results with baselines. Our proposal achieves a remarkable improvement on phoneme-level Mandarin speech recognition using the hybrid hidden Markov model-neural network approach. To verify the generalization ability of the proposed methods, we conducted comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that our methods are applicable to almost all currently widely used network structures. Compared with the baselines, our proposals achieved an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training data amounts to ~10 hours, and we did not use data augmentation or transfer learning, which means that we did not use any additional data.
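As a reading aid for the headline metric, the sketch below shows how a Character Error Rate and a relative reduction are conventionally computed: CER is the character-level edit distance divided by the reference length, and relative reduction is the drop in CER divided by the baseline CER. The function names and all numeric values are illustrative assumptions and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical values, chosen only to illustrate the arithmetic:
# a baseline CER of 0.25 and an improved CER of 0.23.
print(round(relative_reduction(0.25, 0.23), 3))
```

With these made-up numbers, an absolute drop of 0.02 in CER corresponds to a relative reduction of 8%, which is the sense in which a relative CER reduction is normally reported.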
Pages: 73005-73014
Page count: 10