Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures

Cited by: 11
Authors
Sun, Xiusong [1 ]
Yang, Qun [1 ]
Liu, Shaohan [1 ]
Yuan, Xin [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 2116, Peoples R China
Keywords
Hidden Markov models; Task analysis; Speech recognition; Acoustics; Artificial neural networks; Computational modeling; Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations;
DOI
10.1109/ACCESS.2020.2988365
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
The performance of ASR systems is unsatisfactory in low-resource environments. In this paper, we investigate the effectiveness of three approaches to improving acoustic models in low-resource environments: Mono-and-triphone Learning, Soft One-hot Labels, and Feature Combinations. We applied these three methods to the network architecture and compared their results with baselines. Our proposal achieves remarkable improvement on the task of Mandarin speech recognition in the hybrid hidden Markov model-neural network (NN-HMM) approach at the phoneme level. To verify the generalization ability of the proposed methods, we conducted extensive comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that our method is applicable to almost all currently widely used network structures. Compared to the baselines, our proposals achieved an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the size of the training data is ~10 hours, and we did not use data augmentation or transfer learning methods, which means that we did not use any additional data.
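The abstract names multitask mono-and-triphone learning with soft one-hot labels, but the record does not give the exact formulation. The sketch below is a minimal, hypothetical NumPy illustration of the general idea: a label-smoothing-style soft one-hot target and a weighted sum of monophone and triphone cross-entropies; the function names, `alpha`, and `epsilon` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def soft_one_hot(num_classes, target, epsilon=0.1):
    """Soften a hard one-hot label: the target class keeps 1 - epsilon,
    the remaining epsilon mass is spread over the other classes.
    (Illustrative; the paper's exact soft-label scheme may differ.)"""
    label = np.full(num_classes, epsilon / (num_classes - 1))
    label[target] = 1.0 - epsilon
    return label

def cross_entropy(probs, soft_label):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -np.sum(soft_label * np.log(probs + 1e-12))

def multitask_loss(mono_probs, tri_probs, mono_target, tri_target,
                   alpha=0.5, epsilon=0.1):
    """Weighted multitask objective over a monophone head and a
    triphone head sharing one network trunk (alpha is assumed)."""
    l_mono = cross_entropy(mono_probs,
                           soft_one_hot(len(mono_probs), mono_target, epsilon))
    l_tri = cross_entropy(tri_probs,
                          soft_one_hot(len(tri_probs), tri_target, epsilon))
    return alpha * l_mono + (1.0 - alpha) * l_tri
```

In a hybrid NN-HMM setup, both heads would sit on a shared trunk (DNN, RNN, or LSTM), so the auxiliary monophone task regularizes the triphone classifier when training data is scarce.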
Pages: 73005-73014
Number of pages: 10
Related papers
50 records in total
[41]   Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition [J].
Fantaye, Tessfu Geteye ;
Yu, Junqing ;
Hailu, Tulu Tilahun .
COMPUTERS, 2020, 9 (02)
[42]   Simplified Scoring Methods in HMM Based Speech Recognition [J].
Paramonov, Pavel ;
Sutula, Nadezhda .
2014 INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE ISCMI 2014, 2014, :154-156
[43]   Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition [J].
Vanderreydt, Geoffroy ;
Remy, Francois ;
Demuynck, Kris .
INTERSPEECH 2022, 2022, :3053-3057
[44]   A Speech Recognition System Based Improved Algorithm of Dual-template HMM [J].
Zhang, Jing ;
Zhang, Min .
CEIS 2011, 2011, 15
[45]   End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition [J].
Li, Sheng ;
Ding, Chenchen ;
Lu, Xugang ;
Shen, Peng ;
Kawahara, Tatsuya ;
Kawai, Hisashi .
INTERSPEECH 2019, 2019, :2145-2149
[46]   Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition [J].
Matsuura, Kohei ;
Mimura, Masato ;
Sakai, Shinsuke ;
Kawahara, Tatsuya .
INTERSPEECH 2020, 2020, :2737-2741
[47]   Acoustic model training using self-attention for low-resource speech recognition [J].
Park, Hosung ;
Kim, Ji-Hwan .
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05) :483-489
[48]   CAM: A cross-lingual adaptation framework for low-resource language speech recognition [J].
Hu, Qing ;
Zhang, Yan ;
Zhang, Xianlei ;
Han, Zongyu ;
Yu, Xilong .
INFORMATION FUSION, 2024, 111
[49]   Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS [J].
Gokay, Ramazan ;
Yalcin, Hulya .
2019 16TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2019, :357-360
[50]   Speech-to-speech Low-resource Translation [J].
Liu, Hsiao-Chuan ;
Day, Min-Yuh ;
Wang, Chih-Chien .
2023 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE, IRI, 2023, :91-95