Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11
Authors
Humayun, Mohammad Ali [1 ]
Hameed, Ibrahim A. [2 ]
Shah, Syed Muslim [1 ]
Khan, Sohaib Hassan [1 ]
Zafar, Irfan [1 ]
Bin Ahmed, Saad [3 ]
Shuja, Junaid [4 ]
Affiliations
[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan
[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway
[3] Univ Teknol Malaysia, M JIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 09
Keywords
speech recognition; locally linear embedding; label propagation; Maxout; low resource languages;
DOI
10.3390/app9091956
Chinese Library Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, supervised neural network models. These supervised models need huge amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are projected onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE), and the transformed data, together with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the HMM benchmark reported on the same Urdu corpus. The decrease in WER after incorporating SSL becomes more pronounced as the validation data size grows.
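The SSL pipeline outlined in the abstract (unsupervised LLE onto a low-dimensional manifold, then graph-based label propagation from a small labeled subset) can be illustrated with scikit-learn's off-the-shelf implementations. This is a minimal sketch on synthetic cluster data standing in for speech feature vectors; the dataset, neighbor counts, and dimensions are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.semi_supervised import LabelPropagation

# Synthetic stand-in for speech feature vectors (e.g. frame-level acoustic features)
X, y_true = make_blobs(n_samples=300, centers=3, n_features=20, random_state=0)

# Unsupervised dimensionality reduction onto a low-dimensional manifold
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                             eigen_solver="dense", random_state=0)
X_low = lle.fit_transform(X)

# Keep labels for only 10% of the samples; mark the rest unlabeled with -1
rng = np.random.default_rng(0)
y_semi = np.full_like(y_true, -1)
labeled = rng.choice(len(y_true), size=30, replace=False)
y_semi[labeled] = y_true[labeled]

# Label propagation spreads the few known labels through a k-NN graph
lp = LabelPropagation(kernel="knn", n_neighbors=10)
lp.fit(X_low, y_semi)

# transduction_ holds the propagated labels for every training point
acc = (lp.transduction_ == y_true).mean()
print(f"transductive accuracy: {acc:.2f}")
```

Note that label propagation is transductive: it assigns labels to the unlabeled training points themselves, which is how self-training targets for a subsequent supervised model can be generated.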
Pages: 15
Related Papers
50 records
  • [31] A note on label propagation for semi-supervised learning
    Bodo, Zalan
    Csato, Lehel
    ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2015, 7 (01) : 18 - 30
  • [32] Semi-Supervised Classification Based on Transformed Learning
    Kang, Z.
    Liu, L.
    Han, M.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (01): 103 - 111
  • [33] Logistic Label Propagation for Semi-supervised Learning
    Watanabe, Kenji
    Kobayashi, Takumi
    Otsu, Nobuyuki
    NEURAL INFORMATION PROCESSING: THEORY AND ALGORITHMS, PT I, 2010, 6443 : 462 - 469
  • [34] SEMI-SUPERVISED TRAINING STRATEGIES FOR DEEP NEURAL NETWORKS
    Gibson, Matthew
    Cook, Gary
    Zhan, Puming
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 77 - 83
  • [35] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [36] Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
    Guo, Pengcheng
    Xu, Haihua
    Xie, Lei
    Chng, Eng Siong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1928 - 1932
  • [37] NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
    Yuksel, Kamer Ali
    Ferreira, Thiago Castro
    Javadi, Golara
    Al-Badrashiny, Mohamed
    Gunduz, Ahmet
    INTERSPEECH 2023, 2023, : 466 - 470
  • [38] Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
    Biswas, Astik
    Menon, Raghav
    van der Westhuizen, Ewald
    Niesler, Thomas
    INTERSPEECH 2019, 2019, : 3008 - 3012
  • [39] Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
    Vesely, Karel
    Segura, Carlos
    Szoke, Igor
    Luque, Jordi
    Cernocky, Jan Honza
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2883 - 2887
  • [40] Cyclic label propagation for graph semi-supervised learning
    Li, Zhao
    Liu, Yixin
    Zhang, Zhen
    Pan, Shirui
    Gao, Jianliang
    Bu, Jiajun
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (02): 703 - 721