Regularized Urdu Speech Recognition with Semi-Supervised Deep Learning

Cited by: 11
Authors
Humayun, Mohammad Ali [1 ]
Hameed, Ibrahim A. [2 ]
Shah, Syed Muslim [1 ]
Khan, Sohaib Hassan [1 ]
Zafar, Irfan [1 ]
Bin Ahmed, Saad [3 ]
Shuja, Junaid [4 ]
Affiliations
[1] Univ Engn & Technol Peshawar, Dept Elect Engn, Inst Commun Technol ICT Campus, Islamabad 44000, Pakistan
[2] Norwegian Univ Sci & Technol, Fac Informat Technol & Elect Engn, Dept ICT & Nat Sci, N-6001 Alesund, Norway
[3] Univ Teknol Malaysia, M JIIT, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
[4] COMSATS Univ Islamabad, Dept Comp Sci, Abbottabad Campus, Abbottabad 22010, Pakistan
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 09
Keywords
speech recognition; locally linear embedding; label propagation; Maxout; low resource languages;
DOI
10.3390/app9091956
Chinese Library Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, supervised neural network models. These supervised models need huge amounts of labeled speech data to generalize well, which is difficult to obtain for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembles are averaging techniques over multiple neural network models, while Maxout units are neural network units that adapt their activation functions. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are projected onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE), and the transformed data, together with the higher-dimensional features, is used to train the neural networks. The proposed model also uses label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the HMM benchmark reported on the same Urdu corpus. The decrease in WER after incorporating SSL becomes more pronounced as the validation data size grows.
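The SSL pipeline outlined in the abstract (unsupervised LLE onto a low-dimensional manifold, then graph-based label propagation from a small labeled subset) can be illustrated with scikit-learn's off-the-shelf implementations. This is a minimal sketch on synthetic cluster data standing in for speech feature vectors; the dataset, neighbor counts, and dimensions are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.semi_supervised import LabelPropagation

# Synthetic stand-in for speech feature vectors (e.g. frame-level acoustic features)
X, y_true = make_blobs(n_samples=300, centers=3, n_features=20, random_state=0)

# Unsupervised dimensionality reduction onto a low-dimensional manifold
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                             eigen_solver="dense", random_state=0)
X_low = lle.fit_transform(X)

# Keep labels for only 10% of the samples; mark the rest unlabeled with -1
rng = np.random.default_rng(0)
y_semi = np.full_like(y_true, -1)
labeled = rng.choice(len(y_true), size=30, replace=False)
y_semi[labeled] = y_true[labeled]

# Label propagation spreads the few known labels through a k-NN graph
lp = LabelPropagation(kernel="knn", n_neighbors=10)
lp.fit(X_low, y_semi)

# transduction_ holds the propagated labels for every training point
acc = (lp.transduction_ == y_true).mean()
print(f"transductive accuracy: {acc:.2f}")
```

Note that label propagation is transductive: it assigns labels to the unlabeled training points themselves, which is how self-training targets for a subsequent supervised model can be generated.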
Pages: 15
Related Papers
50 records
  • [31] A note on label propagation for semi-supervised learning
    Bodo, Zalan
    Csato, Lehel
    ACTA UNIVERSITATIS SAPIENTIAE INFORMATICA, 2015, 7 (01) : 18 - 30
  • [32] Semi-Supervised Classification Based on Transformed Learning
    Kang, Z.
    Liu, L.
    Han, M.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (01): 103 - 111
  • [33] Logistic Label Propagation for Semi-supervised Learning
    Watanabe, Kenji
    Kobayashi, Takumi
    Otsu, Nobuyuki
    NEURAL INFORMATION PROCESSING: THEORY AND ALGORITHMS, PT I, 2010, 6443 : 462 - 469
  • [34] SEMI-SUPERVISED TRAINING STRATEGIES FOR DEEP NEURAL NETWORKS
    Gibson, Matthew
    Cook, Gary
    Zhan, Puming
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 77 - 83
  • [35] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [36] Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
    Guo, Pengcheng
    Xu, Haihua
    Xie, Lei
    Chng, Eng Siong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1928 - 1932
  • [37] NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
    Yuksel, Kamer Ali
    Ferreira, Thiago Castro
    Javadi, Golara
    Al-Badrashiny, Mohamed
    Gunduz, Ahmet
    INTERSPEECH 2023, 2023, : 466 - 470
  • [38] Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
    Biswas, Astik
    Menon, Raghav
    van der Westhuizen, Ewald
    Niesler, Thomas
    INTERSPEECH 2019, 2019, : 3008 - 3012
  • [39] Lightly supervised vs. semi-supervised training of acoustic model on Luxembourgish for low-resource automatic speech recognition
    Vesely, Karel
    Segura, Carlos
    Szoke, Igor
    Luque, Jordi
    Cernocky, Jan Honza
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2883 - 2887
  • [40] Cyclic label propagation for graph semi-supervised learning
    Li, Zhao
    Liu, Yixin
    Zhang, Zhen
    Pan, Shirui
    Gao, Jianliang
    Bu, Jiajun
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (02): 703 - 721