Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4
Authors
Syiem, Bronson [1]
Singh, L. Joyprakash [1]
Affiliations
[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India
Keywords
Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model;
DOI
10.1007/s10772-021-09811-5
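Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code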
0808 ; 0809 ;
Abstract
Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules such as acoustic models, lexicons, linguistic resources and language models, which is particularly demanding for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model-building process by representing these complex modules with a single deep network and by replacing the use of linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, a significant improvement was achieved with the proposed method, particularly with connectionist temporal classification (CTC), which reached a character error rate (CER) of 5.04%.
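The character error rate reported in the abstract is conventionally defined as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that common definition only; the function names and the sample strings are hypothetical and are not taken from the paper or from the Nabu toolkit, and the authors' exact scoring procedure is not stated in this record.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example strings, not data from the paper.
    print(f"{character_error_rate('speech', 'speach'):.2%}")
```

Under this definition, a CER of 5.04% corresponds to roughly five character errors per hundred reference characters.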
Pages: 419-424
Page count: 6
Related papers
50 entries in total
  • [41] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180
  • [42] Semantic Data Augmentation for End-to-End Mandarin Speech Recognition
    Sun, Jianwei
    Tang, Zhiyuan
    Yin, Hengxin
    Wang, Wei
    Zhao, Xi
    Zhao, Shuaijiang
    Lei, Xiaoning
    Zou, Wei
    Li, Xiangang
    INTERSPEECH 2021, 2021, : 1269 - 1273
  • [43] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    SYMMETRY-BASEL, 2019, 11 (05):
  • [44] Conformer with lexicon transducer for Korean end-to-end speech recognition
    Son, Hyunsoo
    Park, Hosung
    Kim, Gyujin
    Cho, Eunsoo
    Kim, Ji-Hwan
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 530 - 536
  • [45] SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Wang, Runyu
    Fan, Lu
    Zhang, Zhengchen
    Chen, Meng
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1006 - 1010
  • [46] Online Continual Learning of End-to-End Speech Recognition Models
    Yang, Muqiao
    Lane, Ian
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 2668 - 2672
  • [47] Tunisian Dialectal End-to-end Speech Recognition based on DeepSpeech
    Messaoudi, Abir
    Haddad, Hatem
    Fourati, Chayma
    Hmida, Moez BenHaj
    Mabrouk, Aymen Ben Elhaj
    Graiet, Mohamed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 183 - 190
  • [48] VERY DEEP CONVOLUTIONAL NETWORKS FOR END-TO-END SPEECH RECOGNITION
    Zhang, Yu
    Chan, William
    Jaitly, Navdeep
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4845 - 4849
  • [49] Improved training of end-to-end attention models for speech recognition
    Zeyer, Albert
    Irie, Kazuki
    Schlueter, Ralf
    Ney, Hermann
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 7 - 11
  • [50] CYCLE-CONSISTENCY TRAINING FOR END-TO-END SPEECH RECOGNITION
    Hori, Takaaki
    Astudillo, Ramon
    Hayashi, Tomoki
    Zhang, Yu
    Watanabe, Shinji
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6271 - 6275