Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4
Authors
Syiem, Bronson [1]
Singh, L. Joyprakash [1]
Affiliation
[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India
Keywords
Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model
DOI
10.1007/s10772-021-09811-5
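Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract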
Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules such as acoustic models, lexicons, linguistic resources and language models, which is particularly demanding for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model building process by representing complex modules with a simple deep network and by replacing the use of linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, a significant improvement was achieved using the proposed method, particularly with connectionist temporal classification (CTC), which reached a character error rate (CER) of 5.04%.
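The CER quoted in the abstract is conventionally defined as the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below assumes that standard definition; the abstract does not say how the authors computed their figure, and the function names and example strings are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level character error rate in percent (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts contain no characters")
    return 100.0 * total_edits / total_chars
```

Under this definition, a CER of 5.04% corresponds to roughly five character edits per hundred reference characters.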
Pages: 419-424
Number of pages: 6