Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4
Authors
Syiem, Bronson [1]
Singh, L. Joyprakash [1]
Affiliations
[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India
Keywords
Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model
DOI
10.1007/s10772-021-09811-5
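Chinese Library Classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract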
Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) is complex because it requires several components, such as acoustic models, lexicons, linguistic resources, and language models. This complexity is particularly problematic for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model-building process by representing these modules with a single deep network and by replacing linguistic resources with data-driven learning techniques. In this paper, we present our prior work on exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone, and hybrid DNN) were built. Compared with these models, the proposed method achieved a significant improvement, particularly with connectionist temporal classification (CTC), which yielded a character error rate (CER) of 5.04%.
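The abstract reports its headline result as a character error rate. As a minimal sketch of how such a figure is commonly computed (character-level edit distance divided by the number of reference characters), the following may help; it is an assumption about the usual definition, not the authors' evaluation code, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a CER of 5.04% would correspond to roughly five character edits per hundred reference characters. Whether spaces and punctuation are counted is a scoring convention the abstract does not specify.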
Pages: 419-424
Number of pages: 6