Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4

Authors:

Syiem, Bronson [1]

Singh, L. Joyprakash [1]

Affiliations:

[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2021 / Vol. 24 / Issue 02

Keywords:

Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model;

DOI:

10.1007/s10772-021-09811-5

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules such as acoustic models, lexicons, linguistic resources and language models, particularly for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model building process by representing complex modules with a simple deep network and by replacing the use of linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, along with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, a significant improvement was achieved using the proposed method, particularly with connectionist temporal classification (CTC), which gave a character error rate (CER) of 5.04%.


Pages: 419-424

Page count: 6
