Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4

Authors:

Syiem, Bronson [1]

Singh, L. Joyprakash [1]

Affiliation:

[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2021 / Volume 24 / Issue 02

Keywords:

Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model

DOI:

10.1007/s10772-021-09811-5

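Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics, communication technology]

Discipline classification code: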

0808 ; 0809 ;

Abstract:

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules such as acoustic models, lexicons, linguistic resources, and language models, which is particularly demanding for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model building process by representing complex modules with a simple deep network and by replacing the use of linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, the proposed method achieved a significant improvement, particularly the connectionist temporal classification (CTC) model, which reached a character error rate (CER) of 5.04%.
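For readers unfamiliar with the metric, the sketch below shows how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference and hypothesis transcripts, divided by the number of reference characters. This is the standard textbook definition, not the authors' scoring script; the function names `edit_distance` and `cer` and the sample strings are illustrative assumptions, and the record does not say how whitespace or punctuation were treated in the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference character missing in hyp
            insertion = curr[j - 1] + 1  # extra character present in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Placeholder strings (not Khasi data): one substitution in four characters.
    print(f"CER = {cer('abcd', 'abxd'):.2%}")
```

Under this definition, a CER of 5.04% corresponds to roughly five character-level errors per hundred reference characters.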

Pages: 419-424

Page count: 6
