Exploring end-to-end framework towards Khasi speech recognition system

Cited by: 4

Authors:

Syiem, Bronson [1]

Singh, L. Joyprakash [1]

Affiliations:

[1] NEHU, Elect & Commun Engn, Shillong 793022, Meghalaya, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2021 / Vol. 24 / Issue 02

Keywords:

Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model

DOI:

10.1007/s10772-021-09811-5

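Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808; 0809

Abstract: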

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) makes the system complex, as it requires various modules and resources such as acoustic models, lexicons, linguistic resources and language models, which is a particular burden for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model-building process by representing these complex modules with a simple deep network and by replacing linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone and hybrid DNN) were built. Comparing the results, a significant improvement was achieved using the proposed method, particularly with connectionist temporal classification (CTC), which gave a character error rate (CER) of 5.04%.
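
The character error rate quoted in the abstract is commonly defined as the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. A minimal Python sketch under that standard definition follows. The function names `edit_distance` and `cer` and the example strings are illustrative assumptions; they are not taken from the paper or from the Nabu toolkit, and the paper's own scoring may differ in details such as text normalization.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # drop a reference character
            insertion = curr[j - 1] + 1  # extra character in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Placeholder strings (hypothetical, not Khasi data from the paper).
    print(f"CER = {cer('kitten', 'sitting'):.2%}")
```

Under this definition, a CER of 5.04% corresponds to roughly five character errors (substitutions, deletions or insertions) per hundred reference characters.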

Pages: 419-424

Page count: 6

