Bangladeshi Bangla speech corpus for automatic speech recognition research

Cited by: 7

Authors:

Kibria, Shafkat [1]

Samin, Ahnaf Mozib [1]

Kobir, M. Humayon [1]

Rahman, M. Shahidur [1]

Selim, M. Reza [1]

Iqbal, M. Zafar [1]

Affiliation:

[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh

Source:

SPEECH COMMUNICATION | 2022 / Vol. 136

Keywords:

Bangladeshi Bangla corpus; Automatic speech recognition; Corpora evaluation; Recurrent neural network

DOI:

10.1016/j.specom.2021.12.004

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This article reports the development of a language resource for Bangladeshi Bangla spoken language (BBSL). Bangladeshi Bangla lacks adequately large speech corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. The accuracy of an automatic speech recognition (ASR) system rests on the quality of the speech corpus. This work discusses the common issues and activities related to the development of a large speech corpus named SUBAK.KO. The corpus is designed to support ASR research in Bangladeshi Bangla and is labeled sentence-wise. We trained a model on this corpus with one of the well-known current End-to-End ASR algorithms, Recurrent Neural Networks (RNNs) with Connectionist Temporal Classification (CTC). To identify its strengths and weaknesses, the Character Error Rate (CER) and the Word Error Rate (WER) of the trained RNN-CTC model were observed. Another open-source large Bangla ASR corpus was used to train a model with the same ASR algorithm. The two trained models were compared to assess the quality of these corpora. SUBAK.KO was found to be a more balanced corpus that covers more regional accent variability for an LVCSR system than the open-source large Bangla ASR corpus.
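
The abstract evaluates the trained models by CER and WER. As a minimal sketch of how these two metrics are commonly defined (not the authors' evaluation code), both can be computed as the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length. The function names `edit_distance`, `wer`, and `cer` and the romanized sample sentence are hypothetical, chosen only for illustration. The sketch assumes that words are split on whitespace and that spaces count as characters for CER; other conventions exist.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical romanized example; one word differs by one character.
    reference = "ami bhat khai"
    hypothesis = "ami bat khai"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Note that Python iterates over strings by Unicode code point, so for text in Bangla script a dependent vowel sign and its base consonant count as separate characters here. Whether CER should instead be computed over grapheme clusters is a design choice that the abstract does not specify.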

Pages: 84-97

Page count: 14
