Bangladeshi Bangla speech corpus for automatic speech recognition research

Cited by: 7
Authors
Kibria, Shafkat [1]
Samin, Ahnaf Mozib [1]
Kobir, M. Humayon [1]
Rahman, M. Shahidur [1]
Selim, M. Reza [1]
Iqbal, M. Zafar [1]
Affiliations
[1] Shahjalal Univ Sci & Technol, Dept Comp Sci & Engn, Sylhet 3114, Bangladesh
Keywords
Bangladeshi Bangla corpus; Automatic speech recognition; Corpora evaluation; Recurrent neural network
DOI
10.1016/j.specom.2021.12.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This article reports the development of a language resource for Bangladeshi Bangla spoken language (BBSL). Bangladeshi Bangla lacks adequately large speech corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. The accuracy of an automatic speech recognition (ASR) system rests on the quality of the speech corpus. This work discusses the common issues and activities related to the development of a large speech corpus named SUBAK.KO. The corpus is designed to support ASR research in Bangladeshi Bangla and is labeled sentence-wise. We trained a model on this corpus with one of the well-known current end-to-end ASR approaches, Recurrent Neural Networks (RNNs) with Connectionist Temporal Classification (CTC). To identify its strengths and weaknesses, we observed the Character Error Rate (CER) and the Word Error Rate (WER) of the trained RNN-CTC model. A second model was trained on another open-source large Bangla ASR corpus using the same ASR algorithm. The two trained models were compared to assess the quality of these corpora. SUBAK.KO was found to be a more balanced corpus that covers more regional accented speech variability for an LVCSR system than that open-source large Bangla ASR corpus.
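The abstract evaluates the trained models by CER and WER. As a rough illustration of how these two metrics are commonly defined (edit distance between reference and hypothesis, divided by the reference length), the sketch below may help. It is not the authors' evaluation code: the function names `edit_distance`, `wer`, and `cer` are hypothetical, words are assumed to be whitespace-separated, and characters are assumed to be Unicode code points, which for Bangla script is a simplification because one written character can span several code points.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace-separated words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points (a simplification)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("abc", "abd"))
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, which is why they are usually reported as error rates and not as accuracies.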
Pages: 84-97
Page count: 14