Deploying a Speech Recognition Model for Under-Resourced Languages: A Case Study on Dioula Wake Words 1, 2, 3, and 4

Citations: 0
Authors
Ouedraogo, Ismaila [1]
Some, Borlli Michel Jonas [2]
Keita, Zakaria Cheick Oumar [2]
Nabaloum, Emile [2]
Bationo, Fabrice [2]
Benedikter, Roland [3]
Diallo, Gayo [1]
Affiliations
[1] Univ Bordeaux, Team AHeaD, Inserm 1219, F-33000 Bordeaux, France
[2] Univ Nazi Boni, Sch Informat, Bobo Dioulasso, Burkina Faso
[3] Ctr Adv Studies Eurac Res, Bolzano, Italy
Source
PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023 | 2023
Keywords
Dioula language; voice recognition; user interface; under-resourced languages
DOI
10.1145/3639233.3639345
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition technology has the potential to provide valuable information and services to the 12.5 million Dioula speakers, especially those who cannot read or write. However, the people who could benefit most often lack access to this technology because few datasets exist for under-resourced languages. This paper investigates the effectiveness of data augmentation in training a model to recognize the wake words 1, 2, 3, and 4 in Dioula. The study makes two major contributions: the release of a labeled Dioula corpus for the wake words 1, 2, 3, and 4, comprising 1.4 hours of audio, and the training of a speech recognition model for these wake words with data augmentation, which improved accuracy from 51% to 96%. The confusion matrices also illustrate the model's enhanced predictive capacity: after data augmentation, an average of 1762 out of 1817 instances of the number "1" were correctly recognized. The loss likewise fell from 205% to 14% after data augmentation. These results underscore the pivotal role of data augmentation in improving the model's performance and mitigating overfitting, and they show the promise of this technique for addressing data scarcity in underrepresented speech contexts. A speech recognition model that detects specific wake words such as "1," "2," "3," and "4" in Dioula can be highly valuable for building interactive voice response systems, thereby fostering greater inclusivity and accessibility for underserved communities.
Pages: 111-118
Number of pages: 8
Related papers
20 in total
  • [1] [Anonymous], 2010, International Journal of Computer Theory and Engineering
  • [2] Babirye C, 2022, Building text and speech datasets for low resourced languages: A case of languages in East Africa
  • [3] Berment V, 2004, PhD thesis
  • [4] Chapaneri S. V., 2012, Int. J. Computer Appl, V40, P6, DOI DOI 10.5120/5022-7167
  • [5] Dave N., 2013, International journal for advance research in engineering and technology, V1, P1
  • [6] Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems
    de Wet, Febe
    Kleynhans, Neil
    van Compernolle, Dirk
    Sahraeian, Reza
    [J]. SOUTH AFRICAN JOURNAL OF SCIENCE, 2017, 113 (1-2) : 25 - 33
  • [7] Doumbouya M, 2021, AAAI CONF ARTIF INTE, V35, P14757
  • [8] The digital divide in Brazil and the accessibility as a fundamental right
    Gabardo, Emerson
    Viana, Ana Cristina Aguilar
    de Freitas, Olga Lucia Castreghini
    [J]. REVISTA CHILENA DE DERECHO Y TECNOLOGIA, 2022, 11 (02) : 1 - 26
  • [9] Krauwer S, 2003, The basic language resource kit (BLARK) as the first milestone for the language resources roadmap, V2003, P15
  • [10] Li LS, 2018, J MACH LEARN RES, V18