Deploying a Speech Recognition Model for Under-Resourced Languages: A Case Study on Dioula Wake Words 1, 2, 3, and 4

Cited by: 0
Authors
Ouedraogo, Ismaila [1]
Some, Borlli Michel Jonas [2]
Keita, Zakaria Cheick Oumar [2]
Nabaloum, Emile [2]
Bationo, Fabrice [2]
Benedikter, Roland [3]
Diallo, Gayo [1]
Affiliations
[1] Univ Bordeaux, Team AHeaD, Inserm 1219, F-33000 Bordeaux, France
[2] Univ Nazi Boni, Sch Informat, Bobo Dioulasso, Burkina Faso
[3] Ctr Adv Studies Eurac Res, Bolzano, Italy
Source
PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2023 | 2023
Keywords
Dioula language; voice recognition; user interface; under-resourced languages
DOI
10.1145/3639233.3639345
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Speech recognition technology could provide valuable information and services to the 12.5 million Dioula speakers, especially those who cannot read or write. However, the people who could benefit most often lack access to this technology because few datasets exist for under-resourced languages. This paper investigates the effectiveness of data augmentation in training a model to recognize the wake words "1", "2", "3", and "4" in Dioula. The study makes two main contributions: the release of a labeled Dioula corpus for the wake words "1", "2", "3", and "4", comprising 1.4 hours of audio, and the training of a speech recognition model for these four words with data augmentation, which improved accuracy from 51% to 96%. In addition, the confusion matrices illustrate the model's improved predictive capacity: after data augmentation, an average of 1762 out of 1817 instances of the number "1" were correctly recognized. The loss also fell from 205% to 14% after data augmentation. These results highlight the role of data augmentation in improving model performance and mitigating overfitting, and its promise for addressing data scarcity in underrepresented speech contexts. A speech recognition model that detects specific wake words such as "1", "2", "3", and "4" in Dioula can be highly valuable for building interactive voice response systems, thereby fostering greater inclusivity and accessibility for underserved communities.
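The abstract does not say which augmentation transformations were applied. As an illustration only, the sketch below shows three transformations commonly used on short speech clips (random time shift, gain scaling, additive white noise). The function names, the parameter values, and the synthetic tone standing in for a recorded wake word are all assumptions, not the authors' method.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio (dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def time_shift(samples, max_shift, rng):
    """Circularly shift the clip by a random offset of up to max_shift samples."""
    k = rng.randint(-max_shift, max_shift) % len(samples)
    if k == 0:
        return list(samples)
    return samples[-k:] + samples[:-k]


def scale_gain(samples, low, high, rng):
    """Multiply the clip by a random gain drawn uniformly from [low, high]."""
    gain = rng.uniform(low, high)
    return [gain * s for s in samples]


def augment(samples, n_copies, seed=0):
    """Return n_copies perturbed variants of one clip (hypothetical pipeline)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        x = time_shift(samples, max_shift=len(samples) // 10, rng=rng)
        x = scale_gain(x, low=0.8, high=1.2, rng=rng)
        x = add_noise(x, snr_db=20.0, rng=rng)
        variants.append(x)
    return variants


# Stand-in for a recorded clip: 0.1 s of a 440 Hz tone at 16 kHz (assumed values).
SAMPLE_RATE = 16000
clip = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1600)]
variants = augment(clip, n_copies=3)
```

Each variant keeps the original length and label, so a small labeled corpus can be expanded several times over before training.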
Pages: 111-118
Number of pages: 8