Challenges of German Speech Recognition: A Study on Multi-ethnolectal Speech Among Adolescents

Times cited: 0
Authors
Schubert, Martha [1]
Duran, Daniel [2]
Siegert, Ingo [1]
Affiliations
[1] Otto von Guericke Univ, IIKT, Mobile Dialog Syst, Magdeburg, Germany
[2] Leibniz Ctr Gen Linguist ZAS, Berlin, Germany
Source
INTERSPEECH 2024 | 2024
Keywords
speech recognition; adolescent speech; bias; multi-ethnolectal speech
DOI
10.21437/Interspeech.2024-1717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite significant advancements in speech recognition systems, challenges persist in accurately interpreting spontaneous speech from underrepresented groups such as non-standard speakers or younger individuals. The difficulty increases when these conditions overlap. To further explore this topic, we employ a dataset featuring spontaneous as well as read speech from young speakers in Germany, including speakers from both mono-ethnic and multi-ethnic backgrounds. Our study involves a comparative analysis of speech recognition performance, incorporating gender considerations, using three distinct Automatic Speech Recognition (ASR) engines: Whisper (OpenAI), NeMo (NVIDIA), and Wav2Vec2.0 (Meta AI). Furthermore, we conduct a comprehensive error analysis on the automatically generated transcripts, employing part-of-speech (POS) tagging. This allows us to discern the word types that pose the greatest challenge for comprehension by the ASR engines.
Pages: 3045-3049
Number of pages: 5