Challenges of German Speech Recognition: A Study on Multi-ethnolectal Speech Among Adolescents

Citations: 0

Authors:

Schubert, Martha [1]

Duran, Daniel [2]

Siegert, Ingo [1]

Affiliations:

[1] Otto von Guericke Univ, IIKT, Mobile Dialog Syst, Magdeburg, Germany

[2] Leibniz Ctr Gen Linguist ZAS, Berlin, Germany

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech recognition; adolescent speech; bias; multi-ethnolectal speech

DOI:

10.21437/Interspeech.2024-1717

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite significant advances in speech recognition systems, challenges persist in accurately interpreting spontaneous speech from underrepresented groups such as non-standard speakers or younger individuals. The difficulty increases when these conditions overlap. To explore this topic further, we employ a dataset featuring spontaneous as well as read speech from young speakers in Germany, including speakers from both mono-ethnic and multi-ethnic backgrounds. Our study comprises a comparative analysis of speech recognition performance, taking gender into account, using three Automatic Speech Recognition (ASR) engines: Whisper (OpenAI), NeMo (NVIDIA), and Wav2Vec2.0 (Meta AI). Furthermore, we conduct an error analysis of the automatically generated transcripts using part-of-speech (POS) tagging. This allows us to identify the word types that are most difficult for the ASR engines to recognize.

Citation

Pages: 3045-3049

Page count: 5
