Exploring Native and Non-Native English Child Speech Recognition With Whisper

Cited by: 0

Authors:

Jain, Rishabh [1]

Barcovschi, Andrei [1]

Yiwere, Mariam Yahayah [1]

Corcoran, Peter [1]

Cucu, Horia [2]

Affiliations:

[1] Univ Galway, Sch Elect & Elect Engn, Galway H91 TK33, Ireland

[2] Univ Politehn Bucuresti, Speech & Dialogue Res Lab, Bucharest 060042, Romania

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Child automatic speech recognition; Whisper; large-scale supervision; MyST; PFSTAR; CMU_Kids; speechocean762; non-native child speech

DOI:

10.1109/ACCESS.2024.3378738

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children's speech. This challenge is due to the high acoustic variability in children's voices and the scarcity of child speech training data, particularly for accented speech or low-resource languages. This study focuses on improving the performance of ASR on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale Whisper models (trained with a large amount of adult speech data) perform with child speech. In addition, we perform finetuning experiments using different child speech datasets to investigate the performance of Whisper ASR on non-native English-speaking children's speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by finetuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of Whisper for improving ASR in a low-resource scenario for non-native child speech.
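
The abstract reports results as relative WER improvements. As a minimal sketch of how these quantities are conventionally defined (not the authors' evaluation code), WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words, and the relative improvement is the WER reduction expressed as a fraction of the baseline WER. The function names and the example sentences are hypothetical, and the text normalisation that real evaluations apply (casing, punctuation) is assumed to have been done already.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.5 means 50%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, used only to exercise the functions.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    print(f"Relative improvement: {relative_wer_improvement(0.20, 0.10):.0%}")
```

Under this definition, moving from a baseline WER of 20% to 10% is a 50% relative improvement, even though the absolute change is only 10 percentage points. That distinction matters when reading a range such as the 29% to 89% quoted above.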


Pages: 41601-41610

Page count: 10
