Exploring Native and Non-Native English Child Speech Recognition With Whisper

Cited by: 0
Authors
Jain, Rishabh [1]
Barcovschi, Andrei [1]
Yiwere, Mariam Yahayah [1]
Corcoran, Peter [1]
Cucu, Horia [2]
Affiliations
[1] Univ Galway, Sch Elect & Elect Engn, Galway H91 TK33, Ireland
[2] Univ Politehn Bucuresti, Speech & Dialogue Res Lab, Bucharest 060042, Romania
Keywords
Child automatic speech recognition; whisper; large-scale supervision; MyST; PFSTAR; CMU_Kids; speechocean762; non-native child speech
DOI
10.1109/ACCESS.2024.3378738
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children's speech. This challenge stems from the high acoustic variability in children's voices and the scarcity of child speech training data, particularly for accented speech or low-resource languages. This study focuses on improving ASR performance on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale Whisper models (trained on a large amount of adult speech data) perform on child speech. In addition, we perform fine-tuning experiments using different child speech datasets to investigate the performance of Whisper ASR on non-native English-speaking children's speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by fine-tuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of Whisper for improving ASR in a low-resource scenario for non-native child speech.
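The abstract reports gains as relative WER improvements. As a minimal sketch of what that metric usually means, the Python below computes WER as word-level edit distance divided by reference length, and relative improvement as the fractional reduction from a baseline WER. The function names and the example sentences and numbers are illustrative assumptions, not taken from the paper, and the paper's own text normalisation and scoring tools may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fractional WER reduction relative to the baseline (0.5 means 50%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# Hypothetical numbers: a baseline WER of 20% reduced to 10%.
gain = relative_wer_improvement(20.0, 10.0)
```

Under this definition, a relative improvement of 29% to 89% describes how much of the previous benchmark's error was removed, not an absolute drop in WER percentage points.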
Pages: 41601-41610
Page count: 10