Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning

Cited by: 11
Authors
Rumberg, Lars [1]
Ehlert, Hanna [2]
Luedtke, Ulrike [2]
Ostermann, Joern [1]
Affiliations
[1] Leibniz Univ Hannover, Inst Informationsverarbeitung, Hannover, Germany
[2] Leibniz Univ Hannover, Inst Sonderpadagog, Hannover, Germany
Keywords
speech recognition; child speech; domain adaptation
DOI
10.21437/Interspeech.2021-1241
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Automatic speech recognition for children's speech is a challenging task, mainly due to the scarcity of publicly available child speech corpora and the wide inter- and intra-speaker variability in the acoustic and linguistic characteristics of children's speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information in addition to merely differentiating between the child and adult domains, and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods, but that both should be used together for best performance.
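The abstract does not give the training objective, so the following is only a toy sketch of the general idea behind adversarial multi-task learning with gradient reversal, not the authors' implementation. It assumes a scalar "encoder" weight `w`, a main-task head `u` standing in for the ASR output, an adversary head `v` that predicts speaker age, and a hypothetical trade-off weight `lam`; all of these names are illustrative.

```python
# Toy scalar illustration of adversarial multi-task training with gradient
# reversal. All names (w, u, v, lam, train_step) are hypothetical and are
# not taken from the paper.

def train_step(params, x, target, age, lam=0.5, lr=0.01):
    """Run one SGD step on a scalar toy model.

    h     = w * x   shared encoder (stand-in for the acoustic model)
    y_hat = u * h   main-task head (stand-in for the ASR output)
    a_hat = v * h   adversary head (predicts speaker age from h)
    """
    w, u, v = params
    h = w * x
    main_err = u * h - target
    adv_err = v * h - age

    # Gradients of the two squared-error losses.
    g_u = 2 * main_err * h            # d L_main / d u
    g_v = 2 * adv_err * h             # d L_adv  / d v
    g_w_main = 2 * main_err * u * x   # d L_main / d w
    g_w_adv = 2 * adv_err * v * x     # d L_adv  / d w

    # Each head descends on its own loss. The encoder descends on the main
    # loss but receives the adversary's gradient with reversed sign, scaled
    # by lam, so it is pushed towards features that hide age information.
    g_w = g_w_main - lam * g_w_adv

    new_params = (w - lr * g_w, u - lr * g_u, v - lr * g_v)
    return new_params, main_err ** 2, adv_err ** 2
```

With `lam=0` the encoder ignores the adversary and the step reduces to ordinary single-task training; a positive `lam` makes the encoder update work against the age predictor.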
Pages: 3850-3854
Number of pages: 5