Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning

Cited by: 11
Authors
Rumberg, Lars [1]
Ehlert, Hanna [2]
Luedtke, Ulrike [2]
Ostermann, Joern [1]
Affiliations
[1] Leibniz Univ Hannover, Inst Informationsverarbeitung, Hannover, Germany
[2] Leibniz Univ Hannover, Inst Sonderpadagog, Hannover, Germany
Source
Keywords
speech recognition; child speech; domain adaptation
DOI
10.21437/Interspeech.2021-1241
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Automatic speech recognition for children's speech is a challenging task, mainly due to the scarcity of publicly available child speech corpora and the wide inter- and intra-speaker variability in the acoustic and linguistic characteristics of children's speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. In addition to merely differentiating between the child and adult domains, we use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods, but that both should be used together for best performance.
Pages: 3850-3854
Page count: 5