Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning

Cited by: 11
Authors
Rumberg, Lars [1]
Ehlert, Hanna [2]
Luedtke, Ulrike [2]
Ostermann, Joern [1]
Affiliations
[1] Leibniz Univ Hannover, Inst Informationsverarbeitung, Hannover, Germany
[2] Leibniz Univ Hannover, Inst Sonderpadagog, Hannover, Germany
Source
Keywords
speech recognition; child speech; domain adaptation;
DOI
10.21437/Interspeech.2021-1241
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Automatic speech recognition for children's speech is a challenging task, mainly due to the scarcity of publicly available child speech corpora and the wide inter- and intra-speaker variability in the acoustic and linguistic characteristics of children's speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. Rather than only differentiating between the child and adult domains, we additionally use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods, but that both should be used together for best performance.
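The adversarial multi-task idea in the abstract can be illustrated with a small sketch. It assumes gradient reversal, one common way to realize adversarial training; the abstract does not state which mechanism the authors use. The function names and the weight `lam` are hypothetical and chosen only for illustration. The shared encoder follows the recognition gradient but moves against the age-prediction gradient, which discourages it from encoding age.

```python
def reversed_encoder_gradient(grad_asr, grad_age, lam):
    """Combine gradients for the shared encoder parameters.

    grad_asr: gradient of the recognition loss w.r.t. the encoder parameters
    grad_age: gradient of the age-prediction loss w.r.t. the same parameters
    lam:      hypothetical weight of the adversarial term (an assumption)

    The age gradient enters with a flipped sign, so a descent step lowers
    the recognition loss while raising the age-prediction loss.
    """
    return [g_asr - lam * g_age for g_asr, g_age in zip(grad_asr, grad_age)]


def sgd_step(params, grads, lr):
    """Plain gradient-descent update on a list of scalar parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


# Toy numbers, not taken from the paper.
encoder_grad = reversed_encoder_gradient([1.0, 2.0], [0.5, -1.0], lam=2.0)
new_params = sgd_step([1.0, 1.0], encoder_grad, lr=0.5)
```

In this sketch the age classifier itself would still be updated with its ordinary, non-reversed gradient; only the shared encoder sees the sign flip.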
Pages: 3850-3854
Number of pages: 5