Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning

被引:11
|
作者
Rumberg, Lars [1 ]
Ehlert, Hanna [2 ]
Luedtke, Ulrike [2 ]
Ostermann, Joern [1 ]
机构
[1] Leibniz Univ Hannover, Inst Informationsverarbeitung, Hannover, Germany
[2] Leibniz Univ Hannover, Inst Sonderpadagog, Hannover, Germany
来源
关键词
speech recognition; child speech; domain adaptation;
D O I
10.21437/Interspeech.2021-1241
中图分类号
R36 [病理学]; R76 [耳鼻咽喉科学];
学科分类号
100104 ; 100213 ;
摘要
Automatic speech recognition for children's speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children's speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.
引用
收藏
页码:3850 / 3854
页数:5
相关论文
共 50 条
  • [1] Adversarial Multi-task Learning for End-to-end Metaphor Detection
    Zhang, Shenglong
    Liu, Ying
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1483 - 1497
  • [2] Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition
    Kurata, Gakuto
    Audhkhasi, Kartik
    INTERSPEECH 2019, 2019, : 1636 - 1640
  • [3] JOINT CTC-ATTENTION BASED END-TO-END SPEECH RECOGNITION USING MULTI-TASK LEARNING
    Kim, Suyoun
    Hori, Takaaki
    Watanabe, Shinji
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4835 - 4839
  • [4] Multi-Task End-to-End Model for Telugu Dialect and Speech Recognition
    Yadavalli, Aditya
    Mirishkar, Ganesh S.
    Vuppala, Anil Kumar
    INTERSPEECH 2022, 2022, : 1387 - 1391
  • [5] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)
  • [6] Rethinking and Improving Multi-task Learning for End-to-end Speech Translation
    Zhang, Yuhao
    Xu, Chen
    Li, Bei
    Chen, Hao
    Xiao, Tong
    Zhang, Chunliang
    Zhu, Jingbo
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 10753 - 10765
  • [7] Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition
    Chen, Junjie
    Li, Yongwei
    Zhao, Ziping
    Liu, Xuefei
    Wen, Zhengqi
    Tao, Jianhua
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 1966 - 1971
  • [8] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [9] Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning
    Hou, Wenxin
    Dong, Yue
    Zhuang, Bairong
    Yang, Longfei
    Shi, Jiatong
    Shinozaki, Takahiro
    INTERSPEECH 2020, 2020, : 1037 - 1041
  • [10] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180