Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning

Times cited: 11

Authors:

Rumberg, Lars [1]

Ehlert, Hanna [2]

Luedtke, Ulrike [2]

Ostermann, Joern [1]

Affiliations:

[1] Leibniz Univ Hannover, Inst Informationsverarbeitung, Hannover, Germany

[2] Leibniz Univ Hannover, Inst Sonderpadagog, Hannover, Germany

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech recognition; child speech; domain adaptation

DOI:

10.21437/Interspeech.2021-1241

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

Automatic speech recognition for children's speech is a challenging task, mainly due to the scarcity of publicly available child speech corpora and the wide inter- and intra-speaker variability in the acoustic and linguistic characteristics of children's speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. Rather than only distinguishing between the child and adult domains, we additionally use age information, thus forcing the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.
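The abstract describes adversarial multi-task learning only at a high level. A common way to realize it is a gradient reversal layer (GRL) between the shared encoder and an auxiliary classifier; this record does not say that the authors use a GRL, so treat it as an assumption. The minimal pure-Python sketch below uses a toy scalar model in which a squared error stands in for the recognition loss and a softmax classifier over hypothetical age groups plays the adversary. All names (`losses`, `gradients`, `lam`, the age-group encoding) are illustrative, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def losses(w_e, w_y, a, b, x, y, age_group):
    """Forward pass of a toy scalar model (hypothetical stand-ins throughout).

    A squared error replaces the real recognition loss (e.g. CTC/attention),
    and a K-way softmax classifier over age groups acts as the adversary.
    """
    h = w_e * x                                  # shared "encoder" feature
    asr_loss = (w_y * h - y) ** 2                # stand-in recognition loss
    p = softmax([a_k * h + b_k for a_k, b_k in zip(a, b)])
    age_loss = -math.log(p[age_group])           # age-classifier cross-entropy
    return h, p, asr_loss, age_loss


def gradients(w_e, w_y, a, b, x, y, age_group, lam):
    """Analytic gradients with a gradient reversal layer before the age head."""
    h, p, _, _ = losses(w_e, w_y, a, b, x, y, age_group)
    # For softmax + cross-entropy, dL_age/dz_k = p_k - t_k (t = one-hot target).
    err = [p_k - (1.0 if k == age_group else 0.0) for k, p_k in enumerate(p)]
    d_asr_dh = 2.0 * (w_y * h - y) * w_y
    d_age_dh = sum(e_k * a_k for e_k, a_k in zip(err, a))
    # The age head itself is trained normally: it minimises the age loss.
    grad_a = [e_k * h for e_k in err]
    grad_b = list(err)
    grad_w_y = 2.0 * (w_y * h - y) * h
    # The GRL is the identity in the forward pass but scales the gradient by
    # -lam in the backward pass, so the encoder is pushed to *increase* the
    # age loss while still decreasing the recognition loss.
    d_h = d_asr_dh - lam * d_age_dh
    grad_w_e = d_h * x
    return grad_w_e, grad_w_y, grad_a, grad_b


# Example: three hypothetical age groups, target group 2, reversal weight 0.3.
params = dict(w_y=0.7, a=[0.5, -0.3, 0.2], b=[0.1, 0.0, -0.2],
              x=1.5, y=0.4, age_group=2)
print(gradients(0.8, lam=0.3, **params))
```

In this sketch `grad_w_e` is the derivative of the combined objective L_asr - lam * L_age with respect to the encoder weight, while the age head descends L_age alone; that min-max structure is what discourages the shared features from encoding age. How age is represented (discrete groups or a continuous target) and how `lam` is chosen are not stated in this record and are assumptions here.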

Pages: 3850-3854

Page count: 5
