Towards multi-task learning of speech and speaker recognition

Authors
Vaessen, Nik [1]
van Leeuwen, David A. [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
Source
INTERSPEECH 2023 | 2023
Keywords
multi-task learning; speech recognition; speaker recognition; wav2vec2
DOI
10.21437/Interspeech.2023-353
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence, as well as with different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
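The shared-encoder, two-head layout described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the linked repository for that). Everything here is an assumption made for illustration: the names `TwoHeadModel` and `multitask_loss` are hypothetical, a single tanh layer stands in for the wav2vec2 encoder, frame-level cross-entropy stands in for a sequence-level speech loss, mean pooling stands in for the speaker pooling step, and the fixed `speech_weight` is just one possible way to combine the two objectives.

```python
import math
import random


def init_linear(rng, n_in, n_out):
    """Return (weights, bias) for a small randomly initialised linear layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(x, w, b):
    """Compute y = W x + b for a single input vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def log_softmax(z):
    """Numerically stable log-softmax over a list of logits."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


class TwoHeadModel:
    """Toy shared encoder with a speech head and a speaker head (hypothetical)."""

    def __init__(self, feat_dim, hidden_dim, vocab_size, num_speakers, seed=0):
        rng = random.Random(seed)
        # Assumption: one tanh layer stands in for the wav2vec2 encoder.
        self.encoder = init_linear(rng, feat_dim, hidden_dim)
        self.speech_head = init_linear(rng, hidden_dim, vocab_size)
        self.speaker_head = init_linear(rng, hidden_dim, num_speakers)

    def forward(self, frames):
        # Shared encoder: one hidden vector per input frame.
        hidden = [[math.tanh(v) for v in linear(f, *self.encoder)] for f in frames]
        # Speech head: per-frame log-probabilities over the vocabulary.
        speech_logp = [log_softmax(linear(h, *self.speech_head)) for h in hidden]
        # Speaker head: mean-pool over time, then classify the pooled embedding.
        embedding = [sum(col) / len(hidden) for col in zip(*hidden)]
        speaker_logp = log_softmax(linear(embedding, *self.speaker_head))
        return speech_logp, embedding, speaker_logp


def multitask_loss(speech_logp, frame_labels, speaker_logp, speaker_id,
                   speech_weight=0.5):
    """Weighted sum of the two task losses (assumed combination rule)."""
    speech_nll = -sum(lp[y] for lp, y in zip(speech_logp, frame_labels))
    speech_nll /= len(frame_labels)
    speaker_nll = -speaker_logp[speaker_id]
    return speech_weight * speech_nll + (1.0 - speech_weight) * speaker_nll


# Usage: 5 random frames of 8 features, 10 output tokens, 4 speakers.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
model = TwoHeadModel(feat_dim=8, hidden_dim=16, vocab_size=10, num_speakers=4)
speech_logp, embedding, speaker_logp = model.forward(frames)
loss = multitask_loss(speech_logp, [1, 2, 3, 4, 5], speaker_logp, speaker_id=2)
```

Both heads read the same hidden sequence, so any gradient from either loss would update the shared encoder. That coupling is what the abstract's in-distribution versus out-of-distribution comparison against single-task models evaluates.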
Pages: 4898-4902
Page count: 5