Towards multi-task learning of speech and speaker recognition

Authors
Vaessen, Nik [1]
van Leeuwen, David A. [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
Keywords
multi-task learning; speech recognition; speaker recognition; wav2vec2
DOI
10.21437/Interspeech.2023-353
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence, as well as with different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to that of separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
Pages: 4898-4902
Number of pages: 5