Towards multi-task learning of speech and speaker recognition

Cited by: 0
Authors
Vaessen, Nik [1]
van Leeuwen, David A. [1]
Affiliations
[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands
Source
INTERSPEECH 2023 | 2023
Keywords
multi-task learning; speech recognition; speaker recognition; wav2vec2
DOI
10.21437/Interspeech.2023-353
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.
Pages: 4898 - 4902
Page count: 5
Related papers
50 in total
  • [41] Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
    Fu, Hongliang
    Zhuang, Zhihao
    Wang, Yang
    Huang, Chen
    Duan, Wenzhuo
    ENTROPY, 2023, 25 (01)
  • [42] Multi-task Learning with Auxiliary Cross-attention Transformer for Low-Resource Multi-dialect Speech Recognition
    Dan, Zhengjia
    Zhao, Yue
    Bi, Xiaojun
    Wu, Licheng
    Ji, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 107 - 118
  • [43] Multi-task learning for face ethnicity and gender recognition
    Yu, Chanjuan
    Fang, Yuchun
    Li, Yang
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8833 : 136 - 144
  • [44] Finger Vein Recognition Based on Multi-Task Learning
    Hao, Zhiang
    Fang, Peiyu
    Yang, Hanwen
    2020 5TH INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2020), 2020, : 133 - 140
  • [45] MULTI-TASK LEARNING IN DEEP NEURAL NETWORKS FOR IMPROVED PHONEME RECOGNITION
    Seltzer, Michael L.
    Droppo, Jasha
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6965 - 6969
  • [46] IMPROVING SAR TARGET RECOGNITION WITH MULTI-TASK LEARNING
    Du, Wenrui
    Zhang, Fan
    Ma, Fei
    Yin, Qiang
    Zhou, Yongsheng
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 284 - 287
  • [47] Multi-task gradient descent for multi-task learning
    Lu Bai
    Yew-Soon Ong
    Tiantian He
    Abhishek Gupta
    Memetic Computing, 2020, 12 : 355 - 369
  • [48] HANDWRITTEN NUMERAL RECOGNITION USING MULTI-TASK LEARNING
    Hou, Jinhui
    Zeng, Huanqiang
    Cai, Lei
    Zhu, Jianqing
    Cao, Jiuwen
    Hou, Junhui
    2017 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ISPACS 2017), 2017, : 155 - 158
  • [49] Multi-Task Learning for Voice Related Recognition Tasks
    Montalvo, Ana
    Calvo, Jose R.
    Bonastre, Jean-Francois
    INTERSPEECH 2020, 2020, : 2997 - 3001
  • [50] Multi-Task Learning for Face Ethnicity and Gender Recognition
    Yu, Chanjuan
    Fang, Yuchun
    Li, Yang
    BIOMETRIC RECOGNITION (CCBR 2014), 2014, 8833 : 136 - 144