Towards multi-task learning of speech and speaker recognition

Cited by: 0

Authors:

Vaessen, Nik [1]

van Leeuwen, David A. [1]

Affiliations:

[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands

Source:

INTERSPEECH 2023 | 2023

Keywords:

multi-task learning; speech recognition; speaker recognition; wav2vec2

DOI:

10.21437/Interspeech.2023-353

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.


Pages: 4898 - 4902

Page count: 5

