Towards multi-task learning of speech and speaker recognition

Cited by: 0

Authors:

Vaessen, Nik [1]

van Leeuwen, David A. [1]

Affiliations:

[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands

Source:

INTERSPEECH 2023 | 2023

Keywords:

multi-task learning; speech recognition; speaker recognition; wav2vec2

DOI:

10.21437/Interspeech.2023-353

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to that of separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.

Citation

Pages: 4898-4902

Page count: 5

50 records in total

[21] Multi-task Learning over Mixup Variants for the Speaker Verification Task
Fathan, Abderrahim
Alam, Jahangir
Zhu, Xiaolin
SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 446 - 460
[22] Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
Jain, Abhinav
Upreti, Minali
Jyothi, Preethi
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2454 - 2458
[23] Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario
Peng, Chiang-Jen
Chan, Yun-Ju
Yu, Cheng
Wang, Syu-Siang
Tsao, Yu
Chi, Tai-Shih
2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
[24] E2E-based Multi-task Learning Approach to Joint Speech and Accent Recognition
Zhang, Jicheng
Peng, Yizhou
Pham, Van Tung
Xu, Haihua
Huang, Hao
Chng, Eng Siong
INTERSPEECH 2021, 2021, : 1519 - 1523
[25] Investigating Multi-task Learning for Automatic Speech Recognition with Code-switching between Mandarin and English
Song, Xiao
Zou, Yuexian
Huang, Shilei
Chen, Shaobin
Liu, Yi
2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 27 - 30
[26] Multimodal Sentiment Recognition With Multi-Task Learning
Zhang, Sun
Yin, Chunyong
Yin, Zhichao
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (01): 200 - 209
[27] Multi-Task Learning for Text-dependent Speaker Verification
Chen, Nanxin
Qian, Yanmin
Yu, Kai
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 185 - 189
[28] Hierarchical Multi-Task Learning Based on Interactive Multi-Head Attention Feature Fusion for Speech Depression Recognition
Xing, Yujuan
He, Ruifang
Zhang, Chengwen
Tan, Ping
IEEE ACCESS, 2025, 13 : 51208 - 51219
[29] Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition
Van Hai Do
Chen, Nancy E.
Lim, Boon Pang
Hasegawa-Johnson, Mark
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 734 - 738
[30] Recognition of Latin American Spanish using Multi-task Learning
Mendes, Carlos
Abad, Alberto
Neto, Joao Paulo
Trancoso, Isabel
INTERSPEECH 2019, 2019, : 2135 - 2139
