Towards multi-task learning of speech and speaker recognition

Cited by: 0

Authors:

Vaessen, Nik [1]

van Leeuwen, David A. [1]

Affiliations:

[1] Radboud Univ Nijmegen, Inst Comp & Informat Sci, Nijmegen, Netherlands

Source:

INTERSPEECH 2023 | 2023

Keywords:

multi-task learning; speech recognition; speaker recognition; wav2vec2

DOI:

10.21437/Interspeech.2023-353

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

We study multi-task learning for two orthogonal speech technology tasks: speech and speaker recognition. We use wav2vec2 as a base architecture with two task-specific output heads. We experiment with different architectural decisions to mix speaker and speech information in the output sequence as well as different optimization strategies. Our multi-task learning networks can produce a shared speaker and speech embedding, which at first glance achieves performance comparable to separate single-task models. However, we show that the multi-task networks have strongly degraded performance on out-of-distribution evaluation data compared to the single-task models. Code and model checkpoints are available at https://github.com/nikvaessen/disjoint-mtl.


Pages: 4898-4902

Page count: 5

