A Multi-task Framework of Speaker Recognition with TTS Data Augmentation

Cited by: 0

Authors:

Xie, Xingjia [1]

Zhi, Yiming [2]

Ouyang, Beibei [1]

Hong, Qingyang [2]

Li, Lin [1]

Affiliations:

[1] Xiamen Univ, Sch Elect Sci & Engn, Xiamen, Peoples R China

[2] Xiamen Univ, Sch Informat, Xiamen, Peoples R China

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

speaker recognition; data augmentation; multi-task; text-to-speech;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning usually requires large amounts of data, but collecting enough training data is difficult in many fields. In limited-resource application scenarios, data augmentation often plays a key role. A common approach is to add background noise to the speech or change its speed to increase the number of utterances in the training dataset. To train an automatic speaker verification (ASV) model, we propose augmenting the training dataset by synthesizing large amounts of speech with a VAE-based speech synthesis model, and we use a multi-task framework to mitigate the degradation in anti-spoofing detection performance caused by introducing synthesized speech. Experiments on the AISHELL-1, AISHELL-3, and ASVspoof2019LA databases show that the proposed method improves the robustness of the speaker recognition model while also improving its anti-spoofing ability.
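The two conventional augmentations the abstract mentions (adding background noise and changing the speed of the speech) can be sketched as below. This is a minimal, standard-library-only illustration, not the paper's TTS-based method or its actual pipeline: the function names `add_noise` and `change_speed`, the SNR-based mixing, and the linear-interpolation resampling are all assumptions made for the example. Note that resampling in this way changes pitch along with speed.

```python
import math
import random


def add_noise(speech, noise, snr_db):
    """Mix noise into speech so that the result has the requested SNR (in dB).

    Hypothetical helper for illustration; both inputs are lists of float samples.
    """
    if not speech:
        raise ValueError("speech must be non-empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        return list(speech)
    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def change_speed(speech, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the signal.

    Hypothetical helper for illustration.
    """
    if not speech:
        raise ValueError("speech must be non-empty")
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = max(1, int(len(speech) / factor))
    last = len(speech) - 1
    out = []
    for i in range(out_len):
        pos = i * factor                 # fractional position in the input
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append((1 - frac) * speech[left] + frac * speech[right])
    return out


# Usage: one clean utterance yields several augmented copies.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
background = [rng.gauss(0.0, 1.0) for _ in range(100)]
augmented = [
    add_noise(clean, background, snr_db=10),
    change_speed(clean, 0.9),
    change_speed(clean, 1.1),
]
```

Each call produces an additional training utterance that keeps the original speaker label, which is how such augmentation increases the number of utterances in the training dataset.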


Pages: 210-215

Page count: 6
