A Multi-task Framework of Speaker Recognition with TTS Data Augmentation

Cited by: 0
Authors
Xie, Xingjia [1]
Zhi, Yiming [2]
Ouyang, Beibei [1]
Hong, Qingyang [2]
Li, Lin [1]
Affiliations
[1] Xiamen Univ, Sch Elect Sci & Engn, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
Funding
National Natural Science Foundation of China
Keywords
speaker recognition; data augmentation; multi-task; text-to-speech
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning usually requires large amounts of data, but collecting enough training data is difficult in many fields. In low-resource application scenarios, data augmentation often plays a key role. A common approach is to add background noise to the speech or change its speed, which increases the number of utterances in the training dataset. For training an automatic speaker verification (ASV) model, we propose augmenting the training dataset by synthesizing large amounts of speech with a VAE-based speech synthesis model, and we use a multi-task framework to mitigate the degradation in anti-spoofing detection performance caused by introducing synthesized speech. Experiments on the AISHELL-1, AISHELL-3, and ASVspoof 2019 LA databases show that the proposed method improves the robustness of the speaker recognition model while also improving its anti-spoofing ability.
Pages: 210-215
Page count: 6
Related papers
26 in total
  • [1] Bu H., 2017, 2017 20 C OR CHAPT
  • [2] Chen NX, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P185
  • [3] Front-End Factor Analysis for Speaker Verification
    Dehak, Najim
    Kenny, Patrick J.
    Dehak, Reda
    Dumouchel, Pierre
    Ouellet, Pierre
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): 788-798
  • [4] SYNAUG: SYNTHESIS-BASED DATA AUGMENTATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Du, Chenpeng
    Han, Bing
    Wang, Shuai
    Qian, Yanmin
    Yu, Kai
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 5844-5848
  • [5] Eom Y, 2022, Arxiv, DOI arXiv:2204.01387
  • [6] UNIT SELECTION SYNTHESIS BASED DATA AUGMENTATION FOR FIXED PHRASE SPEAKER VERIFICATION
    Huang, Houjun
    Xiang, Xu
    Zhao, Fei
    Wang, Shuai
    Qian, Yanmin
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 5849-5853
  • [7] AASIST: AUDIO ANTI-SPOOFING USING INTEGRATED SPECTRO-TEMPORAL GRAPH ATTENTION NETWORKS
    Jung, Jee-weon
    Heo, Hee-Soo
    Tak, Hemlata
    Shim, Hye-jin
    Chung, Joon Son
    Lee, Bong-Jin
    Yu, Ha-Jin
    Evans, Nicholas
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 6367-6371
  • [8] Jung JW, 2022, Arxiv, DOI arXiv:2201.10283
  • [9] Joint Decision of Anti-Spoofing and Automatic Speaker Verification by Multi-Task Learning With Contrastive Loss
    Li, Jiakang
    Sun, Meng
    Zhang, Xiongwei
    Wang, Yimin
    [J]. IEEE ACCESS, 2020, 8: 7907-7915
  • [10] Liu WY, 2016, PR MACH LEARN RES, V48