Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification

Cited by: 1
Authors
Xu, Wei [1]
Wang, Xinghao [1]
Wan, Hao [1, 2]
Guo, Xin [3]
Zhao, Junhong [1]
Deng, Feiqi [1]
Kang, Wenxiong [1]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] Guangdong Baiyun Airport Informat Technol Co Ltd, Postdoctoral Innovat Base, Guangzhou, Peoples R China
[3] Guangdong Commun Polytech, Guangzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Far-field speaker verification; Multi-task learning; Gradient reversal layer; Dynamic loss weights strategy
DOI
10.1007/978-3-030-86608-2_49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Far-field speaker verification is challenging because of interference caused by varying distances between the speaker and the recorder. In this paper, a distance discriminator, which determines whether two utterances were recorded at the same distance, is used as an auxiliary task to learn distance discrepancy information. Two identical auxiliary tasks are used: one is added before the speaker embedding layer to learn distance discrepancy information via multi-task learning, and the other is added after that layer to suppress the learned discrepancy via a gradient reversal layer. In addition, to avoid conflicts among the optimization directions of the tasks, the loss weight of each task is updated dynamically during training. Experiments on AISHELL Wake-up show relative reductions in equal error rate (EER) of 7% and 10.3% on far-far and near-far speaker verification, respectively, compared with the single-task model, demonstrating the effectiveness of the proposed method.
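The gradient reversal layer mentioned in the abstract can be illustrated with a minimal, framework-free sketch. It shows only the general technique (identity in the forward pass, sign-flipped and scaled gradient in the backward pass), not the authors' implementation. The class name `GradientReversal`, the scaling factor `lam`, and the example gradient values are assumptions made for illustration; the paper's dynamic loss-weight strategy is not reproduced here.

```python
class GradientReversal:
    """Hypothetical gradient reversal layer operating on plain Python lists.

    Forward pass: returns the input unchanged.
    Backward pass: multiplies the incoming gradient by -lam, so layers
    below this one are pushed to make the auxiliary task harder.
    """

    def __init__(self, lam=1.0):
        # lam is an assumed scaling factor for the reversed gradient
        self.lam = lam

    def forward(self, x):
        # Identity: the auxiliary head sees the embedding as-is
        return list(x)

    def backward(self, grad_output):
        # Reverse and scale the gradient flowing back to the embedding
        return [-self.lam * g for g in grad_output]


# Illustrative values only: gradients of two losses w.r.t. a 2-d embedding
grad_speaker = [0.25, -0.5]   # from the speaker classification loss
grad_distance = [2.0, -4.0]   # from the distance discriminator loss

grl = GradientReversal(lam=0.5)
reversed_grad = grl.backward(grad_distance)

# Gradient that reaches the layers below the embedding: the speaker task
# is optimized as usual, while the distance cue is suppressed
combined = [gs + gr for gs, gr in zip(grad_speaker, reversed_grad)]
```

In an automatic-differentiation framework the same idea is usually written as a custom function whose backward pass negates the gradient, placed between the speaker embedding and the distance discriminator.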
Pages: 449-457
Number of pages: 9
Related papers
50 items in total
  • [31] ADVERSARIAL MULTI-TASK LEARNING FOR SPEAKER NORMALIZATION IN REPLAY DETECTION
    Suthokumar, Gajan
    Sethu, Vidhyasaharan
    Sriskandaraja, Kaavya
    Ambikairajah, Eliathamby
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6609 - 6613
  • [32] Online Multi-Task Learning for Policy Gradient Methods
    Ammar, Haitham Bou
    Eaton, Eric
    Ruvolo, Paul
    Taylor, Matthew E.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1206 - 1214
  • [33] Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios
    Qin, Xiaoyi
    Cai, Danwei
    Li, Ming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 71 - 85
  • [34] Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
    Liang, Chengdong
    Chen, Yijiang
    Yao, Jiadi
    Zhang, Xiao-Lei
    INTERSPEECH 2022, 2022, : 3679 - 3683
  • [35] Multi-task learning for X-vector based speaker recognition
    Zhang Y.
    Liu L.
    International Journal of Speech Technology, 2023, 26 (04) : 817 - 823
  • [36] Speaker-Aware Multi-Task Learning for Automatic Speech Recognition
    Pironkov, Gueorgui
    Dupont, Stephane
    Dutoit, Thierry
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 2900 - 2905
  • [37] Multi-Task Learning for Near/Far Field Channel Estimation in STAR-RIS Networks
    Xiao, Jian
    Wang, Ji
    Wang, Zhaolin
    Wang, Jun
    Xie, Wenwu
    Liu, Yuanwei
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2024, 72 (10) : 6344 - 6359
  • [38] Leveraging speaker attribute information using multi task learning for speaker verification and diarization
    Luu, Chau
    Bell, Peter
    Renals, Steve
    INTERSPEECH 2021, 2021, : 491 - 495
  • [39] Multi-Task Adversarial Network Bottleneck Features for Noise-Robust Speaker Verification
    Yu, Hong
    Hu, Tianrui
    Ma, Zhanyu
    Tan, Zheng-Hua
    Guo, Jun
    PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 165 - 169
  • [40] DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING
    Li, Jinyu
    Zhao, Rui
    Chen, Zhuo
    Liu, Changliang
    Xiao, Xiong
    Ye, Guoli
    Gong, Yifan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5699 - 5703