Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification

Cited by: 1
Authors
Xu, Wei [1]
Wang, Xinghao [1]
Wan, Hao [1, 2]
Guo, Xin [3]
Zhao, Junhong [1]
Deng, Feiqi [1]
Kang, Wenxiong [1]
Affiliations
[1] South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China
[2] Guangdong Baiyun Airport Informat Technol Co Ltd, Postdoctoral Innovat Base, Guangzhou, Peoples R China
[3] Guangdong Commun Polytech, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Far-field speaker verification; Multi-task learning; Gradient reversal layer; Dynamic loss weights strategy;
DOI
10.1007/978-3-030-86608-2_49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Far-field speaker verification is challenging because of interference caused by varying distances between the speaker and the recorder. In this paper, a distance discriminator, which determines whether two utterances were recorded at the same distance, is used as an auxiliary task to learn distance discrepancy information. Two identical auxiliary tasks are used: one is added before the speaker embedding layer to learn distance discrepancy information via multi-task learning, and the other is added after that layer to suppress the learned discrepancy via a gradient reversal layer. In addition, to avoid conflicts among the optimization directions of the tasks, the loss weight of each task is updated dynamically during training. Experiments on AISHELL Wake-up show relative reductions in equal error rate (EER) of 7% on far-far speaker verification and 10.3% on near-far speaker verification, compared with the single-task model, demonstrating the effectiveness of the proposed method.
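The gradient reversal layer named in the abstract can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the class name `GradientReversal`, the scaling factor `lam`, and the list-based tensors are assumptions made for illustration. The sketch shows only the general idea of such a layer: it acts as the identity in the forward pass and flips the sign of the gradient in the backward pass, so the layers before it are trained to make the downstream auxiliary task harder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength; not given in the abstract

    def forward(self, x: List[float]) -> List[float]:
        # The forward pass leaves the features unchanged.
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        # The backward pass flips the sign of the incoming gradient, so the
        # layers before this one are updated to increase the downstream loss.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
passed_through = grl.forward(features)
upstream_grad = grl.backward([1.0, -2.0, 4.0])
```

In an automatic-differentiation framework the same behaviour is usually written as a custom operation with a user-defined backward rule; the explicit `backward` method here stands in for that rule.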
Pages: 449-457
Number of pages: 9