CAM: CONTEXT-AWARE MASKING FOR ROBUST SPEAKER VERIFICATION

Times cited: 14
Authors
Yu, Ya-Qi [1]
Zheng, Siqi [2]
Suo, Hongbin [2]
Lei, Yun [2]
Li, Wu-Jun [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Speaker verification; speech enhancement; context embedding; context-aware masking; feature enhancement
DOI
10.1109/ICASSP39728.2021.9414704
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Performance degradation caused by noise has been a long-standing challenge for speaker verification. Previous methods usually involve applying a denoising transformation to speaker embeddings or enhancing input features. Nevertheless, these methods are lossy and inefficient for speaker embedding extraction. In this paper, we propose context-aware masking (CAM), a novel method to extract robust speaker embeddings. CAM enables the speaker embedding network to "focus" on the speaker of interest and "blur" unrelated noise. The threshold of masking is dynamically controlled by an auxiliary context embedding that captures speaker and noise characteristics. Moreover, models adopting CAM can be trained in an end-to-end manner without using synthesized noisy-clean speech pairs. Our results show that CAM improves speaker verification performance in the wild by a large margin compared to the baselines.
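The abstract describes the masking mechanism only at a high level. The following is a minimal sketch of the idea under assumptions that are NOT taken from the paper: the context embedding is assumed to be the time-average of the frame-level features, the mask is assumed to be a per-dimension sigmoid of a frame term plus a context term plus a bias (so the context term shifts the masking threshold per utterance), and the speaker embedding is assumed to be the mean of the masked frames. The function names (context_embedding, context_aware_mask, speaker_embedding) and the parameters w_frame, w_ctx and bias are hypothetical illustrations, not the authors' architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x D) matrix by a length-D vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def mean_pool(frames: List[Vector]) -> Vector:
    """Per-dimension mean over all T frames."""
    t = len(frames)
    return [sum(f[k] for f in frames) / t for k in range(len(frames[0]))]


def context_embedding(frames: List[Vector]) -> Vector:
    """Hypothetical context embedding: the utterance-level mean of the frames.

    Assumption for illustration only; the paper uses an auxiliary embedding
    that captures speaker and noise characteristics.
    """
    return mean_pool(frames)


def context_aware_mask(frames, w_frame: Matrix, w_ctx: Matrix, bias: Vector):
    """Apply a soft mask in (0, 1) to every frame h_t.

    mask_t = sigmoid(w_frame @ h_t + w_ctx @ c + bias), with c the context
    embedding. The context term offsets the sigmoid's pre-activation, acting
    as an utterance-dependent threshold. Both matrices are D x D (assumed).
    """
    ctx_term = matvec(w_ctx, context_embedding(frames))
    masked = []
    for h in frames:
        local = matvec(w_frame, h)
        mask = [sigmoid(a + c + b) for a, c, b in zip(local, ctx_term, bias)]
        # Mask values near 1 keep ("focus on") a dimension; near 0 suppress ("blur") it.
        masked.append([m * x for m, x in zip(mask, h)])
    return masked


def speaker_embedding(frames, w_frame: Matrix, w_ctx: Matrix, bias: Vector) -> Vector:
    """Mean-pool the masked frames into one utterance-level vector (assumed pooling)."""
    return mean_pool(context_aware_mask(frames, w_frame, w_ctx, bias))


if __name__ == "__main__":
    frames = [[1.0, -2.0], [3.0, 4.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights every mask value is sigmoid(0) = 0.5.
    print(speaker_embedding(frames, zeros, zeros, [0.0, 0.0]))  # [1.0, 0.5]
```

Every step in the sketch is differentiable, so in a real network the masking weights could be learned jointly with the embedding extractor from a speaker-discriminative loss alone (an assumption here), which is how such a module avoids needing noisy-clean speech pairs. Plain Python lists are used only to show the data flow.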
Pages: 6703-6707
Page count: 5