Deep Noise-Aware Quality Loss for Speaker Verification

Cited by: 0
Authors
Chantangphol, Pantid [1]
Sakdejayont, Theerat [1]
Lertsutthiwong, Monchai [1]
Chalothorn, Tawunrat [1]
Affiliations
[1] Kasikorn Business Technol Grp, Kasikorn Labs, Nonthaburi, Thailand
Source
PROCEEDINGS OF THE 33RD ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2024 | 2024
Keywords
Speaker verification; End-to-end loss; Noisy speech; Enhancement
DOI
10.1145/3627673.3679895
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses the common challenge of system performance degradation due to speech inconsistency and mismatched acoustic conditions across various domains in speaker verification tasks. We propose a Noise-Aware Quality Network designed to estimate a score based on speech quality and the presence of speech obscured by noise in real-world environments. The score, derived from the normalization of estimated speech quality evaluations, is incorporated into a proposed Noise-Aware Quality loss function, aiming to prioritize speech quality by weighting the embedding distances based on the quality score. Our methodology significantly improves speaker verification performance, particularly in noisy environments. Furthermore, our work highlights the importance of speech quality and the potential benefits of incorporating speech quality weight into the loss function for speaker verification tasks.
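The quality-weighting idea in the abstract can be illustrated with a minimal sketch. This is not the paper's Noise-Aware Quality loss, whose exact form the abstract does not give. The sketch assumes quality estimates on a 1 to 5 scale, min-max normalization, a simple contrastive term on cosine distance, and that higher-quality pairs receive larger weights. All function names and the `margin` parameter are hypothetical.

```python
import math


def normalize_quality(scores, lo=1.0, hi=5.0):
    """Min-max scale raw quality estimates to [0, 1].

    Assumption: scores lie on a lo..hi scale; values outside are clipped.
    """
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in scores]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def quality_weighted_loss(pairs, weights, margin=0.5):
    """Weighted contrastive loss over (emb_a, emb_b, same_speaker) pairs.

    Assumption: one weight per pair; the loss form is illustrative only.
    """
    total = 0.0
    for (emb_a, emb_b, same_speaker), w in zip(pairs, weights):
        d = cosine_distance(emb_a, emb_b)
        # Pull same-speaker pairs together; push different speakers apart
        # until their distance reaches the margin.
        term = d if same_speaker else max(0.0, margin - d)
        total += w * term
    # Normalize by total weight so the scale does not depend on the weights.
    return total / max(sum(weights), 1e-8)
```

Under these assumptions, a pair with a normalized quality weight of 0 contributes nothing to the loss, so heavily degraded utterances cannot dominate the embedding updates.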
Pages: 3669-3673
Page count: 5