CAM: CONTEXT-AWARE MASKING FOR ROBUST SPEAKER VERIFICATION

Cited by: 14

Authors:

Yu, Ya-Qi [1]

Zheng, Siqi [2]

Suo, Hongbin [2]

Lei, Yun [2]

Li, Wu-Jun [1]

Affiliations:

[1] Nanjing Univ, Dept Comp Sci & Technol, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China

[2] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Speaker verification; speech enhancement; context embedding; context-aware masking; FEATURE ENHANCEMENT;

DOI:

10.1109/ICASSP39728.2021.9414704

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Performance degradation caused by noise has been a long-standing challenge for speaker verification. Previous methods usually involve applying a denoising transformation to speaker embeddings or enhancing input features. Nevertheless, these methods are lossy and inefficient for speaker embedding. In this paper, we propose context-aware masking (CAM), a novel method to extract robust speaker embedding. CAM enables the speaker embedding network to "focus" on the speaker of interest and "blur" unrelated noise. The threshold of masking is dynamically controlled by an auxiliary context embedding that captures speaker and noise characteristics. Moreover, models adopting CAM can be trained in an end-to-end manner without using synthesized noisy-clean speech pairs. Our results show that CAM improves speaker verification performance in the wild by a large margin, compared to the baselines.
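The masking idea in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: it assumes, purely for illustration, that the context embedding is the per-channel mean over all frames and that the mask is a sigmoid gate whose threshold is that mean. The function names and the `scale` parameter are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_embedding(frames):
    """Hypothetical context: the mean of each channel over all frames."""
    n_frames = len(frames)
    n_channels = len(frames[0])
    return [sum(f[c] for f in frames) / n_frames for c in range(n_channels)]


def context_aware_mask(frames, scale=1.0):
    """Scale each feature by a gate whose threshold is set by the context.

    Features well above the per-channel context value pass almost
    unchanged; features below it are attenuated ("blurred").
    """
    ctx = context_embedding(frames)
    masked = []
    for frame in frames:
        gates = [sigmoid(scale * (x - t)) for x, t in zip(frame, ctx)]
        masked.append([x * g for x, g in zip(frame, gates)])
    return masked


# Three frames with two feature channels each (made-up values).
frames = [[0.1, 2.0], [0.2, 0.1], [3.0, 0.3]]
masked = context_aware_mask(frames)
```

In a trained model the context and the gate would be learned jointly with the speaker embedding network, which is what allows end-to-end training without noisy-clean pairs; the fixed mean-based threshold above only shows how a context vector can move the masking threshold per utterance.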


Pages: 6703-6707

Page count: 5
