A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification

Cited by: 0
Authors
Xing, Xujiang [1]
Xu, Mingxing [2]
Zheng, Thomas Fang [2]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Source
INTERSPEECH 2024 | 2024
Keywords
speaker verification; noise-robust; multi-task; adversarial training; speech enhancement; recognition
DOI
10.21437/Interspeech.2024-700
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise disentanglement to establish a noise-independent, speaker-invariant embedding space. Specifically, the disentanglement module consists of two encoders that separate speaker-related and speaker-irrelevant information, respectively. A reconstruction module serves as a regularization term that constrains the noise. A feature-robust loss further supervises the speaker encoder to learn noise-independent speaker embeddings without losing speaker information. In addition, adversarial training is introduced to discourage the speaker encoder from encoding acoustic-condition information, yielding a speaker-invariant embedding space. Experiments on VoxCeleb1 show that the proposed method improves speaker verification performance under both clean and noisy conditions.
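The abstract describes four interacting objectives: disentanglement with two encoders, reconstruction as a regularizer, a feature-robust loss tying noisy speaker embeddings to their clean counterparts, and adversarial training against acoustic-condition information. The Python sketch below is one illustrative reading of how such a joint loss could be assembled; the module names, the gradient-reversal form of the adversarial branch, the MSE form of the reconstruction and feature-robust terms, and the loss weights are all assumptions, not the authors' implementation.

# Hypothetical training-step sketch of the joint objective described in the abstract.
# Module names, loss forms, and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Gradient reversal: identity in the forward pass, negated gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def joint_loss(speaker_enc, noise_enc, decoder, spk_classifier, cond_discriminator,
               noisy_feat, clean_feat, spk_label, cond_label,
               w_rec=1.0, w_fr=1.0, w_adv=0.1):
    # Disentanglement: separate encoders for speaker-related and speaker-irrelevant codes.
    spk_emb_noisy = speaker_enc(noisy_feat)
    spk_emb_clean = speaker_enc(clean_feat)
    noise_emb = noise_enc(noisy_feat)

    # Reconstruction regularizer: both codes together should recover the noisy input.
    recon = decoder(torch.cat([spk_emb_noisy, noise_emb], dim=-1))
    loss_rec = F.mse_loss(recon, noisy_feat)

    # Feature-robust loss: pull the noisy speaker embedding toward its clean counterpart.
    loss_fr = F.mse_loss(spk_emb_noisy, spk_emb_clean)

    # Speaker classification keeps speaker information in the embedding.
    loss_spk = F.cross_entropy(spk_classifier(spk_emb_noisy), spk_label)

    # Adversarial branch: the gradient reversal layer makes the condition classifier's
    # gradient push the speaker encoder away from encoding acoustic-condition cues.
    cond_logits = cond_discriminator(GradReverse.apply(spk_emb_noisy, 1.0))
    loss_adv = F.cross_entropy(cond_logits, cond_label)

    return loss_spk + w_rec * loss_rec + w_fr * loss_fr + w_adv * loss_adv

In an actual implementation, the decoder design, the choice of condition labels (for example noise type or SNR band), and the relative loss weights would follow the paper's experimental setup rather than the placeholder values shown here.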
Pages: 707-711
Number of pages: 5