Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 11

Authors:

Kang, Woo Hyun [2]

Mun, Sung Hwan [2]

Han, Min Hyun [2]

Kim, Nam Soo [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Seoul Natl Univ, Seoul, South Korea

Source:

IEEE Access | 2020 / Vol. 8

Keywords:

Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION;

DOI:

10.1109/ACCESS.2020.3012893

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by the nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.


Pages: 141838-141849

Page count: 12

