Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 13
Authors
Kang, Woo Hyun [2 ]
Mun, Sung Hwan [2 ]
Han, Min Hyun [2 ]
Kim, Nam Soo [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Keywords
Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION;
DOI
10.1109/ACCESS.2020.3012893
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, the deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.
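The abstract describes the goal (speaker embeddings stripped of nuisance variability) without the network details. As a hedged illustration only, and not the paper's actual supervised disentanglement network, the NumPy sketch below shows the underlying idea on toy data: given channel labels, a linear nuisance direction is estimated and projected out of the embeddings, which collapses channel separability while preserving speaker separability. All names and dimensions here are invented for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each 4-D "embedding" mixes a speaker component and a
# nuisance (recording-channel) component, mimicking the entangled
# representations the paper sets out to separate.
n_per_cell = 200
spk_means = np.array([[ 2.0, 0.0, 0.0, 0.0],
                      [-2.0, 0.0, 0.0, 0.0]])   # two speakers
chn_offset = np.array([0.0, 3.0, 0.0, 0.0])     # channel 1 shifts dim 1

X_parts, spk_parts, chn_parts = [], [], []
for s in range(2):
    for c in range(2):
        pts = (spk_means[s] + c * chn_offset
               + 0.1 * rng.standard_normal((n_per_cell, 4)))
        X_parts.append(pts)
        spk_parts += [s] * n_per_cell
        chn_parts += [c] * n_per_cell
X = np.vstack(X_parts)
spk = np.array(spk_parts)
chn = np.array(chn_parts)

# Fully supervised nuisance removal (a linear stand-in for the paper's
# learned disentanglement): estimate the channel direction from the
# labelled channel means and project it out of every embedding.
d = X[chn == 1].mean(axis=0) - X[chn == 0].mean(axis=0)
d /= np.linalg.norm(d)
X_clean = X - np.outer(X @ d, d)

def separation(X, labels):
    """Distance between the two class means (higher = more separable)."""
    return np.linalg.norm(X[labels == 0].mean(axis=0)
                          - X[labels == 1].mean(axis=0))

print(f"speaker sep: {separation(X, spk):.2f} -> {separation(X_clean, spk):.2f}")
print(f"channel sep: {separation(X, chn):.2f} -> {separation(X_clean, chn):.2f}")
```

In the paper this effect is achieved by training, not by a closed-form projection, but the before/after contrast (channel separability removed, speaker separability retained) is the property the proposed embeddings are evaluated on.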
Pages: 141838-141849
Page count: 12