Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 11
Authors
Kang, Woo Hyun [2]
Mun, Sung Hwan [2]
Han, Min Hyun [2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION
DOI
10.1109/ACCESS.2020.3012893
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by the nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.
Pages: 141838 - 141849
Number of pages: 12
Related papers
40 in total
  • [1] Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
    Albanie, Samuel
    Nagrani, Arsha
    Vedaldi, Andrea
    Zisserman, Andrew
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 292 - 301
  • [2] [Anonymous], 2016, TENSORFLOW LARGE SCA
  • [3] Arjovsky M., 2017, INT C LEARNING REPRE
  • [4] Multitask learning
    Caruana, R
    [J]. MACHINE LEARNING, 1997, 28 (01) : 41 - 75
  • [5] Chen L., 2013, P CCBR, P394
  • [6] Chowdhury FARR, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5359, DOI 10.1109/ICASSP.2018.8461587
  • [7] Front-End Factor Analysis for Speaker Verification
    Dehak, Najim
    Kenny, Patrick J.
    Dehak, Reda
    Dumouchel, Pierre
    Ouellet, Pierre
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): 788 - 798
  • [8] Fang X, 2019, INT CONF ACOUST SPEE, P6221, DOI 10.1109/ICASSP.2019.8682327
  • [9] Ganin Y, 2016, J MACH LEARN RES, V17
  • [10] Ganin Y, 2015, PR MACH LEARN RES, V37, P1180