Lightweight Embeddings for Speaker Verification

Cited by: 1
Authors
Tkachenko, Maxim [1]
Yamshinin, Alexander [1]
Kotov, Mikhail [1]
Nastasenko, Marina [2]
Affiliations
[1] ASM Solut LLC, Moscow, Russia
[2] Master Synth LLC, Moscow, Russia
Source
SPEECH AND COMPUTER (SPECOM 2018) | 2018 / Vol. 11096
Keywords
Hash; Embeddings; Binarization; Neural networks; Speaker verification
DOI
10.1007/978-3-319-99579-3_70
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a speaker verification (SV) system that uses deep neural networks with hash representations (binarization) of embeddings. Training is performed on the NIST SRE training set, and verification is performed on the test set of the same corpus. The system architecture is based on deep recurrent layers with an attention mechanism. Semi-hard triplet selection is used during training. The final layer of the neural network is a tanh function, which makes end-to-end training of the hash representation possible. As a consequence, the system reduces the embedding memory size by a factor of 32 and improves the system's evaluation performance. The equal error rate (EER) is reported in comparison with embeddings without binarization.
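The 32x memory figure in the abstract is consistent with replacing one 32-bit float per embedding dimension with a single bit. The sketch below illustrates that arithmetic only; it is not the authors' implementation. The embedding size (`dim = 512`), the use of sign thresholding on tanh-range values, and comparison by Hamming distance are all assumptions introduced for illustration, as are the helper names `binarize` and `hamming_distance`.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold tanh-range values at zero and pack 8 bits per byte."""
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the bits that differ between two packed binary codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
dim = 512  # hypothetical embedding size, not taken from the paper

# Stand-in for the network output: float32 values in (-1, 1), as after tanh.
real = np.tanh(rng.standard_normal((2, dim))).astype(np.float32)

codes = binarize(real)                # shape (2, dim // 8), dtype uint8
ratio = real.nbytes / codes.nbytes    # 32 bits per value vs 1 bit per value
distance = hamming_distance(codes[0], codes[1])
```

Under these assumptions, `ratio` comes out to 32, and two binary codes can be compared with an XOR followed by a bit count rather than floating-point arithmetic.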
Pages: 687-696
Number of pages: 10