Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information

Cited by: 26
Authors
Shon, Suwon [1]
Mun, Seongkyu [2]
Kim, Wooil [3]
Ko, Hanseok [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
[2] Korea Univ, Dept Visual Informat Proc, Seoul, South Korea
[3] Incheon Natl Univ, Dept Comp Sci & Engn, Incheon, South Korea
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
National Research Foundation, Singapore;
Keywords
unsupervised domain adaptation; domain mismatch; speaker recognition; autoencoder; denoising autoencoder;
DOI
10.21437/Interspeech.2017-49
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In real-life conditions, mismatch between the development and test domains degrades speaker recognition performance. To address this issue, many researchers have explored domain adaptation approaches that use a matched in-domain dataset. However, adaptation is not effective if the dataset is insufficient to estimate the channel variability of the domain. In this paper, we explore the problem of performance degradation under such insufficient channel information. To exploit a limited in-domain dataset effectively, we propose an unsupervised domain adaptation approach, Autoencoder based Domain Adaptation (AEDA). The proposed approach combines an autoencoder with a denoising autoencoder to adapt a resource-rich development dataset to the test domain. The proposed technique is evaluated on the Domain Adaptation Challenge 13 experimental protocol, which is widely used in speaker recognition for domain-mismatched conditions. The results show significant improvements over the baselines and over results from prior studies.
Pages: 1014-1018
Page count: 5