IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

Times cited: 0

Authors:

Garcia-Romero, Daniel [1]

Zhang, Xiaohui

McCree, Alan

Povey, Daniel

Affiliation:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014

Keywords:

Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data improves the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN into an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Although the DNN is trained on out-of-domain data, the final adapted system achieves a relative improvement of more than 30% over the best published results on this task.
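The key idea in the abstract, that the GMM used to collect sufficient statistics can be swapped for a DNN, can be illustrated with a minimal sketch. It assumes frame-level posteriors are already available, either GMM component responsibilities or DNN senone posteriors; how they are produced is out of scope here. In both cases the zeroth-order statistics N[c] and first-order statistics F[c] are accumulated the same way, and only the source of the posteriors changes. The function names `collect_stats` and `center_stats` are hypothetical, and plain Python lists are used to stay self-contained; this is not the authors' code or any toolkit's API.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def collect_stats(features: Sequence[Sequence[float]],
                  posteriors: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    features:   T x D frame features (e.g. MFCCs) used for speaker modeling.
    posteriors: T x C frame posteriors over C classes: GMM component
                responsibilities in the classic system, or DNN senone
                posteriors when the DNN replaces the GMM (assumed given).
    Returns N (length C) and F (C x D), where
        N[c] = sum_t gamma_t(c)   and   F[c] = sum_t gamma_t(c) * x_t.
    """
    if not features or len(features) != len(posteriors):
        raise ValueError("need non-empty, frame-aligned features and posteriors")
    num_classes = len(posteriors[0])
    dim = len(features[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for x_t, gamma_t in zip(features, posteriors):
        for c, g in enumerate(gamma_t):
            n[c] += g                 # soft frame count for class c
            row = f[c]
            for d, x in enumerate(x_t):
                row[d] += g * x       # posterior-weighted feature sum
    return n, f


def center_stats(n: Sequence[float], f: Matrix, means: Matrix) -> Matrix:
    """Center first-order stats around per-class means: F[c] - N[c] * mu_c."""
    return [[f_cd - n_c * mu_cd for f_cd, mu_cd in zip(f_c, mu_c)]
            for n_c, f_c, mu_c in zip(n, f, means)]


if __name__ == "__main__":
    # Toy utterance: 3 frames, 2-dim features, 2 classes (made-up numbers).
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    post = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    n, f = collect_stats(feats, post)
    print(n)  # [1.5, 1.5]
    print(f)  # [[2.5, 4.0], [6.5, 8.0]]
```

Because the DNN's output classes are senones rather than GMM components, the per-class means passed to `center_stats` are not read from a GMM; one common choice, assumed here, is to estimate them from statistics pooled over training data (F[c] / N[c]).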


Pages: 378-383

Number of pages: 6

50 records in total

[31] Towards improving the performance of speaker recognition systems
Johnson, Neethu
George, Kuruvachan K.
Kumar, Santhosh C.
Raj, Reghu P. C.
2014 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND COMMUNICATIONS (ICCSC), 2014: 38-41
[32] LOW-RESOURCE DOMAIN ADAPTATION FOR SPEAKER RECOGNITION USING CYCLE-GANS
Nidadavolu, Phani Sankar
Kataria, Saurabh
Villalba, Jesus
Dehak, Najim
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 710-717
[33] IMPROVED SPEAKER INDEPENDENT LIP READING USING SPEAKER ADAPTIVE TRAINING AND DEEP NEURAL NETWORKS
Almajai, Ibrahim
Cox, Stephen
Harvey, Richard
Lan, Yuxuan
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 2722-2726
[34] Speaker-dependent Multipitch Tracking Using Deep Neural Networks
Liu, Yuzhou
Wang, DeLiang
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 3279-3283
[35] Improving Deep Neural Networks Using Softplus Units
Zheng, Hao
Yang, Zhanlei
Liu, Wenju
Liang, Jizhong
Li, Yanpeng
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
[36] RECOGNITION OF ACOUSTIC EVENTS USING DEEP NEURAL NETWORKS
Gencoglu, Oguzhan
Virtanen, Tuomas
Huttunen, Heikki
2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014: 506-510
[37] Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation
Huang, Zhen
Siniscalchi, Sabato Marco
Lee, Chin-Hui
PATTERN RECOGNITION LETTERS, 2017, 98: 1-7
[38] Unconstrained ear recognition using deep neural networks
Dodge, Samuel
Mounsef, Jinane
Karam, Lina
IET BIOMETRICS, 2018, 7(3): 207-214
[39] Deep Neural Network Approaches to Speaker and Language Recognition
Richardson, Fred
Reynolds, Douglas
Dehak, Najim
IEEE SIGNAL PROCESSING LETTERS, 2015, 22(10): 1671-1675
[40] A Unified Deep Neural Network for Speaker and Language Recognition
Richardson, Fred
Reynolds, Doug
Dehak, Najim
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 1146-1150
