IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Garcia-Romero, Daniel [1]
Zhang, Xiaohui
McCree, Alan
Povey, Daniel
Affiliation
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2014), 2014
Keywords
Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data improves the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN into an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Although the DNN is trained on out-of-domain data, the final adapted system produces a relative improvement of more than 30% with respect to the best published results on this task.
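To make "collecting sufficient statistics" concrete, the following is a minimal sketch, not the authors' implementation. It assumes frame-level class posteriors are already available (from GMM components or from DNN output classes) and accumulates the standard zeroth- and first-order statistics used in i-vector extraction. The function name `collect_sufficient_stats` and its list-based inputs are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def collect_sufficient_stats(
    posteriors: List[List[float]],
    features: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics over one utterance.

    posteriors[t][c]: posterior of class c at frame t. The source of these
        posteriors (GMM or DNN) is an assumption left to the caller.
    features[t][d]: d-th coefficient of the acoustic feature at frame t.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("posteriors and features must be non-empty and frame-aligned")

    num_classes = len(posteriors[0])
    dim = len(features[0])

    # Zeroth order: N_c = sum_t gamma_t(c)
    zeroth = [0.0] * num_classes
    # First order: F_c = sum_t gamma_t(c) * x_t
    first = [[0.0] * dim for _ in range(num_classes)]

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x

    return zeroth, first
```

In this sketch, swapping the GMM for a DNN changes only where `posteriors` comes from; the accumulation itself is the same in both cases.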
Pages: 378-383
Number of pages: 6
Related papers
50 in total
  • [31] Towards improving the performance of speaker recognition systems
    Johnson, Neethu
    George, Kuruvachan K.
    Kumar, Santhosh C.
    Raj, Reghu P. C.
    2014 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND COMMUNICATIONS (ICCSC), 2014, : 38 - 41
  • [32] LOW-RESOURCE DOMAIN ADAPTATION FOR SPEAKER RECOGNITION USING CYCLE-GANS
    Nidadavolu, Phani Sankar
    Kataria, Saurabh
    Villalba, Jesus
    Dehak, Najim
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 710 - 717
  • [33] IMPROVED SPEAKER INDEPENDENT LIP READING USING SPEAKER ADAPTIVE TRAINING AND DEEP NEURAL NETWORKS
    Almajai, Ibrahim
    Cox, Stephen
    Harvey, Richard
    Lan, Yuxuan
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2722 - 2726
  • [34] Speaker-dependent Multipitch Tracking Using Deep Neural Networks
    Liu, Yuzhou
    Wang, DeLiang
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3279 - 3283
  • [35] Improving Deep Neural Networks Using Softplus Units
    Zheng, Hao
    Yang, Zhanlei
    Liu, Wenju
    Liang, Jizhong
    Li, Yanpeng
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [36] RECOGNITION OF ACOUSTIC EVENTS USING DEEP NEURAL NETWORKS
    Gencoglu, Oguzhan
    Virtanen, Tuomas
    Huttunen, Heikki
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 506 - 510
  • [37] Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation
    Huang, Zhen
    Siniscalchi, Sabato Marco
    Lee, Chin-Hui
    PATTERN RECOGNITION LETTERS, 2017, 98 : 1 - 7
  • [38] Unconstrained ear recognition using deep neural networks
    Dodge, Samuel
    Mounsef, Jinane
    Karam, Lina
    IET BIOMETRICS, 2018, 7 (03) : 207 - 214
  • [39] Deep Neural Network Approaches to Speaker and Language Recognition
    Richardson, Fred
    Reynolds, Douglas
    Dehak, Najim
    IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (10) : 1671 - 1675
  • [40] A Unified Deep Neural Network for Speaker and Language Recognition
    Richardson, Fred
    Reynolds, Doug
    Dehak, Najim
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1146 - 1150