IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

被引:0
|
作者
Garcia-Romero, Daniel [1 ]
Zhang, Xiaohui
McCree, Alan
Povey, Daniel
机构
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
来源
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014年
关键词
Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN in an unsupervised adaptation framework, that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Despite the fact that the DNN is trained on the out-of-domain data, the final adapted system produces a relative improvement of more than 30% with respect to the best published results on this task.
引用
收藏
页码:378 / 383
页数:6
相关论文
共 50 条
  • [1] Insights into Deep Neural Networks for Speaker Recognition
    Garcia-Romero, Daniel
    McCree, Alan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1141 - 1145
  • [2] Contrastive Adversarial Domain Adaptation Networks for Speaker Recognition
    Li, Longxin
    Mak, Man-Wai
    Chien, Jen-Tzung
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (05) : 2236 - 2245
  • [3] SPEAKER ADAPTATION OF CONTEXT DEPENDENT DEEP NEURAL NETWORKS
    Liao, Hank
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7947 - 7951
  • [4] Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
    Tian, Yao
    Cai, Meng
    He, Liang
    Zhang, Wei-Qiang
    Liu, Jia
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1863 - 1867
  • [5] DOMAIN ADAPTATION OF DEEP NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION VIA WIRELESS SENSORS
    Gosztolya, Gabor
    Grosz, Tamas
    JOURNAL OF ELECTRICAL ENGINEERING-ELEKTROTECHNICKY CASOPIS, 2016, 67 (02): : 124 - 130
  • [6] Speaker2Vec: Unsupervised Learning and Adaptation of a Speaker Manifold using Deep Neural Networks with an Evaluation on Speaker Segmentation
    Jati, Arindam
    Georgiou, Panayiotis
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3567 - 3571
  • [7] MODELLING SPEAKER AND CHANNEL VARIABILITY USING DEEP NEURAL NETWORKS FOR ROBUST SPEAKER VERIFICATION
    Bhattacharya, Gautam
    Alam, Jahangir
    Kenny, Patrick
    Gupta, Vishwa
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 192 - 198
  • [8] OliVaR: Improving olive variety recognition using deep neural networks
    Miho, Hristofor
    Pagnotta, Giulio
    De Gaspari, Fabio
    Hitaj, Dorjan
    Mancini, Luigi Vincenzo
    Koubouris, Georgios
    Godino, Gianluca
    Hakan, Mehmet
    Diez, Concepcion Munoz
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2024, 216
  • [9] Approaches for Out-of-Domain Adaptation to Improve Speaker Recognition Performance
    Shulipa, Andrey
    Novoselov, Sergey
    Melnikov, Aleksandr
    SPEECH AND COMPUTER, 2016, 9811 : 124 - 130
  • [10] Application of Convolutional Neural Networks to Speaker Recognition in Noisy Conditions
    McLaren, Mitchell
    Lei, Yun
    Scheffer, Nicolas
    Ferrer, Luciana
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 686 - 690