IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

Cited by: 0
Authors
Garcia-Romero, Daniel [1]
Zhang, Xiaohui
McCree, Alan
Povey, Daniel
Affiliations
[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014
Keywords
Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN in an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Although the DNN is trained on the out-of-domain data, the final adapted system produces a relative improvement of more than 30% with respect to the best published results on this task.
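The sufficient statistics mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual i-vector formulation, in which per-frame class posteriors weight the acoustic features to give zeroth- and first-order statistics; the abstract itself does not spell this out. The function name `collect_sufficient_stats` and the toy numbers are hypothetical. The accumulation is the same whether the posteriors come from a GMM or from a DNN; only the source of the posteriors changes.

```python
from typing import List, Sequence, Tuple


def collect_sufficient_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one recording.

    posteriors[t][c]: posterior of class c at frame t (assumed to come
                      from a GMM or from a DNN output layer).
    features[t][d]:   acoustic feature vector at frame t.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    if not posteriors:
        raise ValueError("at least one frame is required")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                         # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_classes)]    # F_c = sum_t gamma_t(c) * x_t

    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x
    return zeroth, first


# Toy example (hypothetical values): 3 frames, 2 classes, 2-dimensional features.
toy_posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
N, F = collect_sufficient_stats(toy_posteriors, toy_features)
print(N)  # [1.5, 1.5]
print(F)  # [[4.0, 1.0], [8.0, 5.0]]
```

In this sketch, swapping the GMM for a DNN would only change how `toy_posteriors` is produced; the downstream i-vector extractor would consume `N` and `F` unchanged.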
Pages: 378-383
Number of pages: 6
Related papers
50 entries in total
  • [21] Improving training datasets for resource-constrained speaker recognition neural networks
    Bousquet, Pierre-Michel
    Rouvier, Mickael
    INTERSPEECH 2023, 2023, : 3167 - 3171
  • [22] A Simple Unsupervised Knowledge-Free Domain Adaptation for Speaker Recognition
    Lin, Wan
    Li, Lantian
    Wang, Dong
    APPLIED SCIENCES-BASEL, 2024, 14 (03)
  • [23] SUPERVISED DOMAIN ADAPTATION FOR I-VECTOR BASED SPEAKER RECOGNITION
    Garcia-Romero, Daniel
    McCree, Alan
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [24] DOMAIN ADAPTATION FOR SPEAKER RECOGNITION IN SINGING AND SPOKEN VOICE
    Chowdhury, Anurag
    Cozzo, Austin
    Ross, Arun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7192 - 7196
  • [25] I-VECTOR-BASED SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS FOR FRENCH BROADCAST AUDIO TRANSCRIPTION
    Gupta, Vishwa
    Kenny, Patrick
    Ouellet, Pierre
    Stafylakis, Themos
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [26] SPEAKER ADAPTIVE TRAINING IN DEEP NEURAL NETWORKS USING SPEAKER DEPENDENT BOTTLENECK FEATURES
    Doddipatla, Rama
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5290 - 5294
  • [27] UNSUPERVISED DOMAIN ADAPTATION VIA DOMAIN ADVERSARIAL TRAINING FOR SPEAKER RECOGNITION
    Wang, Qing
    Rao, Wei
    Sun, Sining
    Xie, Lei
    Chng, Eng Siong
    Li, Haizhou
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4889 - 4893
  • [28] JOINT SPEAKER DIARIZATION AND RECOGNITION USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS
    Zhou, Zhihan
    Zhang, Yichi
    Duan, Zhiyao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2496 - 2500
  • [29] Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
    Simic, Nikola
    Suzic, Sinisa
    Nosek, Tijana
    Vujovic, Mia
    Peric, Zoran
    Savic, Milan
    Delic, Vlado
    ENTROPY, 2022, 24 (03)
  • [30] An Artificial Neural Networks Model by Using Wavelet Analysis for Speaker Recognition
    Returi, Kanaka Durga
    Radhika, Y.
    INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 2, 2015, 340 : 859 - 874