IMPROVING SPEAKER RECOGNITION PERFORMANCE IN THE DOMAIN ADAPTATION CHALLENGE USING DEEP NEURAL NETWORKS

Cited by: 0

Authors:

Garcia-Romero, Daniel [1]

Zhang, Xiaohui

McCree, Alan

Povey, Daniel

Affiliation:

[1] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014

Keywords:

Unsupervised adaptation; speaker recognition; i-vectors; deep neural networks

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Traditional i-vector speaker recognition systems use a Gaussian mixture model (GMM) to collect sufficient statistics (SS). Recently, replacing this GMM with a deep neural network (DNN) has shown promising results. In this paper, we explore the use of DNNs to collect SS for the unsupervised domain adaptation task of the Domain Adaptation Challenge (DAC). We show that collecting SS with a DNN trained on out-of-domain data boosts the speaker recognition performance of an out-of-domain system by more than 25%. Moreover, we integrate the DNN into an unsupervised adaptation framework that uses agglomerative hierarchical clustering with a stopping criterion based on unsupervised calibration, and show that the initial gains of the out-of-domain system carry over to the final adapted system. Although the DNN is trained on out-of-domain data, the final adapted system produces a relative improvement of more than 30% over the best published results on this task.
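
As an illustration of what "collecting sufficient statistics" can mean here, the following minimal sketch accumulates zeroth- and first-order statistics from per-frame class posteriors. It assumes the standard i-vector formulation, in which each frame is softly assigned to classes and the statistics are posterior-weighted sums. The posteriors could come from a GMM or from a DNN, and the accumulation step is the same in both cases. The function name, argument layout, and toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def collect_sufficient_stats(
    frames: List[List[float]],
    posteriors: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics for one utterance.

    frames[t] is the feature vector of frame t.
    posteriors[t][c] is the posterior of class c for frame t
    (a GMM component or a DNN output unit; an assumption of this sketch).
    """
    if not frames or len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must be non-empty and aligned")

    num_classes = len(posteriors[0])
    dim = len(frames[0])

    zeroth = [0.0] * num_classes                        # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_classes)]   # F_c = sum_t gamma_t(c) * x_t

    for x, gamma in zip(frames, posteriors):
        for c, g in enumerate(gamma):
            zeroth[c] += g
            for d, value in enumerate(x):
                first[c][d] += g * value
    return zeroth, first


# Toy usage: two 2-dimensional frames, two classes.
toy_frames = [[1.0, 2.0], [3.0, 4.0]]
toy_posteriors = [[1.0, 0.0], [0.5, 0.5]]
N, F = collect_sufficient_stats(toy_frames, toy_posteriors)
```

Under this view, swapping the GMM for a DNN only changes where `posteriors` comes from, so the downstream i-vector extractor can consume the statistics unchanged.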


Pages: 378-383

Page count: 6

50 items in total

[21] Improving training datasets for resource-constrained speaker recognition neural networks
Bousquet, Pierre-Michel
Rouvier, Mickael
INTERSPEECH 2023, 2023, : 3167 - 3171
[22] A Simple Unsupervised Knowledge-Free Domain Adaptation for Speaker Recognition
Lin, Wan
Li, Lantian
Wang, Dong
APPLIED SCIENCES-BASEL, 2024, 14 (03)
[23] SUPERVISED DOMAIN ADAPTATION FOR I-VECTOR BASED SPEAKER RECOGNITION
Garcia-Romero, Daniel
McCree, Alan
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[24] DOMAIN ADAPTATION FOR SPEAKER RECOGNITION IN SINGING AND SPOKEN VOICE
Chowdhury, Anurag
Cozzo, Austin
Ross, Arun
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7192 - 7196
[25] I-VECTOR-BASED SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS FOR FRENCH BROADCAST AUDIO TRANSCRIPTION
Gupta, Vishwa
Kenny, Patrick
Ouellet, Pierre
Stafylakis, Themos
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[26] SPEAKER ADAPTIVE TRAINING IN DEEP NEURAL NETWORKS USING SPEAKER DEPENDENT BOTTLENECK FEATURES
Doddipatla, Rama
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5290 - 5294
[27] UNSUPERVISED DOMAIN ADAPTATION VIA DOMAIN ADVERSARIAL TRAINING FOR SPEAKER RECOGNITION
Wang, Qing
Rao, Wei
Sun, Sining
Xie, Lei
Chng, Eng Siong
Li, Haizhou
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4889 - 4893
[28] JOINT SPEAKER DIARIZATION AND RECOGNITION USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS
Zhou, Zhihan
Zhang, Yichi
Duan, Zhiyao
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2496 - 2500
[29] Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech
Simic, Nikola
Suzic, Sinisa
Nosek, Tijana
Vujovic, Mia
Peric, Zoran
Savic, Milan
Delic, Vlado
ENTROPY, 2022, 24 (03)
[30] An Artificial Neural Networks Model by Using Wavelet Analysis for Speaker Recognition
Returi, Kanaka Durga
Radhika, Y.
INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 2, 2015, 340 : 859 - 874
