On the Issue of Calibration in DNN-based Speaker Recognition Systems

Cited by: 5

Authors:

McLaren, Mitchell [1]

Castan, Diego [1]

Ferrer, Luciana [2,3]

Lawson, Aaron [1]

Affiliations:

[1] SRI Int, Speech Technol & Res Lab, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA

[2] Univ Buenos Aires, FCEN, Dept Comp, Buenos Aires, DF, Argentina

[3] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

speaker recognition; mismatch; calibration; deep neural network; bottleneck features;

DOI:

10.21437/Interspeech.2016-1134

Chinese Library Classification (CLC):

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This article is concerned with the issue of calibration in the context of deep neural network (DNN)-based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but provides a more computationally efficient system based on bottleneck features with improved discriminative power.
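The hybrid alignment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no system details, so the sketch assumes a diagonal-covariance GMM as the aligner, and every name in it (`hybrid_stats`, `align_feats`, `stats_feats`, the toy data) is hypothetical. The point it shows is only the separation of roles: one feature stream (for example, bottleneck features) produces the per-frame component posteriors, while a second, frame-synchronous stream (for example, conventional cepstral features) is the one accumulated into the first-order Baum-Welch statistics.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(frame, weights, means, variances):
    """Component posteriors for a single alignment-feature frame."""
    logs = [
        math.log(w) + log_gauss_diag(frame, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_stats(align_feats, stats_feats, weights, means, variances):
    """Zeroth- and first-order statistics from two feature streams.

    align_feats: frames used only to compute posteriors (assumed aligner input).
    stats_feats: frames accumulated into the first-order statistics.
    """
    if len(align_feats) != len(stats_feats):
        raise ValueError("feature streams must be frame-synchronous")
    n_comp = len(weights)
    dim = len(stats_feats[0])
    zeroth = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for a, x in zip(align_feats, stats_feats):
        gamma = posteriors(a, weights, means, variances)
        for k, g in enumerate(gamma):
            zeroth[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
    return zeroth, first


# Toy example: a 2-component aligner on 1-D alignment features,
# with 2-D features accumulated into the statistics.
weights = [0.5, 0.5]
means = [[-5.0], [5.0]]
variances = [[1.0], [1.0]]
align_feats = [[-5.0], [5.0], [5.0]]
stats_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
zeroth, first = hybrid_stats(align_feats, stats_feats, weights, means, variances)
```

In this toy setup the first frame is assigned almost entirely to component 0 and the other two to component 1, so the first-order statistics of each component are close to the sums of the corresponding `stats_feats` frames. The dimensionality of the accumulated statistics follows `stats_feats`, not the alignment features.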


Pages: 1825-1829

Page count: 5

