Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems

Cited by: 8

Authors:

Nandwana, Mahesh Kumar [1]

Ferrer, Luciana [2]

McLaren, Mitchell [1]

Castan, Diego [1]

Lawson, Aaron [1]

Affiliations:

[1] SRI International, Speech Technology and Research Laboratory, 333 Ravenswood Ave, Menlo Park, CA 94025, USA

[2] UBA-CONICET, Instituto de Investigación en Ciencias de la Computación, Buenos Aires, Argentina

Source:

INTERSPEECH 2019 | 2019

Keywords:

speaker recognition; calibration; metadata; calibration loss

DOI:

10.21437/Interspeech.2019-1808

Chinese Library Classification number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

In this paper, we analyze and assess the impact of critical metadata factors on the calibration performance of speaker recognition systems. In particular, we study the effect of duration, distance, language, and gender by using a variety of datasets and systematically varying the conditions in the evaluation and calibration sets. For all experiments, the system is based on i-vectors and a probabilistic linear discriminant analysis (PLDA) back-end and linear calibration. We measure system performance in terms of calibration loss. Our experiments reveal (i) a large degradation when the duration used for calibration is significantly different from that in the evaluation set; (ii) no significant degradation when a different gender is used for calibration than for evaluation; (iii) a large degradation when microphone distance is significantly different between the sets; and (iv) a small loss for closely related languages and languages with shared vocabulary. This analysis will be beneficial in the development of speaker recognition systems for use in unseen environments and for forensic speaker recognition analysts when selecting relevant population data.
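The abstract names linear calibration and calibration loss without defining them. As a rough illustration only, the sketch below assumes that linear calibration maps a raw score s to a log-likelihood ratio a*s + b, and that calibration quality is measured with the log-likelihood-ratio cost (Cllr). Calibration loss is commonly taken as the Cllr after calibration minus the minimum achievable Cllr; that definition is an assumption here, and the sketch uses a simpler proxy, the extra Cllr incurred when the calibration set does not match the evaluation condition. All scores are synthetic Gaussians, and every function and variable name is hypothetical rather than taken from the paper.

```python
import numpy as np

def cllr(tar_llrs, non_llrs):
    """Log-likelihood-ratio cost in bits for target and non-target LLRs."""
    tar = np.asarray(tar_llrs, dtype=float)
    non = np.asarray(non_llrs, dtype=float)
    # np.logaddexp(0, x) evaluates log(1 + exp(x)) without overflow.
    c_tar = np.mean(np.logaddexp(0.0, -tar))
    c_non = np.mean(np.logaddexp(0.0, non))
    return 0.5 * (c_tar + c_non) / np.log(2.0)

def fit_linear_calibration(tar_scores, non_scores, steps=5000, lr=0.1):
    """Fit llr = a * score + b by plain gradient descent on the Cllr objective."""
    tar = np.asarray(tar_scores, dtype=float)
    non = np.asarray(non_scores, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(steps):
        # Derivatives of the two softplus terms with respect to the LLR.
        g_tar = -1.0 / (1.0 + np.exp(a * tar + b))
        g_non = 1.0 / (1.0 + np.exp(-(a * non + b)))
        a -= lr * 0.5 * (np.mean(g_tar * tar) + np.mean(g_non * non))
        b -= lr * 0.5 * (np.mean(g_tar) + np.mean(g_non))
    return a, b

rng = np.random.default_rng(0)
n = 2000

# Condition A (synthetic): scores used to train the mismatched calibration.
cal_a_tar = rng.normal(1.0, 1.0, n)
cal_a_non = rng.normal(-1.0, 1.0, n)

# Condition B (synthetic): same separation, but all scores shifted down by 1.
cal_b_tar = rng.normal(0.0, 1.0, n)
cal_b_non = rng.normal(-2.0, 1.0, n)
eval_tar = rng.normal(0.0, 1.0, n)
eval_non = rng.normal(-2.0, 1.0, n)

a_mis, b_mis = fit_linear_calibration(cal_a_tar, cal_a_non)
a_mat, b_mat = fit_linear_calibration(cal_b_tar, cal_b_non)

cllr_mismatched = cllr(a_mis * eval_tar + b_mis, a_mis * eval_non + b_mis)
cllr_matched = cllr(a_mat * eval_tar + b_mat, a_mat * eval_non + b_mat)

# Proxy for the cost of calibrating on the wrong condition.
extra_loss = cllr_mismatched - cllr_matched
print(f"matched: {cllr_matched:.3f}  mismatched: {cllr_mismatched:.3f}  extra: {extra_loss:.3f}")
```

The shift between conditions A and B leaves discrimination unchanged and only moves the score distributions, so any increase in Cllr in this toy setup comes from miscalibration alone. That is the kind of effect the abstract reports for mismatched duration and microphone distance, though the real experiments use i-vector/PLDA scores rather than synthetic ones.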

Citation

Pages: 4325-4329

Page count: 5
