Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems

Cited by: 8
Authors
Nandwana, Mahesh Kumar [1]
Ferrer, Luciana [2]
McLaren, Mitchell [1]
Castan, Diego [1]
Lawson, Aaron [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
[2] UBA CONICET, Inst Invest Ciencias Comp, Buenos Aires, DF, Argentina
Source
INTERSPEECH 2019 | 2019
Keywords
speaker recognition; calibration; metadata; calibration loss
DOI
10.21437/Interspeech.2019-1808
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
In this paper, we analyze and assess the impact of critical metadata factors on the calibration performance of speaker recognition systems. In particular, we study the effect of duration, distance, language, and gender by using a variety of datasets and systematically varying the conditions in the evaluation and calibration sets. For all experiments, the system is based on i-vectors and a probabilistic linear discriminant analysis (PLDA) back-end and linear calibration. We measure system performance in terms of calibration loss. Our experiments reveal (i) a large degradation when the duration used for calibration is significantly different from that in the evaluation set; (ii) no significant degradation when a different gender is used for calibration than for evaluation; (iii) a large degradation when microphone distance is significantly different between the sets; and (iv) a small loss for closely related languages and languages with shared vocabulary. This analysis will be beneficial in the development of speaker recognition systems for use in unseen environments and for forensic speaker recognition analysts when selecting relevant population data.
Pages: 4325 - 4329
Page count: 5
Related papers
19 in total
  • [1] [Anonymous], 2012, P ODYSSEY 2012 THE S
  • [2] Application-independent evaluation of speaker detection
    Brümmer, N
    du Preez, J
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3) : 230 - 275
  • [3] Front-End Factor Analysis for Speaker Verification
    Dehak, Najim
    Kenny, Patrick J.
    Dehak, Reda
    Dumouchel, Pierre
    Ouellet, Pierre
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 788 - 798
  • [4] Toward Fail-Safe Speaker Recognition: Trial-Based Calibration With a Reject Option
    Ferrer, Luciana
    Nandwana, Mahesh Kumar
    McLaren, Mitchell
    Castan, Diego
    Lawson, Aaron
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (01) : 140 - 153
  • [5] Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition
    Gonzalez-Rodriguez, J
    Drygajlo, A
    Ramos-Castro, D
    Garcia-Gomar, M
    Ortega-Garcia, J
    [J]. COMPUTER SPEECH AND LANGUAGE, 2006, 20 (2-3) : 331 - 355
  • [6] Speaker Recognition by Machines and Humans
    Hansen, John H. L.
    Hasan, Taufiq
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (06) : 74 - 99
  • [7] What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison
    Hughes, Vincent
    Foulkes, Paul
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017 : 3772 - 3776
  • [8] The relevant population in forensic voice comparison: Effects of varying delimitations of social class and age
    Hughes, Vincent
    Foulkes, Paul
    [J]. SPEECH COMMUNICATION, 2015, 66 : 218 - 230
  • [9] Kelly F, 2018, IEEE W SP LANG TECH, P1060, DOI 10.1109/SLT.2018.8639595
  • [10] Kelly F, 2016, IEEE W SP LANG TECH, P205, DOI 10.1109/SLT.2016.7846266