Investigation of Different Calibration Methods for Deep Speaker Embedding Based Verification Systems

Cited by: 0
Authors
Novoselov, Sergey [1]
Lavrentyeva, Galina [1]
Volokhov, Vladimir [1, 2]
Volkova, Marina [1, 2]
Khmelev, Nikita [1, 2]
Akulov, Artem [1, 2]
Affiliations
[1] ITMO Univ, St Petersburg, Russia
[2] STC Ltd, St Petersburg, Russia
Source
SPEECH AND COMPUTER, SPECOM 2023, PT I | 2023 / Vol. 14338
Keywords
Speaker verification; Calibration; MagNetO
DOI
10.1007/978-3-031-48309-7_13
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep speaker embedding extractors have become the state of the art in speaker verification. However, the calibration of verification scores for such systems often remains out of focus. Inappropriate score calibration leads to serious issues, especially under unknown acoustic conditions, even when the speaker verification system is strong in terms of threshold-free metrics. This paper investigates several score calibration methods: a classical approach based on the logistic regression model; the recently presented magnitude estimation network MagNetO, which uses activations from the pooling layer of the trained deep speaker extractor; and a generalization of this approach based on separate scale and offset prediction neural networks. An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system. The results demonstrate that there are no serious problems if in-domain development data are used for calibration tuning. Otherwise, a trade-off arises between good calibration performance and threshold-free system quality. In most cases, adaptive s-norm helps to stabilize score distributions and to improve system performance.
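
For readers unfamiliar with the two classical techniques named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes NumPy, the common formulation of adaptive s-norm (mean and standard deviation taken over the top-N cohort scores on each side of the trial), and a plain linear logistic regression calibration fitted by gradient descent. The function names, the cohort inputs, and the hyperparameters `top_n`, `lr`, and `steps` are all hypothetical. The prior-weighted training often used in speaker verification is omitted, so the calibrated output is a log-odds score at the training set's target proportion.

```python
import numpy as np


def adaptive_s_norm(score, enroll_cohort, test_cohort, top_n=200):
    """Adaptive s-norm of one raw trial score (illustrative sketch).

    enroll_cohort / test_cohort are hypothetical inputs: the scores of the
    enrollment and test embeddings against a cohort of impostor embeddings.
    """
    def top_stats(cohort):
        # Keep only the top_n highest cohort scores (the closest impostors).
        top = np.sort(np.asarray(cohort, dtype=float))[-top_n:]
        return top.mean(), top.std() + 1e-8  # epsilon avoids division by zero

    mu_e, sd_e = top_stats(enroll_cohort)
    mu_t, sd_t = top_stats(test_cohort)
    # Symmetric normalization: average of the two z-normalized scores.
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)


def fit_linear_calibration(scores, labels, lr=0.1, steps=5000):
    """Fit calibrated = a * score + b by minimizing binary cross-entropy."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)  # 1 = target trial, 0 = impostor trial
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))  # sigmoid of calibrated score
        grad = p - y                             # d(cross-entropy)/d(logit)
        a -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)
    return a, b
```

In this sketch the scale `a` and offset `b` are fitted once on development trials and then applied as `a * score + b` to every evaluation score, optionally after `adaptive_s_norm`. According to the abstract, MagNetO and its generalization instead predict scale and offset with neural networks, which this sketch does not attempt.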
Pages: 159-168
Page count: 10