Investigation of Different Calibration Methods for Deep Speaker Embedding Based Verification Systems

Cited by: 0

Authors:

Novoselov, Sergey [1]

Lavrentyeva, Galina [1]

Volokhov, Vladimir [1,2]

Volkova, Marina [1,2]

Khmelev, Nikita [1,2]

Akulov, Artem [1,2]

Affiliations:

[1] ITMO Univ, St Petersburg, Russia

[2] STC Ltd, St Petersburg, Russia

Source:

SPEECH AND COMPUTER, SPECOM 2023, PT I | 2023 / Vol. 14338

Keywords:

Speaker verification; Calibration; MagNetO

DOI:

10.1007/978-3-031-48309-7_13

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Deep speaker embedding extractors have already become the new state-of-the-art systems in the speaker verification field. However, the problem of verification score calibration for such systems often remains out of focus. Inappropriate score calibration leads to serious issues, especially under unknown acoustic conditions, even when the speaker verification system is strong in terms of threshold-free metrics. This paper presents an investigation of several score calibration methods: a classical approach based on the logistic regression model; the recently presented magnitude estimation network MagNetO, which uses activations from the pooling layer of the trained deep speaker extractor; and a generalization of this approach based on separate scale and offset prediction neural networks. An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system. The obtained results demonstrate that there are no serious problems if in-domain development data are used for calibration tuning. Otherwise, a trade-off between good calibration performance and threshold-free system quality arises. In most cases, using adaptive s-norm helps to stabilize score distributions and to improve system performance.
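The classical logistic-regression calibration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common linear form, in which a single scale `a` and offset `b` map a raw score to a calibrated log-odds value, and it fits them by plain gradient descent. The function names, learning rate, and step count are hypothetical choices made for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean binary cross-entropy of logits against 0/1 labels (stable form)."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z) equals log(1+e^z) - y*z
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def fit_linear_calibration(scores, labels, lr=0.1, steps=2000):
    """Fit scale a and offset b so that a*score + b acts as a log-odds.

    labels: 1 = target trial, 0 = non-target trial.
    Assumption: simple gradient descent, chosen for brevity only.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))  # predicted target posterior
        grad = p - y                             # d(loss)/d(logit) per trial
        a -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)
    return a, b

def calibrate(scores, a, b):
    """Apply the fitted scale and offset to raw scores."""
    return a * np.asarray(scores, dtype=float) + b
```

In this sketch the parameters would be fitted on development trials and then applied to evaluation scores; whether those development trials match the evaluation domain is exactly the concern the abstract raises.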

Citation

Pages: 159-168

Page count: 10

References: 18 in total

[1] Alam J., 2020, Analysis of ABC submission to NIST SRE 2019 CMN and VAST challenge, P289, DOI 10.21437/odyssey.2020-41
[2] Brümmer N, 2014, arXiv, arXiv:1402.2447
[3] Brümmer N, 2013, INTERSPEECH, P1975
[4] Chung JS, 2018, INTERSPEECH, P1086
[5] Colibro D, Vair C, Dalmasso E, Farrell K, Karvitsky G, Cumani S, Laface P. Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 1338-1342
[6] Ferrer L, 2020, INT CONF ACOUST SPEE, P6604, DOI 10.1109/ICASSP40776.2020.9053485
[7] Garcia-Romero, 2020, MagNetO: X-vector magnitude estimation network plus offset for improved Speaker Recognition, P1, DOI 10.21437/odyssey.2020-1
[8] Gusev A., Tech. rep
[9] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778
[10] Lavrentyeva G, Volkova M, Avdeeva A, Novoselov S, Gorlanov A, Andzukaev T, Ivanov A, Kozlov A. Blind speech signal quality estimation for speaker verification systems. INTERSPEECH 2020, 2020: 1535-1539
