BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020

Cited by: 6

Authors:

Lozano-Diez, Alicia [1]
Silnova, Anna [1]
Pulugundla, Bhargav [1]
Rohdin, Johan [1]
Vesely, Karel [1]
Burget, Lukas [1]
Plchot, Oldrich [1]
Glembek, Ondrej [1]
Novotny, Ondrej [1]
Matejka, Pavel [1]

Affiliations:

[1] Brno Univ Technol, Fac Informat Technol, IT4I Ctr Excellence, Brno, Czech Republic

Source:

INTERSPEECH 2020 | 2020

Funding:

U.S. National Science Foundation

Keywords:

text-dependent speaker verification; phrase-dependent PLDA; phrase recognizer; i-vectors

DOI:

10.21437/Interspeech.2020-2882

Chinese Library Classification (CLC):

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

In this paper, we present the winning BUT submission for the text-dependent task of the SdSV Challenge 2020. Given the large amount of training data available in this challenge, we explore techniques that have been successful in text-independent systems in the text-dependent scenario. In particular, we train x-vector extractors on both in-domain and out-of-domain datasets and combine them with i-vectors trained on concatenated MFCCs and bottleneck features, which have proven effective for the text-dependent scenario. Moreover, we propose a phrase-dependent PLDA backend for scoring and combine it with a simple phrase recognizer, which brings up to 63% relative improvement on our development set over standard PLDA. Finally, we combine our i-vector and x-vector based systems using simple linear logistic regression score-level fusion, which provides a 28% relative improvement on the evaluation set over our best single system.
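The score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each system outputs one score per trial and that target/non-target labels are available for a development set; the function names `train_fusion` and `fuse`, the plain gradient-descent training, and the synthetic scores are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit weights w and bias b by logistic regression (gradient descent).

    scores: (n_trials, n_systems) array of per-system scores.
    labels: (n_trials,) array, 1 for target trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_trials, n_systems = scores.shape
    w = np.zeros(n_systems)
    b = 0.0
    for _ in range(epochs):
        logits = scores @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the cross-entropy w.r.t. the logits
        w -= lr * (scores.T @ grad) / n_trials
        b -= lr * grad.mean()
    return w, b

def fuse(scores, w, b):
    """Fused score: weighted sum of the per-system scores plus a bias."""
    return np.asarray(scores, dtype=float) @ w + b

# Synthetic development scores for two hypothetical systems (assumption).
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(200), np.zeros(200)])
sys_a = labels * 2.0 + rng.normal(0.0, 1.0, 400)
sys_b = labels * 1.5 + rng.normal(0.0, 1.0, 400)
scores = np.column_stack([sys_a, sys_b])

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)
```

The fused score is a single linear combination of the system scores, so the same `w` and `b` learned on development trials can be applied unchanged to evaluation trials.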

Pages: 761-765

Page count: 5
