Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

Cited by: 3
Authors
Peng, Junyi [1 ]
Gu, Rongzhi [1 ]
Zou, Yuexian [1 ,2 ]
Affiliations
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
speaker verification; speaker embedding; speaker centroid; x-vectors; margin softmax
DOI
10.21437/Interspeech.2020-2470
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Code
100104; 100213
Abstract
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large-scale datasets. The widely used pairwise loss functions only consider the discrimination within a mini-batch (short-term), so neither the speaker identity information nor the whole training dataset is fully exploited. Thus, these pairwise comparisons may suffer from the interference and variance introduced by speaker-unrelated factors. To tackle this problem, we introduce the speaker identity information to form long-term speaker embedding centroids, which are determined by all the speakers in the training set. During the training process, each centroid dynamically accumulates the statistics of all samples belonging to a specific speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, these centroids have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named the long short term speaker loss (LSTSL). The proposed LSTSL constrains the distances between samples and the centroid of the same speaker to be compact, while those to the centroids of different speakers are dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2. Results on the VoxCeleb1 dataset demonstrate the effectiveness of our proposed LSTSL.
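To make the idea in the abstract concrete, below is a minimal PyTorch-style sketch of an LSTSL-style objective with long-term speaker centroids. It is only an illustration of the described mechanism, not the authors' exact formulation: the cosine-similarity distance, the exponential-moving-average centroid update with momentum 0.9, the scaling factor, and the cross-entropy dispersion term are all assumptions.

# Sketch of a long short term speaker loss (LSTSL)-style objective.
# Assumptions (not taken from the paper): cosine similarity as the distance,
# an exponential-moving-average centroid update with momentum 0.9, and a
# softmax over sample-to-centroid similarities as the dispersion term.
import torch
import torch.nn.functional as F

class LongTermCentroidLoss(torch.nn.Module):
    def __init__(self, num_speakers, embed_dim, momentum=0.9, scale=30.0):
        super().__init__()
        # One long-term centroid per training speaker, updated across batches.
        self.register_buffer("centroids", torch.randn(num_speakers, embed_dim))
        self.momentum = momentum
        self.scale = scale

    @torch.no_grad()
    def _update_centroids(self, embeddings, labels):
        # Dynamically accumulate per-speaker statistics: move each speaker's
        # centroid toward the mean of its samples in the current mini-batch.
        for spk in labels.unique():
            batch_mean = embeddings[labels == spk].mean(dim=0)
            self.centroids[spk] = (
                self.momentum * self.centroids[spk]
                + (1.0 - self.momentum) * batch_mean
            )

    def forward(self, embeddings, labels):
        embeddings = F.normalize(embeddings, dim=1)
        self._update_centroids(embeddings, labels)
        centroids = F.normalize(self.centroids, dim=1)
        # Similarity of every sample to every long-term centroid.
        logits = self.scale * embeddings @ centroids.t()
        # Cross-entropy pulls each sample toward its own speaker's centroid
        # (compactness) and away from the other centroids (dispersion).
        return F.cross_entropy(logits, labels)

In practice such a term would typically be added to the usual classification loss on the x-vector embeddings, e.g. loss = ce_loss + lstsl(embeddings, speaker_labels).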
Pages: 3246-3250
Page count: 5
Related Papers (50 records)
  • [31] Xiang, B. Text-independent speaker verification with dynamic trajectory model. IEEE SIGNAL PROCESSING LETTERS, 2003, 10(05): 141-143
  • [32] Auckenthaler, R.; Carey, M.; Lloyd-Thomas, H. Score normalization for text-independent speaker verification systems. DIGITAL SIGNAL PROCESSING, 2000, 10(1-3): 42-54
  • [33] Rakhmanenko, I. A.; Shelupanov, A. A.; Kostyuchenko, E. Y. Automatic text-independent speaker verification using convolutional deep belief network. COMPUTER OPTICS, 2020, 44(04): 596-+
  • [34] You, Lanhua; Guo, Wu; Dai, Li-Rong; Du, Jun. Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification. INTERSPEECH 2019, 2019: 1168-1172
  • [35] Okamoto, Haruka; Tsuge, Satoru; Abdelwahab, Amira; Nishida, Masafumi; Horiuchi, Yasuo; Kuroiwa, Shingo. Text-Independent Speaker Verification Using Rank Threshold in Large Number of Speaker Models. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-5, 2009: 2319-+
  • [36] Niu, Mengqi; He, Liang; Fang, Zhihua; Zhao, Baowei; Wang, Kai. Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification. APPLIED SCIENCES-BASEL, 2022, 12(15)
  • [38] Das, Rohan Kumar; Jelil, Sarfaraz; Prasanna, S. R. Mahadeva. Significance of Constraining Text in Limited Data Text-independent Speaker Verification. 2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016
  • [39] Tawara, Naohiro; Ogawa, Atsunori; Iwata, Tomoharu; Delcroix, Marc; Ogawa, Tetsuji. Frame-level Phoneme-invariant Speaker Embedding for Text-independent Speaker Recognition on Extremely Short Utterances. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6799-6803
  • [40] Xu, Longting; Das, Rohan Kumar; Yilmaz, Emre; Yang, Jichen; Li, Haizhou. Generative X-Vectors for Text-independent Speaker Verification. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018: 1014-1020