Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

Times cited: 3

Authors:

Peng, Junyi [1]

Gu, Rongzhi [1]

Zou, Yuexian [1, 2]

Affiliations:

[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

INTERSPEECH 2020 | 2020

Keywords:

speaker verification; speaker embedding; speaker centroid; x-vectors; margin softmax

DOI:

10.21437/Interspeech.2020-2470

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

Recently, speaker verification systems using deep neural networks have shown their effectiveness on large-scale datasets. The widely used pairwise loss functions only consider the discrimination within a mini-batch of data (short-term), while neither the speaker identity information nor the whole training dataset is fully exploited. As a result, these pairwise comparisons may suffer from interference and variance introduced by speaker-unrelated factors. To tackle this problem, we introduce speaker identity information to form long-term speaker embedding centroids, one for each speaker in the training set. During training, each centroid dynamically accumulates the statistics of all samples belonging to its speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, they have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named the long short term speaker loss (LSTSL). The proposed LSTSL constrains the distances between samples and the centroid of the same speaker to be compact, while keeping the distances between samples and the centroids of different speakers dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2, and results on the VoxCeleb1 dataset demonstrate the effectiveness of the proposed LSTSL.
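The abstract states the idea but not the exact centroid update rule or loss formula. The NumPy sketch below is one plausible reading under stated assumptions: each long-term centroid is the cumulative mean of every embedding seen so far for that speaker, and the centroid term is a scaled cosine-similarity softmax cross-entropy that pulls a sample toward its own speaker's centroid and away from all other centroids. `LongTermCentroids`, `centroid_loss`, and the `scale` value are hypothetical names chosen for illustration; this is not the paper's LSTSL formula, and the short-term (mini-batch pairwise) part of the loss is omitted.

```python
import numpy as np


class LongTermCentroids:
    """Hypothetical bank of per-speaker centroids built from all training samples seen so far."""

    def __init__(self, num_speakers: int, dim: int):
        self.sums = np.zeros((num_speakers, dim))  # running sum of embeddings per speaker
        self.counts = np.zeros(num_speakers)       # number of samples seen per speaker

    def update(self, embeddings: np.ndarray, labels: np.ndarray) -> None:
        # Assumption: statistics are accumulated as a cumulative sum (not a moving average).
        # np.add.at handles repeated speaker labels within one mini-batch correctly.
        np.add.at(self.sums, labels, embeddings)
        np.add.at(self.counts, labels, 1)

    def centroids(self) -> np.ndarray:
        # Mean embedding per speaker; speakers not seen yet keep a zero centroid.
        return self.sums / np.maximum(self.counts, 1)[:, None]


def centroid_loss(embeddings, labels, centroids, scale=10.0, eps=1e-8):
    """Scaled-cosine softmax cross-entropy of each sample against all speaker centroids."""
    e = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + eps)
    c = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    logits = scale * (e @ c.T)                            # (batch, num_speakers) cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # High similarity to the own centroid (compact) and low similarity to the
    # other centroids (dispersed) both lower the loss.
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Toy usage: three speakers, each clustered around its own direction in 4-D space.
rng = np.random.default_rng(0)
base = np.eye(3, 4)
bank = LongTermCentroids(num_speakers=3, dim=4)
for step in range(5):
    labels = rng.integers(0, 3, size=8)
    emb = base[labels] + 0.05 * rng.standard_normal((8, 4))
    loss = centroid_loss(emb, labels, bank.centroids())  # score against current centroids
    bank.update(emb, labels)                             # then fold the batch in
    print(f"step {step}: loss = {loss:.4f}")
```

Scoring each batch against the current centroids before calling `update` keeps a sample from contributing to its own target. Whether the paper does the same, or uses a moving average rather than a cumulative mean, is not stated in the abstract.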


Pages: 3246-3250

Number of pages: 5

Total: 50 records

[41] Maximum Likelihood Discriminant Feature for Text-Independent Speaker Verification
Liu, Qingsong
Dai, Beiqian
PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009: 3733-3736
[42] Graph Attentive Feature Aggregation for Text-Independent Speaker Verification
Shim, Hye-Jin
Heo, Jungwoo
Park, Jae-Han
Lee, Ga-Hui
Yu, Ha-Jin
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7972-7976
[43] Text-independent speaker verification using predictive neural networks
Finan, RA
Sapeluk, AT
Damper, RI
FIFTH INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS, 1997, (440): 274-279
[44] Small Footprint Text-Independent Speaker Verification for Embedded Systems
Balian, Julien
Tavarone, Raffaele
Poumeyrol, Mathieu
Coucke, Alice
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6179-6183
[45] Score Fusion Methods for Text-Independent Speaker Verification Applications
Rastoceanu, Florin
Lazar, Marilena
2011 6TH CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2011
[46] End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances
Zhang, Chunlei
Koishida, Kazuhito
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 1487-1491
[47] An efficient text-independent speaker verification for short utterance data from Mobile devices
Arora, Sanghamitra V.
Vig, Rekha
MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79: 3049-3074
[48] Strategies for End-to-End Text-Independent Speaker Verification
Lin, Weiwei
Mak, Man-Wai
Chien, Jen-Tzung
INTERSPEECH 2020, 2020: 4308-4312
[49] Acoustic Feature Shuffling Network for Text-Independent Speaker Verification
Li, Jin
Fang, Xin
Chu, Fan
Gao, Tian
Song, Yan
Dai, Lirong
INTERSPEECH 2022, 2022: 4790-4794
[50] Robust text-independent speaker verification using genetic programming
Day, Peter
Nandi, Asoke K.
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(1): 285-295
