Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

Cited by: 3
Authors
Peng, Junyi [1 ]
Gu, Rongzhi [1 ]
Zou, Yuexian [1 ,2 ]
Affiliations
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Source
INTERSPEECH 2020 | 2020
Keywords
speaker verification; speaker embedding; speaker centroid; x-vectors; MARGIN SOFTMAX;
DOI
10.21437/Interspeech.2020-2470
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104 ; 100213 ;
Abstract
Recently, speaker verification systems using deep neural networks have shown their effectiveness on large-scale datasets. The widely used pairwise loss functions only consider discrimination within a mini-batch (short-term), while neither the speaker identity information nor the whole training dataset is fully exploited. Thus, these pairwise comparisons may suffer from the interference and variance introduced by speaker-unrelated factors. To tackle this problem, we introduce the speaker identity information to form long-term speaker embedding centroids, which are determined by all the speakers in the training set. During training, each centroid dynamically accumulates the statistics of all samples belonging to a specific speaker. Since the long-term speaker embedding centroids are associated with a wide range of training samples, these centroids have the potential to be more robust and discriminative. Finally, these centroids are employed to construct a loss function, named long short term speaker loss (LSTSL). The proposed LSTSL constrains the distances between samples and centroids so that those from the same speaker are compact while those from different speakers are dispersed. Experiments are conducted on VoxCeleb1 and VoxCeleb2. Results on the VoxCeleb1 dataset demonstrate the effectiveness of our proposed LSTSL.
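The centroid idea in the abstract can be illustrated with a minimal sketch. This is not the paper's LSTSL formulation, which the abstract does not spell out. It assumes two things for illustration only: that each per-speaker centroid is accumulated with an exponential moving average, and that the loss is a softmax cross-entropy over scaled cosine similarities between an embedding and every centroid. The function names, `momentum`, and `scale` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def update_centroid(centroid, embedding, momentum=0.9):
    """Assumed accumulation rule: exponential moving average of embeddings."""
    return [momentum * c + (1.0 - momentum) * e
            for c, e in zip(centroid, embedding)]


def centroid_loss(embedding, speaker, centroids, scale=10.0):
    """Assumed loss: cross-entropy over scaled cosine similarities to all
    centroids. Low when the embedding is close to its own speaker's
    centroid and far from the centroids of other speakers."""
    e = l2_normalize(embedding)
    logits = {
        spk: scale * sum(a * b for a, b in zip(e, l2_normalize(c)))
        for spk, c in centroids.items()
    }
    # Numerically stable log-sum-exp over all speakers.
    m = max(logits.values())
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits.values()))
    return log_sum - logits[speaker]


# Toy example with two hypothetical speakers in a 2-D embedding space.
centroids = {"spk_a": [1.0, 0.0], "spk_b": [0.0, 1.0]}
near = centroid_loss([0.9, 0.1], "spk_a", centroids)
far = centroid_loss([0.1, 0.9], "spk_a", centroids)
moved = update_centroid(centroids["spk_a"], [0.0, 1.0], momentum=0.9)
```

In this toy setup, `near` is smaller than `far`, because the first embedding points toward the `spk_a` centroid and the second toward `spk_b`. Since the centroids persist across mini-batches, they carry information from samples outside the current batch, which is the long-term aspect the abstract describes.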
Pages: 3246 - 3250
Page count: 5
Related papers
50 in total
  • [41] Maximum Likelihood Discriminant Feature for Text-Independent Speaker Verification
    Liu, Qingsong
    Dai, Beiqian
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 3733 - 3736
  • [42] GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Shim, Hye-Jin
    Heo, Jungwoo
    Park, Jae-Han
    Lee, Ga-Hui
    Yu, Ha-Jin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7972 - 7976
  • [43] Text-independent speaker verification using predictive neural networks
    Finan, RA
    Sapeluk, AT
    Damper, RI
    FIFTH INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS, 1997, (440): : 274 - 279
  • [44] SMALL FOOTPRINT TEXT-INDEPENDENT SPEAKER VERIFICATION FOR EMBEDDED SYSTEMS
    Balian, Julien
    Tavarone, Raffaele
    Poumeyrol, Mathieu
    Coucke, Alice
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6179 - 6183
  • [45] Score Fusion Methods for Text-Independent Speaker Verification Applications
    Rastoceanu, Florin
    Lazar, Marilena
    2011 6TH CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2011,
  • [46] End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances
    Zhang, Chunlei
    Koishida, Kazuhito
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1487 - 1491
  • [47] An efficient text-independent speaker verification for short utterance data from Mobile devices
    Sanghamitra V. Arora
    Rekha Vig
    Multimedia Tools and Applications, 2020, 79 : 3049 - 3074
  • [48] Strategies for End-to-End Text-Independent Speaker Verification
    Lin, Weiwei
    Mak, Man-Wai
    Chien, Jen-Tzung
    INTERSPEECH 2020, 2020, : 4308 - 4312
  • [49] Acoustic Feature Shuffling Network for Text-Independent Speaker Verification
    Li, Jin
    Fang, Xin
    Chu, Fan
    Gao, Tian
    Song, Yan
    Dai, Lirong
    INTERSPEECH 2022, 2022, : 4790 - 4794
  • [50] Robust text-independent speaker verification using genetic programming
    Day, Peter
    Nandi, Asoke K.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01): : 285 - 295