AN ITERATIVE FRAMEWORK FOR SELF-SUPERVISED DEEP SPEAKER REPRESENTATION LEARNING

Cited by: 20
Authors
Cai, Danwei [1]
Wang, Weiqing [1]
Li, Ming [2]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC USA
[2] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
speaker recognition; speaker embedding; self-supervised learning; contrastive learning; clustering;
DOI
10.1109/ICASSP39728.2021.9414713
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts by training a self-supervised speaker embedding network that maximizes agreement between different segments within an utterance via a contrastive loss. Taking advantage of the DNN's ability to learn from data with label noise, we propose to cluster the speaker embeddings obtained from the previous speaker network and use the resulting cluster assignments as pseudo labels to train a new DNN. Moreover, we iteratively train the speaker network with pseudo labels generated in the previous step to bootstrap the discriminative power of the DNN. Speaker verification experiments are conducted on the VoxCeleb dataset. The results show that the proposed iterative self-supervised learning framework outperforms previous work using self-supervision. The speaker network after 5 iterations obtains a 61% performance gain over the speaker embedding model trained with contrastive loss.
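The two building blocks named in the abstract, a contrastive loss over segments of the same utterance and clustering of embeddings into pseudo labels, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the loss form or the clustering algorithm, so the softmax cross-entropy over cosine similarities, the `temperature` parameter, and plain k-means are assumptions made here for illustration, and the function names `contrastive_loss` and `kmeans_pseudo_labels` are hypothetical. Network training itself is omitted.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(seg_a, seg_b, temperature=0.1):
    """Assumed loss form: softmax cross-entropy over cosine similarities.

    seg_a[i] and seg_b[i] are embeddings of two segments cut from the
    same utterance (the positive pair); every other row of seg_b acts
    as a negative for seg_a[i].
    """
    total = 0.0
    for i, anchor in enumerate(seg_a):
        logits = [cosine(anchor, other) / temperature for other in seg_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits)
        )
        total += log_denominator - logits[i]
    return total / len(seg_a)


def kmeans_pseudo_labels(embeddings, k, n_iter=20, seed=0):
    """Assumed clustering step: plain k-means; cluster ids act as pseudo labels."""
    rng = random.Random(seed)
    centroids = [list(e) for e in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(n_iter):
        # Assign each embedding to its nearest centroid (squared Euclidean).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(e, centroids[c])),
            )
            for e in embeddings
        ]
        # Move each centroid to the mean of its members; keep it if empty.
        for c in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In the iterative scheme the abstract describes, the pseudo labels returned by the clustering step would supervise the next round of network training, after which the new network's embeddings would be clustered again.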
Pages: 6728 - 6732
Number of pages: 5
Related papers
50 in total
  • [1] Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning
    Kang, Jingu
    Huh, Jaesung
    Heo, Hee Soo
    Chung, Joon Son
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1253 - 1262
  • [2] A COMPREHENSIVE STUDY ON SELF-SUPERVISED DISTILLATION FOR SPEAKER REPRESENTATION LEARNING
    Chen, Zhengyang
    Qian, Yao
    Han, Bing
    Qian, Yanmin
    Zeng, Michael
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 599 - 604
  • [3] Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-Supervised Speaker Verification
    Mun, Sung Hwan
    Han, Min Hyun
    Lee, Dongjune
    Kim, Jihwan
    Kim, Nam Soo
    IEEE ACCESS, 2021, 9 : 167615 - 167627
  • [4] IPCL: ITERATIVE PSEUDO-SUPERVISED CONTRASTIVE LEARNING TO IMPROVE SELF-SUPERVISED FEATURE REPRESENTATION
    Kumar, Sonal
    Phukan, Anirudh
    Sur, Arijit
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 6270 - 6274
  • [5] Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization
    Singh, Prachi
    Ganapathy, Sriram
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1639 - 1649
  • [6] Self-supervised speaker embeddings
    Stafylakis, Themos
    Rohdin, Johan
    Plchot, Oldrich
    Mizera, Petr
    Burget, Lukas
    INTERSPEECH 2019, 2019, : 2863 - 2867
  • [7] Self-Supervised RF Signal Representation Learning for NextG Signal Classification With Deep Learning
    Davaslioglu, Kemal
    Boztas, Serdar
    Ertem, Mehmet Can
    Sagduyu, Yalin E.
    Ayanoglu, Ender
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2023, 12 (01) : 65 - 69
  • [8] Self-Supervised Dense Visual Representation Learning
    Ozcelik, Timoteos Onur
    Gokberk, Berk
    Akarun, Lale
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,
  • [9] TimeCLR: A self-supervised contrastive learning framework for univariate time series representation
    Yang, Xinyu
    Zhang, Zhenguo
    Cui, Rongyi
    KNOWLEDGE-BASED SYSTEMS, 2022, 245
  • [10] ROBUST SELF-SUPERVISED SPEAKER REPRESENTATION LEARNING VIA INSTANCE MIX REGULARIZATION
    Kang, Woo Hyun
    Alam, Jahangir
    Fathan, Abderrahim
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6617 - 6621