Disentangled Speaker Representation Learning via Mutual Information Minimization

Authors
Mun, Sung Hwan [1]
Han, Min Hyun
Kim, Minchan
Lee, Dongjune
Kim, Nam Soo
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
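Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes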
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The domain mismatch problem caused by speaker-unrelated features has been a major topic in speaker recognition. In this paper, we propose an explicit disentanglement framework that separates speaker-related features from speaker-unrelated features via mutual information (MI) minimization. To minimize the MI between speaker-related and speaker-unrelated features, we adopt the contrastive log-ratio upper bound (CLUB), which provides an upper bound on MI. Our framework has a three-stage structure. First, in the front-end encoder, input speech is encoded into a shared initial embedding. Next, in the decoupling block, the shared initial embedding is split into separate speaker-related and speaker-unrelated embeddings. Finally, in the last stage, disentanglement is performed by MI minimization. Experiments on the Far-Field Speaker Verification Challenge 2022 (FFSVC2022) demonstrate that the proposed framework is effective for disentanglement. In addition, to utilize domain-unknown datasets containing numerous speakers, we pre-trained the front-end encoder on the VoxCeleb datasets and then fine-tuned the speaker embedding model within the disentanglement framework on the FFSVC2022 dataset. The experimental results show that fine-tuning an existing pre-trained model with the disentanglement framework is valid and can further improve performance.
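The CLUB term mentioned in the abstract can be illustrated with a small sample-based estimate. The sketch below is not the authors' implementation: it assumes scalar features and a fixed Gaussian variational distribution q(y|x) = N(mean_fn(x), std^2), whereas a real system would presumably learn q with a neural network over embedding vectors. The names gaussian_log_prob, club_upper_bound, mean_fn, and std are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_log_prob(y, mean, std):
    """Log density of a scalar Gaussian N(mean, std^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (y - mean) ** 2 / (2 * std ** 2)


def club_upper_bound(xs, ys, mean_fn, std):
    """Sample-based CLUB estimate of I(x; y).

    Averages log q(y_i | x_i) over paired samples and subtracts the
    average of log q(y_j | x_i) over all (i, j) combinations.
    Assumption: q(y | x) is a fixed Gaussian N(mean_fn(x), std^2).
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        mean_i = mean_fn(xs[i])
        positive = gaussian_log_prob(ys[i], mean_i, std)
        negative = sum(gaussian_log_prob(ys[j], mean_i, std) for j in range(n)) / n
        total += positive - negative
    return total / n


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]

# Dependent pair: y is x plus small noise, and q matches the true conditional.
ys_dependent = [x + 0.5 * rng.gauss(0.0, 1.0) for x in xs]
dependent = club_upper_bound(xs, ys_dependent, mean_fn=lambda x: x, std=0.5)

# Independent pair: y ignores x, so q(y | x) does not depend on x either.
ys_independent = [rng.gauss(0.0, 1.0) for _ in xs]
independent = club_upper_bound(xs, ys_independent, mean_fn=lambda x: 0.0, std=1.0)

print(f"CLUB estimate, dependent pair:   {dependent:.3f}")
print(f"CLUB estimate, independent pair: {independent:.3f}")
```

In a disentanglement setup of the kind the abstract describes, a quantity like this would be computed between the speaker-related and speaker-unrelated embeddings and minimized as a loss term. The estimate for the dependent pair is clearly positive, while the estimate for the independent pair is zero up to floating-point error, because q then ignores x.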
Pages: 89-96
Number of pages: 8