Zero-shot voice conversion based on feature disentanglement

Authors
Guo, Na [1]
Wei, Jianguo [1]
Li, Yongwei [2]
Lu, Wenhuan [1]
Tao, Jianhua [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Psychol, CAS Key Lab Behav Sci, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution; Sparse representation; Adaptation; Speaker
DOI
10.1016/j.specom.2024.103143
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Voice conversion (VC) aims to convert the voice of a source speaker into that of a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention within VC because it can convert speech for speakers who did not appear during training. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. The experiments demonstrate that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.
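The abstract names mixed speaker layer normalization but does not define it. The sketch below illustrates one plausible reading only: a layer normalization whose gain and bias are predicted from a blend of two speaker embeddings, so that the normalized content features are not tied to a single speaker. The function names, the interpolation weight `lam`, and the projection matrices `w_gain` and `w_bias` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix, given as nested lists, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def mixed_speaker_layer_norm(x, spk_a, spk_b, w_gain, w_bias, lam):
    """Layer norm conditioned on a blend of two speaker embeddings.

    Hypothetical formulation: lam in [0, 1] interpolates between the
    utterance's own speaker embedding (spk_a) and another speaker's
    embedding (spk_b); the blend is projected to a per-feature gain and bias.
    """
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(spk_a, spk_b)]
    gain = linear(w_gain, mixed)
    bias = linear(w_bias, mixed)
    return [g * h + b for g, h, b in zip(gain, layer_norm(x), bias)]


if __name__ == "__main__":
    # Toy sizes: 4 content features, 2-dimensional speaker embeddings.
    content = [0.5, -1.0, 2.0, 0.0]
    speaker_a = [1.0, 0.0]
    speaker_b = [0.0, 1.0]
    w_gain = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0], [0.2, 0.8]]
    w_bias = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0], [0.1, 0.1]]
    print(mixed_speaker_layer_norm(content, speaker_a, speaker_b, w_gain, w_bias, 0.7))
```

In a trained model the projections would be learned and `lam` would typically be sampled per batch; both are fixed here to keep the example deterministic.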
Pages: 10