Zero-shot voice conversion based on feature disentanglement

Cited by: 0
Authors
Guo, Na [1]
Wei, Jianguo [1]
Li, Yongwei [2]
Lu, Wenhuan [1]
Tao, Jianhua [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Psychol, CAS Key Lab Behav Sci, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution; SPARSE REPRESENTATION; ADAPTATION; SPEAKER;
DOI
10.1016/j.specom.2024.103143
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice conversion (VC) aims to convert the voice of a source speaker to that of a target speaker without modifying the linguistic content. Zero-shot VC has attracted significant attention because it can convert speech for speakers who did not appear during training. Despite the significant progress made by previous zero-shot VC methods, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. Experiments demonstrate that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, the decoding speed of our model is much higher than that of existing state-of-the-art models.
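The abstract names mixed speaker layer normalization but does not give its formulation. As a rough illustration only, the sketch below shows one plausible reading: a layer normalization whose scale and bias are predicted from speaker embeddings, with the affine parameters of two speakers interpolated by a weight `lam`. The function names, the linear projections `w_gamma` and `w_beta`, and the explicit `lam` argument are all assumptions made for this example and are not taken from the paper.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def affine_from_speaker(spk, w_gamma, w_beta):
    """Project a speaker embedding to per-channel scale and bias.

    w_gamma and w_beta are hypothetical linear maps (one row per channel).
    """
    gamma = [sum(w * s for w, s in zip(row, spk)) for row in w_gamma]
    beta = [sum(w * s for w, s in zip(row, spk)) for row in w_beta]
    return gamma, beta


def mixed_speaker_layer_norm(x, spk_a, spk_b, w_gamma, w_beta, lam):
    """Layer norm with scale/bias interpolated between two speakers.

    Assumed form: gamma = lam * gamma_a + (1 - lam) * gamma_b, same for beta.
    """
    g_a, b_a = affine_from_speaker(spk_a, w_gamma, w_beta)
    g_b, b_b = affine_from_speaker(spk_b, w_gamma, w_beta)
    normed = layer_norm(x)
    return [
        (lam * ga + (1 - lam) * gb) * n + (lam * ba + (1 - lam) * bb)
        for n, ga, gb, ba, bb in zip(normed, g_a, g_b, b_a, b_b)
    ]


# Toy inputs: a 4-channel content feature and two 2-dimensional speaker embeddings.
x = [1.0, 2.0, 3.0, 4.0]
spk_a, spk_b = [1.0, 0.0], [0.0, 1.0]
w_gamma = [[1.0, 2.0] for _ in range(4)]  # speaker A -> scale 1, speaker B -> scale 2
w_beta = [[0.0, 0.5] for _ in range(4)]   # speaker A -> bias 0, speaker B -> bias 0.5

mixed = mixed_speaker_layer_norm(x, spk_a, spk_b, w_gamma, w_beta, lam=0.5)
print(mixed)
```

In this toy setup, mixing the two speakers' affine parameters means the content features are never paired consistently with a single speaker's statistics, which is one way such a layer could discourage the content encoder from retaining speaker information. Whether the paper samples the mixing weight, shuffles speakers within a batch, or uses a different scheme is not stated in the abstract.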
Pages: 10