Zero-shot voice conversion based on feature disentanglement

Cited by: 0
Authors
Guo, Na [1]
Wei, Jianguo [1]
Li, Yongwei [2]
Lu, Wenhuan [1]
Tao, Jianhua [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Psychol, CAS Key Lab Behav Sci, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution; SPARSE REPRESENTATION; ADAPTATION; SPEAKER;
DOI
10.1016/j.specom.2024.103143
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Voice conversion (VC) aims to convert the voice from a source speaker to a target speaker without modifying the linguistic content. Zero-shot voice conversion has attracted significant attention in the task of VC because it can achieve conversion for speakers who did not appear during the training stage. Despite the significant progress made by previous methods in zero-shot VC, there is still room for improvement in separating speaker information and content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder for extracting speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information in content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring a small number of parameters. The experiments demonstrate that performance of the proposed model is superior to several state-of-the-art models, achieving both high similarity with the target speaker and intelligibility. In addition, the decoding speed of our model is much higher than the existing state-of-the-art models.
Pages: 10
Related papers
50 in total
  • [31] Feature Selection for Zero-Shot Gesture Recognition
    Madapana, Naveen
    Wachs, Juan
    2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 683 - 687
  • [32] Feature Generating Networks for Zero-Shot Learning
    Xian, Yongqin
    Lorenz, Tobias
    Schiele, Bernt
    Akata, Zeynep
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5542 - 5551
  • [33] Utilizing Adaptive Global Response Normalization and Cluster-Based Pseudo Labels for Zero-Shot Voice Conversion
    Um, Ji Sub
    Kim, Hoirin
    INTERSPEECH 2024, 2024, : 2740 - 2744
  • [34] GAZEV: GAN-Based Zero-Shot Voice Conversion over Non-parallel Speech Corpus
    Zhang, Zining
    He, Bingsheng
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 791 - 795
  • [35] Zero-shot sketch-based image retrieval with structure-aware asymmetric disentanglement
    Li, Jiangtong
    Ling, Zhixin
    Niu, Li
    Zhang, Liqing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 218
  • [36] Zero-Shot Traffic Sign Recognition Based on Midlevel Feature Matching
    Gan, Yaozong
    Li, Guang
    Togo, Ren
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    Martinez, Francisco J.
    SENSORS, 2023, 23 (23)
  • [37] New Indicators and Optimizations for Zero-Shot NAS Based on Feature Maps
    Jiang, Tangyu
    Wang, Haodi
    Bie, Rongfang
    Jiao, Libin
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, KSEM 2024, 2024, 14886 : 411 - 422
  • [38] Embedded Zero-Shot Image Classification Based on Bidirectional Feature Mapping
    Sun, Huadong
    Zhen, Zhibin
    Liu, Yinghui
    Zhang, Xu
    Han, Xiaowei
    Zhang, Pengyi
    APPLIED SCIENCES-BASEL, 2024, 14 (12)
  • [39] Multi-Level Temporal-Channel Speaker Retrieval for Zero-Shot Voice Conversion
    Wang, Zhichao
    Xue, Liumeng
    Kong, Qiuqiang
    Xie, Lei
    Chen, Yuanzhe
    Tian, Qiao
    Wang, Yuping
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2926 - 2937
  • [40] SLMGAN: EXPLOITING SPEECH LANGUAGE MODEL REPRESENTATIONS FOR UNSUPERVISED ZERO-SHOT VOICE CONVERSION IN GANS
    Li, Yinghao Aaron
    Han, Cong
    Mesgarani, Nima
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023