Zero-shot voice conversion based on feature disentanglement

Times cited: 0

Authors:

Guo, Na [1]

Wei, Jianguo [1]

Li, Yongwei [2]

Lu, Wenhuan [1]

Tao, Jianhua [3]

Affiliations:

[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China

[2] Chinese Acad Sci, Inst Psychol, CAS Key Lab Behav Sci, Beijing, Peoples R China

[3] Tsinghua Univ, Dept Automat, Beijing, Peoples R China

Source:

SPEECH COMMUNICATION | 2024 / Vol. 165

Funding:

National Natural Science Foundation of China;

Keywords:

Zero-shot voice conversion; Mixed speaker layer normalization; Adaptive attention weight normalization; Dynamic convolution; SPARSE REPRESENTATION; ADAPTATION; SPEAKER;

DOI:

10.1016/j.specom.2024.103143

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

Voice conversion (VC) aims to convert the voice of a source speaker into that of a target speaker without modifying the linguistic content. Zero-shot VC has attracted significant attention because it can convert speech for speakers who did not appear during training. Although previous methods have made significant progress in zero-shot VC, there is still room for improvement in separating speaker information from content information. In this paper, we propose a zero-shot VC method based on feature disentanglement. The proposed model uses a speaker encoder to extract speaker embeddings, introduces mixed speaker layer normalization to eliminate residual speaker information from the content encoding, and employs adaptive attention weight normalization for conversion. Furthermore, dynamic convolution is introduced to improve speech content modeling while requiring only a small number of parameters. Experiments show that the proposed model outperforms several state-of-the-art models, achieving both high similarity to the target speaker and high intelligibility. In addition, our model decodes much faster than existing state-of-the-art models.
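The abstract names dynamic convolution as a parameter-efficient way to model speech content but does not say which variant the model uses. Assuming it resembles the lightweight dynamic convolution of Wu et al. (2019), the following plain-Python sketch shows the general idea: for every frame, a small linear predictor (`w_pred`) produces a softmax-normalized kernel per head, and all channels in that head share the kernel. This illustrates the generic technique only, not the authors' implementation; `dynamic_conv1d`, `w_pred`, `param_counts` and the toy sizes are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a short list of logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(x: Matrix, w_pred: Matrix, kernel_size: int, num_heads: int) -> Matrix:
    """Hypothetical lightweight dynamic convolution along the time axis.

    x:      T frames, each a list of C channel values.
    w_pred: (num_heads * kernel_size) rows of length C; predicts the kernels.
    Returns T frames of C channels (zero padding at both ends).
    """
    T, C = len(x), len(x[0])
    assert C % num_heads == 0, "channels must split evenly across heads"
    assert kernel_size % 2 == 1, "an odd kernel keeps the window centred"
    group = C // num_heads
    half = kernel_size // 2
    y = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for h in range(num_heads):
            # Predict this head's kernel from the current frame, then
            # softmax-normalise it over the kernel positions.
            rows = w_pred[h * kernel_size:(h + 1) * kernel_size]
            logits = [sum(r[c] * x[t][c] for c in range(C)) for r in rows]
            kernel = softmax(logits)
            # Share that kernel across all channels in the head's group;
            # frames outside [0, T) contribute nothing (zero padding).
            for k, weight in enumerate(kernel):
                src = t + k - half
                if 0 <= src < T:
                    for c in range(h * group, (h + 1) * group):
                        y[t][c] += weight * x[src][c]
    return y


def param_counts(channels: int, kernel_size: int, num_heads: int) -> Tuple[int, int]:
    """Weights in the kernel predictor vs. a dense Conv1d of the same shape (no biases)."""
    dynamic = num_heads * kernel_size * channels
    dense = channels * channels * kernel_size
    return dynamic, dense


if __name__ == "__main__":
    random.seed(0)
    T, C, K, H = 6, 8, 3, 2  # frames, channels, kernel size, heads (toy sizes)
    x = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(T)]
    w_pred = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(H * K)]
    y = dynamic_conv1d(x, w_pred, K, H)
    print(len(y), len(y[0]))          # 6 8
    print(param_counts(256, 15, 16))  # (61440, 983040)
```

Because only the predictor carries weights, its size is linear in the channel count (H·K·C weights) rather than quadratic (C·C·K for a dense convolution): with the assumed sizes above, `param_counts(256, 15, 16)` returns `(61440, 983040)`, one sixteenth of the dense count. A full layer would typically also include input and output projections, which this sketch omits.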


Pages: 10
