Non-Parallel Voice Conversion Using Cycle-Consistent Adversarial Networks with Self-Supervised Representations

Cited by: 5

Authors:

Chun, Chanjun [1]

Lee, Young Han [2]

Lee, Geon Woo [3]

Jeon, Moongu [3]

Kim, Hong Kook [3]

Affiliations:

[1] Chosun University, Gwangju, South Korea

[2] Korea Electronics Technology Institute, Seongnam, South Korea

[3] Gwangju Institute of Science and Technology (GIST), Gwangju, South Korea

Source:

2023 IEEE 20th Consumer Communications & Networking Conference (CCNC) | 2023

Keywords:

Voice conversion; generative adversarial networks; CycleGAN-VC; self-supervised learning; wav2vec

DOI:

10.1109/CCNC51644.2023.10060510

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Numerous voice conversion techniques that use non-parallel data have been proposed, and many of them are related to style transfer. This is because voice conversion can be framed as a style transfer problem in which the linguistic and speaker information are regarded as domains and styles, respectively. The CycleGAN-VC family of models has achieved considerable success in this setting, so we examine the feasibility of CycleGAN-VC with self-supervised representations. Specifically, we incorporate analysis features extracted from wav2vec into the CycleGAN-VC model. Objective experiments showed that the quality of the converted speech is comparable to that of the original speech, and that the source speech was successfully transformed into the voice of the target speaker while preserving the linguistic information.


Pages: 2
