MoCoVC: Non-parallel Voice Conversion with Momentum Contrastive Representation Learning

Cited by: 0

Authors:

Onishi, Kotaro [1]

Nakashika, Toru [1]

Affiliations:

[1] Univ Elect Commun, Tokyo, Japan

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

Keywords:

DOI:

10.23919/APSIPAASC55919.2022.9979937

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Non-parallel voice conversion methods based on deep neural networks often disentangle speaker individuality from speech content. However, these methods rely on external models, text data, or implicit constraints to achieve the disentanglement. As a result, they may require training additional models or annotating text, or they may leave unclear how the latent representations are acquired. We therefore propose voice conversion with momentum contrastive representation learning (MoCoVC), a method that explicitly adds constraints to intermediate features using contrastive representation learning, a self-supervised learning method. Applying contrastive representation learning with transformations that preserve utterance content allows us to explicitly constrain the intermediate features to preserve that content. We present transformations for contrastive representation learning that could be used for voice conversion and verify the effectiveness of each in an experiment. Moreover, in subjective evaluation experiments, MoCoVC achieves performance higher than or comparable to that of the vector quantization constrained method in terms of both naturalness and speaker individuality.
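
The abstract names momentum contrastive representation learning but does not give the loss or the update rule. As a rough illustration only, the sketch below shows the two generic ingredients of momentum contrast: a momentum update of a key encoder and an InfoNCE loss over one positive and several negatives. It is not the authors' implementation. The function names, the temperature of 0.07, the momentum of 0.999, and the toy feature vectors are all assumptions made for the example. In a voice conversion setting, the query and the positive key would presumably be intermediate features of an utterance and of a content-preserving transformation of the same utterance.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder.

    Generic rule: k <- m * k + (1 - m) * q (m is an assumed value here).
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def normalize(v):
    """Scale a vector to unit L2 norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for a single query.

    The loss is the negative log-softmax of the positive logit among
    the positive and all negative logits.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_keys]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


# Toy usage with hypothetical 2-D features.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
updated = momentum_update([1.0], [0.0], m=0.9)
```

In this toy case the loss is close to zero when the positive key matches the query and large when a negative matches instead, which is the pressure that pulls together features sharing the same utterance content.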


Pages: 1438-1443

Page count: 6

