U-GAT-VC: UNSUPERVISED GENERATIVE ATTENTIONAL NETWORKS FOR NON-PARALLEL VOICE CONVERSION

Cited by: 2
Authors
Shi, Sheng [1 ,2 ]
Shao, Jiahao [3 ]
Hao, Yifei [4 ]
Du, Yangzhou [2 ]
Fan, Jianping [1 ,2 ]
Affiliations
[1] Northwest Univ, Xian 710127, Peoples R China
[2] Lenovo Res, AI Lab, Beijing 100094, Peoples R China
[3] Tsinghua Univ, Beijing 100084, Peoples R China
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Non-parallel Voice Conversion; Generative Adversarial Network; Inter attention mechanism; Intra attention mechanism; Perceptual loss;
DOI
10.1109/ICASSP43922.2022.9746992
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
Non-parallel voice conversion (VC) is a technique for transferring voice from one style to another without using a parallel corpus during model training. Various deep-neural-network methods have been proposed for non-parallel VC; among them, CycleGAN-VC and its variants have been widely accepted as benchmark methods. However, a gap remains between the converted voice and the real target voice, and the increased number of parameters leads to slow convergence during training. Inspired by recent advances in unsupervised image-to-image translation, we propose U-GAT-VC, a new end-to-end unsupervised framework that adopts a novel inter- and intra-attention mechanism to guide the conversion toward the more important regions of the spectrogram. We also introduce a disentangled perceptual loss to capture high-level spectral features. Subjective and objective evaluations show that our proposed model outperforms CycleGAN-VC2/3 in terms of conversion quality and voice naturalness.
Pages: 7017-7021
Page count: 5
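The abstract describes two of the paper's ingredients: an attention mechanism that re-weights spectrogram feature maps toward important regions, and a perceptual loss computed on high-level features. As a rough illustration only (the paper's actual architecture and loss are not reproduced here; the shapes, the CAM-style weighting borrowed from related image-translation work, and the L1 feature distance are all assumptions), one can sketch the two ideas in NumPy:

```python
import numpy as np

def cam_attention(feat, w):
    """Re-weight an encoder feature map with a CAM-style attention map.

    feat: (C, F, T) array -- channels x mel bins x frames.
    w:    (C,) array -- per-channel importance weights, e.g. from an
          auxiliary classifier (an assumption borrowed from image
          translation, not the paper's exact mechanism).
    """
    # Collapse channels into one (F, T) attention map, squash to (0, 1),
    # then re-weight every channel of the feature map by it.
    attn = np.einsum('c,cft->ft', w, feat)
    attn = 1.0 / (1.0 + np.exp(-attn))
    return feat * attn[None, :, :]

def perceptual_l1(feat_a, feat_b):
    """Mean L1 distance between two high-level feature maps -- a common
    stand-in for a 'perceptual' loss on spectrogram features."""
    return float(np.mean(np.abs(feat_a - feat_b)))

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 16))   # toy (channels, mel bins, frames)
w = rng.normal(size=4)

attended = cam_attention(feat, w)
print(attended.shape)                         # (4, 8, 16)
print(perceptual_l1(attended, feat) >= 0.0)   # True
```

The point of the sketch is the data flow: attention leaves the feature-map shape unchanged while scaling each time-frequency cell, and the perceptual term compares feature maps rather than raw waveforms.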