U-GAT-VC: UNSUPERVISED GENERATIVE ATTENTIONAL NETWORKS FOR NON-PARALLEL VOICE CONVERSION

Cited by: 2
Authors
Shi, Sheng [1 ,2 ]
Shao, Jiahao [3 ]
Hao, Yifei [4 ]
Du, Yangzhou [2 ]
Fan, Jianping [1 ,2 ]
Affiliations
[1] Northwest Univ, Xian 710127, Peoples R China
[2] Lenovo Res, AI Lab, Beijing 100094, Peoples R China
[3] Tsinghua Univ, Beijing 100084, Peoples R China
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Non-parallel Voice Conversion; Generative Adversarial Network; Inter attention mechanism; Intra attention mechanism; Perceptual loss;
DOI
10.1109/ICASSP43922.2022.9746992
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Non-parallel voice conversion (VC) is a technique for transferring voice from one style to another without using a parallel corpus in model training. Various methods have been proposed to approach non-parallel VC using deep neural networks. Among them, CycleGAN-VC and its variants have been widely accepted as benchmark methods. However, there is still a gap to bridge between the real target voice and the converted voice, and an increased number of parameters leads to slow convergence in the training process. Inspired by recent advancements in unsupervised image translation, we propose a new end-to-end unsupervised framework, U-GAT-VC, that adopts a novel inter- and intra-attention mechanism to guide the voice conversion to focus on more important regions in spectrograms. We also introduce disentangle perceptual loss in our model to capture high-level spectral features. Subjective and objective evaluations show that our proposed model outperforms CycleGAN-VC2/3 in terms of conversion quality and voice naturalness.
Pages: 7017-7021
Number of pages: 5
Related papers
50 in total
  • [1] Non-parallel Voice Conversion with Generative Attentional Networks
    Chiu, Tse Wei
    Guo, You Sheng
    Chang, Pao-Chi
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 141 - 145
  • [2] Non-parallel Voice Conversion using Generative Adversarial Networks
    Hasunuma, Yuta
    Hirayama, Chiaki
    Kobayashi, Masayuki
    Nagao, Tomoharu
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1635 - 1640
  • [3] Non-parallel Voice Conversion using Weighted Generative Adversarial Networks
    Paul, Dipjyoti
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2019, 2019, : 659 - 663
  • [4] STARGAN-VC: NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING STAR GENERATIVE ADVERSARIAL NETWORKS
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 266 - 273
  • [5] Non-parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks
    Chen, Minchuan
    Hou, Weijian
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4716 - 4720
  • [6] StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
    Li, Yinghao Aaron
    Zare, Ali
    Mesgarani, Nima
    INTERSPEECH 2021, 2021, : 1349 - 1353
  • [7] CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks
    Kaneko, Takuhiro
    Kameoka, Hirokazu
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2100 - 2104
  • [8] MASKCYCLEGAN-VC: LEARNING NON-PARALLEL VOICE CONVERSION WITH FILLING IN FRAMES
    Kaneko, Takuhiro
    Kameoka, Hirokazu
    Tanaka, Kou
    Hojo, Nobukatsu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5919 - 5923
  • [9] A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion
    Alaa, Yasmin
    Alfonse, Marco
    Aref, Mostafa M.
    5TH INTERNATIONAL CONFERENCE ON COMPUTING AND INFORMATICS (ICCI 2022), 2022, : 221 - 226
  • [10] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Yanping Li
    Xiangtian Qiu
    Pan Cao
    Yan Zhang
    Bingkun Bao
    Circuits, Systems, and Signal Processing, 2022, 41 : 4632 - 4648