U-GAT-VC: UNSUPERVISED GENERATIVE ATTENTIONAL NETWORKS FOR NON-PARALLEL VOICE CONVERSION

Cited: 2
Authors
Shi, Sheng [1 ,2 ]
Shao, Jiahao [3 ]
Hao, Yifei [4 ]
Du, Yangzhou [2 ]
Fan, Jianping [1 ,2 ]
Affiliations
[1] Northwest Univ, Xian 710127, Peoples R China
[2] Lenovo Res, AI Lab, Beijing 100094, Peoples R China
[3] Tsinghua Univ, Beijing 100084, Peoples R China
[4] Univ Southern Calif, Los Angeles, CA 90007 USA
Keywords
Non-parallel voice conversion; Generative adversarial network; Inter-attention mechanism; Intra-attention mechanism; Perceptual loss
DOI
10.1109/ICASSP43922.2022.9746992
CLC number
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
Non-parallel voice conversion (VC) is a technique for transferring voice from one style to another without using a parallel corpus in model training. Various methods have been proposed to approach non-parallel VC with deep neural networks; among them, CycleGAN-VC and its variants are widely accepted as benchmark methods. However, there is still a gap between the converted voice and the real target, and the increased number of parameters leads to slow convergence during training. Inspired by recent advances in unsupervised image translation, we propose a new end-to-end unsupervised framework, U-GAT-VC, which adopts a novel inter- and intra-attention mechanism to guide the conversion toward the more important regions of the spectrogram. We also introduce a disentangled perceptual loss in our model to capture high-level spectral features. Subjective and objective evaluations show that our proposed model outperforms CycleGAN-VC2/3 in terms of conversion quality and voice naturalness.
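The attention mechanism described in the abstract can be sketched roughly as follows. This is a hypothetical NumPy illustration of CAM-style (class activation map) attention in the spirit of U-GAT-IT, the image-translation work that inspired this paper — not the authors' implementation: the weights `w` of an auxiliary classifier reweight the encoder's feature maps, and the channel sum yields a per-spectrogram attention map highlighting the time-frequency regions the conversion should focus on.

```python
import numpy as np

def cam_attention(feat, w):
    """CAM-style attention over spectrogram feature maps (illustrative sketch).

    feat: (C, F, T) encoder feature maps (channels x mel bins x frames).
    w:    (C,) weights of an auxiliary classifier applied to
          globally average-pooled features.
    Returns the reweighted features and an (F, T) attention map in (0, 1).
    """
    # Reweight each channel by the classifier's importance weight for it.
    weighted = feat * w[:, None, None]                    # (C, F, T)
    # Collapse channels and squash with a sigmoid to get an attention map.
    attn = 1.0 / (1.0 + np.exp(-weighted.sum(axis=0)))    # (F, T)
    return weighted, attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 80, 64))   # e.g. 80 mel bins x 64 frames
w = rng.standard_normal(8)
weighted, attn = cam_attention(feat, w)
```

In the full model, such a map would modulate the generator's features so that adversarial training concentrates on the perceptually salient regions of the spectrogram.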
Pages: 7017-7021 (5 pages)
Related papers (showing items 41-50 of 50)
  • [41] Non-Parallel Voice Conversion System Using An Auto-Regressive Model
    Ezzine, Kadria
    Frikha, Mondher
    Di Martino, Joseph
    PROCEEDINGS OF THE 2022 5TH INTERNATIONAL CONFERENCE ON ADVANCED SYSTEMS AND EMERGENT TECHNOLOGIES (IC_ASET'2022), 2022, : 500 - 504
  • [42] Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine
    Nakashika, Toru
    Takiguchi, Tetsuya
    Minami, Yasuhiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (11) : 2032 - 2045
  • [43] Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics
    Liu, Yufei
    Yu, Chengzhu
    Shuai, Wang
    Yang, Zhenchuan
    Chao, Yang
    Zhang, Weibin
    INTERSPEECH 2021, 2021, : 1369 - 1373
  • [44] Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    INTERSPEECH 2020, 2020, : 771 - 775
  • [45] VAW-GAN for Singing Voice Conversion with Non-parallel Training Data
    Lu, Junchen
    Zhou, Kun
    Sisman, Berrak
    Li, Haizhou
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 514 - 519
  • [46] Non-Parallel Voice Conversion Using Cycle-Consistent Adversarial Networks with Self-Supervised Representations
    Chun, Chanjun
    Lee, Young Han
    Lee, Geon Woo
    Jeon, Moongu
    Kim, Hong Kook
    2023 IEEE 20TH CONSUMER COMMUNICATIONS & NETWORKING CONFERENCE (CCNC), 2023
  • [47] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Takashima, Yuki
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [49] Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion
    Wu, Zhizheng
    Kinnunen, Tomi
    Chng, Eng Siong
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (12) : 914 - 917
  • [50] Enhanced Variational Auto-encoder for Voice Conversion Using Non-parallel Corpora
    Huang Guojie
    Jin Hui
    Yu Yibiao
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 46 - 49