A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion

Cited by: 3
Authors
Alaa, Yasmin [1]
Alfonse, Marco [1]
Aref, Mostafa M. [1]
Affiliations
[1] Ain Shams Univ, Dept Comp Sci, Fac Comp & Informat Sci, Cairo, Egypt
Keywords
Voice Conversion; many-to-many Voice Conversion; non-parallel Voice Conversion; mono-lingual Voice Conversion; Generative Adversarial Networks (GANs); StarGAN-VC; CycleGAN-VC; RECOGNITION;
DOI
10.1109/ICCI54321.2022.9756059
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice Conversion (VC) is the task of converting the speaker-dependent features of a source speaker's speech without changing the linguistic content. There are many successful VC systems, each trying to overcome particular challenges. These challenges include the unavailability of parallel data and problems caused by language differences between the source and target speech. Another challenge is extending a VC system to cover conversion across many source and target domains at minimal cost. Generative Adversarial Networks (GANs) are showing promising VC results. This work explores many-to-many non-parallel GAN-based mono-lingual VC models (nine highly cited models), explains the evaluation methods used, both objective and subjective (eight evaluation methods are presented), and comments on these models.
Pages: 221-226
Page count: 6