A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion

Cited: 3
Authors
Alaa, Yasmin [1 ]
Alfonse, Marco [1 ]
Aref, Mostafa M. [1 ]
Affiliations
[1] Ain Shams Univ, Dept Comp Sci, Fac Comp & Informat Sci, Cairo, Egypt
Keywords
Voice Conversion; many-to-many Voice Conversion; non-parallel Voice Conversion; mono-lingual Voice Conversion; Generative Adversarial Networks (GANs); StarGAN-VC; CycleGAN-VC; Recognition
DOI
10.1109/ICCI54321.2022.9756059
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice Conversion (VC) is the task of transforming the speaker-dependent characteristics of a source speaker's speech into those of a target speaker while preserving the linguistic content. Many successful VC systems exist, each addressing particular challenges: the unavailability of parallel training data, the language mismatch between source and target speech, and extending a VC system to cover conversion across many source and target domains at minimal cost. Generative Adversarial Networks (GANs) have shown promising VC results. This work surveys nine highly cited many-to-many, non-parallel, GAN-based mono-lingual VC models, explains eight evaluation methods (both objective and subjective), and comments on these models.
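The record itself gives no formulas, but the adversarial and cycle-consistency objectives that underpin non-parallel GAN-based VC models such as CycleGAN-VC are standard. As a sketch (symbols are the usual conventions, not taken from the paper): let G: X → Y be the source-to-target converter, F: Y → X its inverse, and D_X, D_Y the domain discriminators:

```latex
% Adversarial loss: D_Y tries to distinguish real target speech y
% from converted speech G(x); G tries to fool it.
\mathcal{L}_{\mathrm{adv}}(G, D_Y) =
    \mathbb{E}_{y \sim p(y)}\bigl[\log D_Y(y)\bigr]
  + \mathbb{E}_{x \sim p(x)}\bigl[\log\bigl(1 - D_Y(G(x))\bigr)\bigr]

% Cycle-consistency loss: converting to the target domain and back
% must reconstruct the input, which removes the need for parallel data.
\mathcal{L}_{\mathrm{cyc}}(G, F) =
    \mathbb{E}_{x \sim p(x)}\bigl[\lVert F(G(x)) - x \rVert_1\bigr]
  + \mathbb{E}_{y \sim p(y)}\bigl[\lVert G(F(y)) - y \rVert_1\bigr]

% Full min-max objective with cycle weight \lambda_{\mathrm{cyc}}
\min_{G, F} \max_{D_X, D_Y}\;
    \mathcal{L}_{\mathrm{adv}}(G, D_Y)
  + \mathcal{L}_{\mathrm{adv}}(F, D_X)
  + \lambda_{\mathrm{cyc}}\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

Many-to-many variants such as StarGAN-VC replace the per-pair converters with a single generator conditioned on a target-speaker code, so one model covers all source-target domain pairs.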
Pages: 221-226 (6 pages)
Related Papers (50 total)
  • [21] Non-parallel Voice Conversion Based on Perceptual Star Generative Adversarial Network
    Li, Yanping
    Qiu, Xiangtian
    Cao, Pan
    Zhang, Yan
    Bao, Bingkun
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (08) : 4632 - 4648
  • [23] Non-parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks
    Chen, Minchuan
    Hou, Weijian
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4716 - 4720
  • [24] High-Quality Many-to-Many Voice Conversion Using Transitive Star Generative Adversarial Networks with Adaptive Instance Normalization
    Li, Yanping
    He, Zhengtao
    Zhang, Yan
    Yang, Zhen
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2021, 30 (10)
  • [25] Many-to-many voice conversion with sentence embedding based on VAACGAN
    Li, Yanping
    Cao, Pan
    Shi, Yang
    Zhang, Yan
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2021, 47 (03): 500 - 508
  • [26] Many-to-many eigenvoice conversion with reference voice
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1591 - 1594
  • [27] Region Normalized Capsule Network Based Generative Adversarial Network for Non-parallel Voice Conversion
    Akhter, Md Tousin
    Banerjee, Padmanabha
    Dhar, Sandipan
    Ghosh, Subhayu
    Jana, Nanda Dulal
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 233 - 244
  • [28] Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN
    Hashimoto, Tetsuya
    Saito, Daisuke
    Minematsu, Nobuaki
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (02) : 332 - 341
  • [29] Many-to-many Voice Conversion Based on Multiple Non-negative Matrix Factorization
    Aihara, Ryo
Takiguchi, Tetsuya
    Ariki, Yasuo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2749 - 2753
  • [30] Evaluation of a Singing Voice Conversion Method Based on Many-to-Many Eigenvoice Conversion
    Doi, Hironori
    Toda, Tomoki
    Nakano, Tomoyasu
    Goto, Masataka
    Nakamura, Satoshi
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1066 - 1070