A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion

Cited by: 3
Authors
Alaa, Yasmin [1]
Alfonse, Marco [1]
Aref, Mostafa M. [1]
Affiliations
[1] Ain Shams Univ, Dept Comp Sci, Fac Comp & Informat Sci, Cairo, Egypt
Keywords
Voice Conversion; many-to-many Voice Conversion; non-parallel Voice Conversion; mono-lingual Voice Conversion; Generative Adversarial Networks (GANs); StarGAN-VC; CycleGAN-VC; RECOGNITION;
DOI
10.1109/ICCI54321.2022.9756059
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice Conversion (VC) is the task of converting the speaker-dependent features of a source speaker's speech without changing its linguistic content. Many successful VC systems exist, each trying to overcome particular challenges. These challenges include the unavailability of parallel data, problems caused by language differences between the source and target speech, and extending a VC system to cover conversion across many source and target domains at minimal cost. Generative Adversarial Networks (GANs) are showing promising VC results. This work explores many-to-many non-parallel GAN-based mono-lingual VC models (nine highly cited models), explains the evaluation methods used, both objective and subjective (eight evaluation methods are presented), and comments on these models.
Pages: 221-226
Page count: 6
Related papers
50 in total
  • [1] Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
    Zhao, Shengkui
    Nguyen, Trung Hieu
    Wang, Hao
    Ma, Bin
    INTERSPEECH 2019, 2019, : 689 - 693
  • [2] Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning
    Hu, Jinsen
    Yu, Chunyan
    Guan, Faqian
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 125 - 132
  • [3] STARGAN-VC: NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING STAR GENERATIVE ADVERSARIAL NETWORKS
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 266 - 273
  • [4] Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition
    Ding, Shaojin
    Zhao, Guanlong
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2020, 2020, : 776 - 780
  • [5] Non-parallel Many-to-many Voice Conversion with PSR-StarGAN
    Li, Yanping
    Xu, Dongxiang
    Zhang, Yan
    Wang, Yang
    Chen, Binbin
    INTERSPEECH 2020, 2020, : 781 - 785
  • [6] NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING LOCAL LINGUISTIC TOKENS
    Wang, Chao
    Yu, Yibiao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5929 - 5933
  • [7] NON-PARALLEL TRAINING FOR MANY-TO-MANY EIGENVOICE CONVERSION
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4822 - 4825
  • [8] Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data
    Li, Yanping
    Lee, Kong Aik
    Yuan, Yougen
    Li, Haizhou
    Yang, Zhen
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 829 - 833
  • [9] Non-parallel Voice Conversion using Generative Adversarial Networks
    Hasunuma, Yuta
    Hirayama, Chiaki
    Kobayashi, Masayuki
    Nagao, Tomoharu
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1635 - 1640
  • [10] TEXT-FREE NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING NORMALISING FLOWS
    Merritt, Thomas
    Ezzerg, Abdelhamid
    Bilinski, Piotr
    Proszewska, Magdalena
    Pokora, Kamil
    Barra-Chicote, Roberto
    Korzekwa, Daniel
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6782 - 6786