Non-parallel training for voice conversion by maximum likelihood constrained adaptation

Authors
Mouchtaris, A [1]
Van der Spiegel, J [1]
Mueller, P [1]
Affiliations
[1] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
The objective of voice conversion methods is to modify the speech characteristics of a particular speaker so that the speech sounds as if it were spoken by a different target speaker. Current voice conversion algorithms derive a conversion function by estimating its parameters from a corpus that contains the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, is often difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e., the utterances spoken by the two speakers need not be the same. The method employs speaker adaptation techniques to adapt the conversion parameters derived from a different pair of speakers to a particular pair of source and target speakers. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30% in many cases, with performance comparable to the ideal case in which a parallel corpus is available.
Pages: 1 / 4
Page count: 4