Non-parallel training for voice conversion by maximum likelihood constrained adaptation

Cited by: 0
Authors:
Mouchtaris, A [1]
Van der Spiegel, J [1]
Mueller, P [1]
Affiliation:
[1] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
DOI: not available
Chinese Library Classification: O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract
The objective of voice conversion methods is to modify the speech characteristics of a particular speaker so that it sounds like speech produced by a different target speaker. Current voice conversion algorithms derive a conversion function by estimating its parameters from a corpus that contains the same utterances spoken by both speakers. Such a corpus, usually referred to as a parallel corpus, has the disadvantage that it is often difficult or even impossible to collect. Here, we propose a voice conversion method that does not require a parallel corpus for training, i.e., the utterances spoken by the two speakers need not be the same: speaker adaptation techniques are used to adapt the conversion parameters derived from a different pair of speakers to the particular pair of source and target speakers of interest. We show that adaptation reduces the error incurred when the conversion parameters of one speaker pair are simply applied to another by as much as 30% in many cases, with performance comparable to the ideal case in which a parallel corpus is available.
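The abstract does not spell out the conversion model; the sketch below assumes the GMM-based spectral conversion function commonly used in this line of work, and uses a deliberately crude mean-shift stand-in for the paper's constrained maximum-likelihood adaptation, only to show where adaptation plugs into conversion parameters trained on a reference speaker pair. All function names, array shapes, and the adaptation rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions noted above), not the authors' algorithm.
import numpy as np


def convert_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """GMM-based conversion of one source feature vector x:
    F(x) = sum_i p(i|x) * (mu_y[i] + cov_yx[i] @ inv(cov_xx[i]) @ (x - mu_x[i]))
    """
    # Posterior p(i|x) of each mixture component given the source frame
    # (the constant term cancels in the normalization below).
    log_post = []
    for i in range(len(weights)):
        diff = x - mu_x[i]
        inv = np.linalg.inv(cov_xx[i])
        _, logdet = np.linalg.slogdet(cov_xx[i])
        log_post.append(np.log(weights[i]) - 0.5 * (diff @ inv @ diff + logdet))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Weighted sum of per-component linear regressions onto the target space.
    y = np.zeros_like(mu_y[0])
    for i in range(len(weights)):
        y += post[i] * (mu_y[i] + cov_yx[i] @ np.linalg.inv(cov_xx[i]) @ (x - mu_x[i]))
    return y


def adapt_means(mu_ref, frames_new):
    """Toy adaptation: shift the reference-pair means toward the new speaker's
    global mean, estimated from that speaker's (non-parallel) frames.
    The paper instead estimates constrained maximum-likelihood transforms;
    this stand-in only marks where such adaptation would be applied."""
    return mu_ref + (frames_new.mean(axis=0) - mu_ref.mean(axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 2, 3
    weights = np.array([0.5, 0.5])
    mu_x = rng.normal(size=(K, d))      # reference source-speaker means
    mu_y = rng.normal(size=(K, d))      # reference target-speaker means
    cov_xx = np.stack([np.eye(d)] * K)
    cov_yx = np.stack([0.5 * np.eye(d)] * K)

    # Non-parallel data from the new source/target speakers (random here).
    new_src_frames = rng.normal(size=(100, d))
    new_tgt_frames = rng.normal(size=(100, d))
    mu_x_adapted = adapt_means(mu_x, new_src_frames)
    mu_y_adapted = adapt_means(mu_y, new_tgt_frames)

    x = rng.normal(size=d)
    print(convert_frame(x, weights, mu_x_adapted, mu_y_adapted, cov_xx, cov_yx))
```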
Pages: 1-4
Page count: 4
Related papers
(50 in total)
  • [41] Non-parallel dictionary learning for voice conversion using non-negative Tucker decomposition
    Takashima, Yuki
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [43] Non-parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks
    Chen, Minchuan
    Hou, Weijian
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2020, 2020, : 4716 - 4720
  • [44] Mixture of Factor Analyzers Using Priors From Non-Parallel Speech for Voice Conversion
    Wu, Zhizheng
    Kinnunen, Tomi
    Chng, Eng Siong
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (12) : 914 - 917
  • [45] Enhanced Variational Auto-encoder for Voice Conversion Using Non-parallel Corpora
    Huang Guojie
    Jin Hui
    Yu Yibiao
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 46 - 49
  • [46] ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Tanaka, Kou
    Hojo, Nobukatsu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (09) : 1432 - 1443
  • [47] Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
    Shah, Nirmesh J.
    Madhavi, Maulik C.
    Patil, Hemant A.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1968 - 1972
  • [48] Fast Model Alignment for Structured Statistical Approach of Non-parallel Corpora Voice Conversion
    Che, Yingxia
    Yu, Yibiao
    2014 4TH IEEE INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014, : 88 - 92
  • [49] Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
    Ding, Shaojin
    Gutierrez-Osuna, Ricardo
    INTERSPEECH 2019, 2019, : 724 - 728
  • [50] Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Dai, Li-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 540 - 552