VOICE CONVERSION USING CONDITIONAL RESTRICTED BOLTZMANN MACHINE

Times cited: 0
Authors
Zhu, Fengyun [1]
Fan, Ziye [1]
Wu, Xihong [1]
Affiliation
[1] Peking Univ, Sch Elect Engn & Comp Sci, Minist Educ, Speech & Hearing Res Ctr, Key Lab Machine Percept, Beijing 100871, Peoples R China
Source
2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP) | 2014
Keywords
Voice conversion; Conditional restricted Boltzmann machine; TRANSFORMATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
In this paper, we propose a new method for voice conversion using a conditional restricted Boltzmann machine (conditional RBM, CRBM). The joint distribution of source and target acoustic features is modeled by the RBM part of the model, and short-term temporal constraints are introduced by conditioning on contextual frames, i.e., the past and future frames of the source speaker. In contrast to conventional methods, the temporal structure of the data can thus be modeled without using dynamic features. Objective and subjective experiments were conducted to evaluate the method. The experimental results show that the CRBM models short-term temporal structure well and that the proposed method significantly outperforms the conventional method based on joint density Gaussian mixture models.
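The abstract does not give the model equations, so the following is only a minimal Python/NumPy sketch of one common CRBM formulation, under assumptions that are ours and not confirmed by the paper: Gaussian visible units with unit variance holding the concatenated source and target frame [x_t, y_t], binary hidden units, context-dependent ("dynamic") biases computed from the k past and k future source frames, training with one step of contrastive divergence (CD-1), and conversion by clamping the source part and iterating mean-field updates for the target part. The names `CRBM`, `context_frames`, `cd1_step` and `convert`, all hyperparameters, and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow warnings in exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def context_frames(X, k):
    """Stack the k past and k future source frames of every frame (edge frames repeated)."""
    T = X.shape[0]
    P = np.pad(X, ((k, k), (0, 0)), mode="edge")
    return np.hstack([P[k + d:k + d + T] for d in range(-k, k + 1) if d != 0])


class CRBM:
    """Hypothetical Gaussian-Bernoulli conditional RBM with unit-variance visible units.

    Visible vector v = [x_t, y_t] (source and target frame); condition c = source context.
    """

    def __init__(self, n_vis, n_hid, n_cond, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_vis, n_hid))   # visible-hidden weights
        self.A = 0.01 * self.rng.standard_normal((n_cond, n_vis))  # context -> visible biases
        self.B = 0.01 * self.rng.standard_normal((n_cond, n_hid))  # context -> hidden biases
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def dynamic_biases(self, c):
        # The context frames shift the static biases separately for every frame.
        return self.bv + c @ self.A, self.bh + c @ self.B

    def hid_prob(self, v, bh):
        return sigmoid(v @ self.W + bh)

    def vis_mean(self, h, bv):
        return h @ self.W.T + bv

    def cd1_step(self, v, c, lr=1e-3):
        """One full-batch CD-1 update; returns the reconstruction mean squared error."""
        bv, bh = self.dynamic_biases(c)
        ph0 = self.hid_prob(v, bh)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample binary hiddens
        v1 = self.vis_mean(h0, bv)                              # mean-field reconstruction
        ph1 = self.hid_prob(v1, bh)
        n = v.shape[0]
        dv, dh = v - v1, ph0 - ph1
        self.W += lr * (v.T @ ph0 - v1.T @ ph1) / n
        self.bv += lr * dv.mean(axis=0)
        self.bh += lr * dh.mean(axis=0)
        self.A += lr * (c.T @ dv) / n
        self.B += lr * (c.T @ dh) / n
        return float(np.mean(dv ** 2))

    def convert(self, x, c, n_src, n_iter=20):
        """Estimate target frames y by clamping source x and iterating mean-field updates."""
        bv, bh = self.dynamic_biases(c)
        y = np.zeros((x.shape[0], self.bv.size - n_src))
        for _ in range(n_iter):
            h = self.hid_prob(np.hstack([x, y]), bh)
            y = self.vis_mean(h, bv)[:, n_src:]
        return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, D, k = 500, 4, 2
    X = 0.1 * np.cumsum(rng.standard_normal((T, D)), axis=0)  # toy source features
    Y = 0.8 * X + 0.1                                          # toy target features
    C = context_frames(X, k)
    model = CRBM(n_vis=2 * D, n_hid=32, n_cond=C.shape[1])
    V = np.hstack([X, Y])
    for _ in range(100):
        model.cd1_step(V, C, lr=5e-3)
    Y_hat = model.convert(X, C, n_src=D)
    print(Y_hat.shape)  # (500, 4)
    print("conversion MSE:", float(np.mean((Y_hat - Y) ** 2)))
```

Because the context enters only through the biases, inferring the hidden units for a frame costs the same as in a plain RBM, and no delta (dynamic) features are appended to the visible vector.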
Pages: 110-114
Page count: 5