Continuous probabilistic transform for voice conversion

Cited by: 707
Authors
Stylianou, Y [1]
Cappe, O [1]
Moulines, E [1]
Affiliations
[1] AT&T Bell Labs, Res, Murray Hill, NJ 07974 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1998 / Vol. 6 / No. 02
Keywords
DOI
10.1109/89.661472
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic + noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
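As a rough illustration of the idea summarized in the abstract, the Python sketch below fits a posterior-weighted affine map by least squares. It is a simplified stand-in, not the authors' implementation. The function names (`posteriors`, `design_matrix`, `fit_conversion`, `convert`), the diagonal-covariance mixture, and the per-component affine form are assumptions made for this example. It also assumes that the mixture parameters have already been trained on source frames and that source and target frames are time-aligned.

```python
import numpy as np


def posteriors(X, weights, means, variances):
    """Posterior probability of each mixture component for each row of X.

    Assumes diagonal covariances: means and variances have shape (m, d).
    """
    diff = X[:, None, :] - means[None, :, :]                      # (n, m, d)
    log_pdf = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                             # (n, m)
    log_joint = np.log(weights) + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)             # numerical stability
    P = np.exp(log_joint)
    return P / P.sum(axis=1, keepdims=True)


def design_matrix(X, P):
    """Stack p_i(x) * [1, x] for every component i into one row per frame."""
    n, d = X.shape
    m = P.shape[1]
    X_aug = np.hstack([np.ones((n, 1)), X])                       # (n, d + 1)
    return (P[:, :, None] * X_aug[:, None, :]).reshape(n, m * (d + 1))


def fit_conversion(X, Y, weights, means, variances):
    """Least-squares estimate of the per-component affine parameters."""
    D = design_matrix(X, posteriors(X, weights, means, variances))
    W, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return W


def convert(X, W, weights, means, variances):
    """Apply the soft (posterior-weighted) conversion to source frames X."""
    return design_matrix(X, posteriors(X, weights, means, variances)) @ W
```

Because the component posteriors vary smoothly with the input, the converted output is a continuous function of the source frame. A hard vector-quantization mapping, by contrast, assigns each frame to a single class and so changes abruptly at class boundaries.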
Pages: 131-142
Page count: 12