Nonparallel training for voice conversion based on a parameter adaptation approach

Cited by: 77
Authors
Mouchtaris, A
Van der Spiegel, J
Mueller, P
Affiliations
[1] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
[2] Corticon Inc, King Of Prussia, PA 19406 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, No. 3
Keywords
Gaussian mixture model; speaker adaptation; text-to-speech synthesis; voice conversion;
DOI
10.1109/TSA.2005.857790
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The objective of voice conversion algorithms is to modify the speech of a particular source speaker so that it sounds as if spoken by a different target speaker. Current conversion algorithms employ a training procedure during which the same utterances spoken by both the source and target speakers are needed for deriving the desired conversion parameters. Such a (parallel) corpus is often difficult or impossible to collect. Here, we propose an algorithm that relaxes this constraint, i.e., the training corpus does not necessarily contain the same utterances from both speakers. The proposed algorithm is based on speaker adaptation techniques, adapting the conversion parameters derived for a particular pair of speakers to a different pair, for which only a nonparallel corpus is available. We show that adaptation reduces the error obtained when simply applying the conversion parameters of one pair of speakers to another by a factor that can reach 30%. A speaker identification measure is also employed that more insightfully portrays the importance of adaptation, while listening tests confirm the success of our method. Both the objective and subjective tests employed demonstrate that the proposed algorithm achieves results comparable to the ideal case in which a parallel corpus is available.
Pages: 952-963
Page count: 12