Eigenvoice Conversion Based on Gaussian Mixture Model

Cited by: 0
Authors
Toda, Tomoki [1]
Ohtani, Yamato [1]
Shikano, Kiyohiro [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
speech synthesis; voice conversion; GMM; eigenvoice; unsupervised training;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a novel framework of voice conversion (VC). We call it eigenvoice conversion (EVC). We apply EVC to the conversion from a source speaker's voice to arbitrary target speakers' voices. Using multiple parallel data sets consisting of utterance-pairs of the source and multiple pre-stored target speakers, a canonical eigenvoice GMM (EV-GMM) is trained in advance. That conversion model enables us to flexibly control the speaker individuality of the converted speech by manually setting weight parameters. In addition, the optimum weight set for a specific target speaker is estimated using only speech data of the target speaker without any linguistic restrictions. We evaluate the performance of EVC by a spectral distortion measure. Experimental results demonstrate that EVC works very well even if we use only a few utterances of the target speaker for the weight estimation.
Pages: 2446-2449
Page count: 4