Eigenvoice Conversion Based on Gaussian Mixture Model

Times cited: 0

Authors:

Toda, Tomoki [1]

Ohtani, Yamato [1]

Shikano, Kiyohiro [1]

Affiliation:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech synthesis; voice conversion; GMM; eigenvoice; unsupervised training

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes a novel voice conversion (VC) framework, which we call eigenvoice conversion (EVC). We apply EVC to conversion from a source speaker's voice to the voices of arbitrary target speakers. A canonical eigenvoice GMM (EV-GMM) is trained in advance using multiple parallel data sets, each consisting of utterance pairs from the source speaker and one of multiple pre-stored target speakers. This conversion model lets us flexibly control the speaker individuality of the converted speech by manually setting weight parameters. In addition, the optimum weight set for a specific target speaker is estimated using only speech data of that target speaker, without any linguistic restrictions. We evaluate the performance of EVC using a spectral distortion measure. Experimental results demonstrate that EVC works very well even when only a few utterances of the target speaker are used for weight estimation.
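The abstract does not spell out how the weight parameters shape the target voice. A common eigenvoice formulation (as in eigenvoice adaptation, ref. [6]) writes each mixture's target mean as a bias vector plus a weighted sum of eigenvectors obtained by PCA over "supervectors" of the pre-stored speakers' means, and fits the weights by maximum likelihood on target-only frames with EM. The NumPy sketch below illustrates that idea under loudly stated assumptions: diagonal covariances, fixed mixture weights and variances, an orthonormal PCA basis without eigenvalue scaling, and no source-side conversion step. The function names `build_ev_params`, `target_means_for`, and `estimate_weights` are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_ev_params(target_means, num_eigen):
    """Hypothetical helper: per-mixture eigenvoice bases B_i and biases b_i.

    target_means: (S, M, D) target-side mean vectors of S pre-stored
    speakers (assumed already trained). Returns basis (M, D, J), bias (M, D).
    """
    S, M, D = target_means.shape
    supervectors = target_means.reshape(S, M * D)  # one supervector per speaker
    bias = supervectors.mean(axis=0)
    # PCA via SVD of the centered supervectors; rows of vt are eigenvectors.
    _, _, vt = np.linalg.svd(supervectors - bias, full_matrices=False)
    basis = vt[:num_eigen].T  # (M*D, J) orthonormal columns, no eigenvalue scaling
    return basis.reshape(M, D, num_eigen), bias.reshape(M, D)


def target_means_for(w, basis, bias):
    """mu_i(w) = B_i w + b_i for every mixture i; w may also be set by hand."""
    return np.einsum("mdj,j->md", basis, w) + bias


def estimate_weights(Y, mix_weights, variances, basis, bias, n_iter=10):
    """Hypothetical EM estimate of w from target-only frames Y (T, D).

    Assumes a diagonal-covariance target GMM whose mixture weights (M,)
    and variances (M, D) stay fixed; only the means depend on w.
    """
    w = np.zeros(basis.shape[2])  # start from the bias (average) voice
    inv_var = 1.0 / variances
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    for _ in range(n_iter):
        # E-step: mixture posteriors gamma (T, M) under the current means.
        diff = Y[:, None, :] - target_means_for(w, basis, bias)[None]
        log_post = np.log(mix_weights)[None] - 0.5 * (
            np.sum(diff**2 * inv_var[None], axis=2) + log_norm[None])
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)
        occ = gamma.sum(axis=0)  # zeroth-order statistics (M,)
        first = gamma.T @ Y      # first-order statistics (M, D)
        # M-step: (sum_i occ_i B_i^T S_i^-1 B_i) w = sum_i B_i^T S_i^-1 (first_i - occ_i b_i)
        A = np.einsum("mdj,md,mdk->jk", basis, inv_var * occ[:, None], basis)
        c = np.einsum("mdj,md->j", basis, inv_var * (first - occ[:, None] * bias))
        w = np.linalg.solve(A, c)
    return w
```

Setting `w` by hand and calling `target_means_for` mirrors the manual control of speaker individuality. `estimate_weights` needs only target frames, with no parallel data or transcriptions, which matches the unsupervised setting the abstract describes. The EM update is the closed-form maximizer of the expected log-likelihood with respect to `w` when the posteriors are held fixed.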

Pages: 2446-2449

Page count: 4

References (14 in total):

[1] Abe M, 1990, Journal of the Acoustical Society of Japan (E), V11, P71, DOI 10.1250/ast.11.71
[2] Anastasakos T, 1996, ICSLP 96 - Fourth International Conference on Spoken Language Processing, Proceedings, Vols 1-4, P1137, DOI 10.1109/ICSLP.1996.607807
[3] Iwahashi N, Sagisaka Y, 1995, Speech spectrum conversion based on speaker interpolation and multifunctional representation with weighting by radial basis function networks, Speech Communication, V16(2), P139-151
[4] Kain A, 1998, INT CONF ACOUST SPEE, P285, DOI 10.1109/ICASSP.1998.674423
[5] Kawahara H, Masuda-Katsuse I, de Cheveigné A, 1999, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, V27(3-4), P187-207
[6] Kuhn R, Junqua JC, Nguyen P, Niedzielski N, 2000, Rapid speaker adaptation in eigenvoice space, IEEE Transactions on Speech and Audio Processing, V8(6), P695-707
[7] Kuwabara H, Sagisaka Y, 1995, Acoustic characteristics of speaker individuality: Control and conversion, Speech Communication, V16(2), P165-173
[8] Miyanaga K, 2004, P ICSLP JEJ ISL KOR
[9] Mouchtaris A, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol I, Proceedings, P1
[10] Shichiri K, 2002, P ICSLP, V1, P1269
