Minimum unit selection error training for HMM-based unit selection speech synthesis system

Cited by: 0

Authors:

Ling, Zhen-Hua [1]

Wang, Ren-Hua [1]

Affiliations:

[1] Univ Sci & Technol China, iFlytek Speech Lab, Hefei, Anhui, Peoples R China

Source:

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008

Keywords:

speech synthesis; unit selection; HMM; minimum unit selection error; discriminative training;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a minimum unit selection error (MUSE) training method for an HMM-based unit selection speech synthesis system, which selects the optimal phone-sized unit sequence from the speech database by maximizing the combined likelihood of a group of trained HMMs. Under the MUSE criterion, the weights and distribution parameters of these HMMs are estimated to minimize the number of units that differ between the selected phone sequences and the natural phone sequences for the training sentences. The optimization is realized by discriminative training using the generalized probabilistic descent (GPD) algorithm. Experimental results show that the proposed method improves on a baseline system in which the model weights are set manually and the distribution parameters are trained under the maximum likelihood criterion.
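The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each candidate unit is scored independently per target position (the real system selects whole sequences), it updates only the model weights and not the distribution parameters, and every name, the toy data, the sigmoid loss, and the step size are hypothetical choices made for illustration.

```python
import math

# Hypothetical toy data: for each target position, a list of candidate units,
# each described by its log-likelihood under two models.
CANDIDATES = [
    [[-1.0, -5.0], [-3.0, -1.0]],
    [[-2.0, -1.0], [-1.0, -4.0]],
]
# Index of the natural (recorded) unit at each position.
NATURAL = [0, 1]


def combined_score(weights, scores):
    """Weighted sum of per-model log-likelihoods for one candidate."""
    return sum(w * s for w, s in zip(weights, scores))


def select_units(weights, candidates):
    """Pick, per position, the candidate with the highest combined score."""
    return [
        max(range(len(cands)), key=lambda i: combined_score(weights, cands[i]))
        for cands in candidates
    ]


def unit_selection_error(selected, natural):
    """Count positions where the selected unit differs from the natural one."""
    return sum(1 for s, n in zip(selected, natural) if s != n)


def gpd_epoch(weights, candidates, natural, step=0.1, gamma=1.0):
    """One pass of a GPD-style update on a sigmoid-smoothed error (assumed form)."""
    w = list(weights)
    for cands, nat in zip(candidates, natural):
        rivals = [i for i in range(len(cands)) if i != nat]
        if not rivals:
            continue
        # Strongest competitor of the natural unit under the current weights.
        best = max(rivals, key=lambda i: combined_score(w, cands[i]))
        # Misclassification measure: positive when the rival beats the natural unit.
        d = combined_score(w, cands[best]) - combined_score(w, cands[nat])
        loss = 1.0 / (1.0 + math.exp(-gamma * d))
        slope = gamma * loss * (1.0 - loss)
        # Gradient step on the smoothed loss with respect to each weight.
        for k in range(len(w)):
            w[k] -= step * slope * (cands[best][k] - cands[nat][k])
    return w


def train(weights, candidates, natural, epochs=20):
    """Repeat the GPD-style pass for a fixed number of epochs."""
    w = list(weights)
    for _ in range(epochs):
        w = gpd_epoch(w, candidates, natural)
    return w
```

On this toy data, equal weights select the wrong unit at both positions. Calling `train([1.0, 1.0], CANDIDATES, NATURAL)` and re-running `select_units` with the returned weights recovers the natural unit at both positions, which mirrors the contrast the abstract draws between manually set and discriminatively trained weights.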

Citation

Pages: 3949-3952

Page count: 4

References (8 in total)

[1] Blum, JR. Multidimensional stochastic approximation methods [J]. Annals of Mathematical Statistics, 1954, 25 (04): 737-744

[2] Hunt AJ, 1996, INT CONF ACOUST SPEE, P373, DOI 10.1109/ICASSP.1996.541110

[3] Juang, BH; Chou, W; Lee, CH. Minimum classification error rate methods for speech recognition [J]. IEEE Transactions on Speech and Audio Processing, 1997, 5 (03): 257-265

[4] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. Speech Communication, 1999, 27 (3-4): 187-207

[5] Ling Z-H, 2007, BLIZZ CHALL WORKSH

[6] Ling ZH, 2007, INT CONF ACOUST SPEE, P1245

[7] Tokuda, K; Masuko, T; Miyazaki, N; Kobayashi, T. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling [J]. ICASSP '99: 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings Vols I-VI, 1999: 229-232

[8] YOSHIMURA T, 1999, P EUROSPEECH, V5, P2347
