IMPROVING VOICE QUALITY OF HMM-BASED SPEECH SYNTHESIS USING VOICE CONVERSION METHOD

Cited by: 0

Authors:

Jiao, Yishan [1]

Xie, Xiang [1]

Na, Xingyu [1]

Tu, Ming [1]

Affiliations:

[1] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

HMM-based speech synthesis; voice conversion; local linear transformation; temporal decomposition

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The HMM-based speech synthesis system (HTS) often generates buzzy and muffled speech. This degradation of voice quality makes synthetic speech sound robotic rather than natural. From this observation, we suppose that synthetic speech lies in a speaker space different from that of the original speech. We propose using a voice conversion method to transform synthetic speech toward the original so as to improve its quality. Local linear transformation (LLT) combined with temporal decomposition (TD) is proposed as the conversion method. It not only ensures smooth spectral conversion but also avoids the over-smoothing problem. Moreover, we design a robust spectral selection and modification strategy to make the modified spectra stable. A preference test shows that the proposed method can improve the quality of HMM-based speech synthesis.

Pages: 5
