A Covariance-Tying Technique for HMM-Based Speech Synthesis

Cited by: 10

Authors:

Oura, Keiichiro [1]

Zen, Heiga [1]

Nankaku, Yoshihiko [1]

Lee, Akinobu [1]

Tokuda, Keiichi [1]

Affiliations:

[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010 / Vol. E93D / No. 03

Keywords:

HMM; speech synthesis; decision tree; context-clustering; MDL criterion; embedded device;

DOI:

10.1587/transinf.E93.D.595

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
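The storage argument in the abstract can be illustrated with a little arithmetic. The sketch below is not the authors' implementation: it assumes diagonal covariance matrices and a hypothetical model size, and the function names `parameter_count` and `pool_variances` are invented for illustration. It only counts the floats that must be stored when every state keeps its own variances versus when all states share one, and shows one simple way to form a shared variance vector (an occupancy-weighted average of per-state variances). The decision-tree clustering and MDL criterion mentioned in the keywords are not modeled here.

```python
def parameter_count(num_states, dim, tied):
    """Count stored floats for Gaussian state distributions.

    Assumes diagonal covariances: each state stores `dim` means, plus
    either its own `dim` variances (untied) or nothing extra (tied),
    in which case one shared variance vector is stored once.
    """
    means = num_states * dim
    variances = dim if tied else num_states * dim
    return means + variances


def pool_variances(variances, occupancies):
    """Form a single shared diagonal variance vector.

    Each per-state variance vector is weighted by that state's
    occupancy (the amount of training data assigned to it).
    """
    total = sum(occupancies)
    dim = len(variances[0])
    return [
        sum(occ * var[d] for var, occ in zip(variances, occupancies)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the arithmetic visible.
    untied = parameter_count(num_states=1000, dim=40, tied=False)
    tied = parameter_count(num_states=1000, dim=40, tied=True)
    print(untied, tied)

    shared = pool_variances([[1.0, 4.0], [3.0, 8.0]], [1.0, 3.0])
    print(shared)
```

With these assumed sizes, tying removes nearly half of the stored values, since the variances of all states collapse into a single vector while the mean vectors are kept per state.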


Pages: 595-601

Page count: 7
