Thousands of Voices for HMM-based Speech Synthesis

Cited by: 0

Authors:

Yamagishi, Junichi [1]

Usabaev, Bela [2]

King, Simon [1]

Watts, Oliver [1]

Dines, John [3]

Tian, Jilei [4]

Hu, Rile [4]

Guan, Yong [4]

Oura, Keiichiro [5]

Tokuda, Keiichi [5]

Karhila, Reima

Kurimo, Mikko

Affiliations:

[1] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

[2] Univ Tubingen, D-72074 Tubingen, Germany

[3] Idiap Res Inst, CH-1920 Martigny, Switzerland

[4] Nokia Res Ctr, Beijing 100176, Peoples R China

[5] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

speech synthesis; HMMs; speaker adaptation;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an 'average voice model' plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on 'non-TTS' corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we show thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, GlobalPhone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.

Citation

Pages: 416 / +

Page count: 2
