Phrase splicing and variable substitution using the IBM trainable speech synthesis system

被引：4

作者：

Donovan, RE ^{[1
]}

Franz, M ^{[1
]}

Sorensen, JS ^{[1
]}

Roukos, S ^{[1
]}

机构：

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Heights, NY 10598 USA

来源：

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999年

关键词：

D O I：

10.1109/ICASSP.1999.758140

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production lying in-between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application specific set of pre-recorded phrases. The text to be synthesised is converted to a phone sequence using phone sequences present in the pre-recorded phrases wherever possible, and a pronunciation dictionary elsewhere. The synthesis inventory of the synthesiser is augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. The synthesiser then performs a dynamic programming search over the augmented inventory to select a segment sequence to produce the output speech. The system enables the seamless splicing of pre-recorded phrases both with other phrases and with synthetic speech. It enables very high quality speech to be produced automatically within a limited domain.

引用

页码：373 / 376

页数：4