Phrase splicing and variable substitution using the IBM trainable speech synthesis system

Cited by: 4
Authors
Donovan, RE [1]
Franz, M [1]
Sorensen, JS [1]
Roukos, S [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Heights, NY 10598 USA
Source
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI | 1999
DOI
10.1109/ICASSP.1999.758140
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper describes a phrase splicing and variable substitution system that offers an intermediate form of automated speech production, lying between the extremes of recorded utterance playback and full Text-to-Speech synthesis. The system incorporates a trainable speech synthesiser and an application-specific set of pre-recorded phrases. The text to be synthesised is converted to a phone sequence using phone sequences present in the pre-recorded phrases wherever possible, and a pronunciation dictionary elsewhere. The synthesis inventory of the synthesiser is then augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. Finally, the synthesiser performs a dynamic programming search over the augmented inventory to select the segment sequence that produces the output speech. The system enables the seamless splicing of pre-recorded phrases both with other phrases and with synthetic speech, and so allows very high quality speech to be produced automatically within a limited domain.
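The pipeline in the abstract (phrase-first phone conversion, inventory augmentation, dynamic programming segment selection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy phrases, dictionary entries, phone labels, the names `text_to_phones`, `augment`, `join_cost` and `select_segments`, the longest-match rule, and the unit join cost are hypothetical and are not taken from the paper, whose actual cost functions and data structures are not given in this record.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int]  # (recording id, phone position inside that recording)

# Hypothetical toy data; none of these phrases, words, or phone labels come from the paper.
PHRASES: Dict[Tuple[str, ...], List[str]] = {
    ("thank", "you"): ["th", "ae", "ng", "k", "y", "uw"],
}
DICTIONARY: Dict[str, List[str]] = {"bob": ["b", "aa", "b"]}
BASE_INVENTORY: Dict[str, List[Segment]] = {
    p: [("base", 10 * i)]
    for i, p in enumerate(["th", "ae", "ng", "k", "y", "uw", "b", "aa"])
}


def text_to_phones(words: List[str]) -> Tuple[List[str], List[Tuple[str, ...]]]:
    """Use the longest matching recorded phrase at each position, else the dictionary."""
    phones: List[str] = []
    used: List[Tuple[str, ...]] = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in PHRASES:
                phones += PHRASES[key]
                used.append(key)
                i = j
                break
        else:  # no recorded phrase starts here: fall back to the dictionary
            phones += DICTIONARY[words[i]]
            i += 1
    return phones, used


def augment(base: Dict[str, List[Segment]], used: List[Tuple[str, ...]]) -> Dict[str, List[Segment]]:
    """Copy the base inventory and add one segment per phone of each phrase used."""
    inventory = {p: list(segs) for p, segs in base.items()}
    for key in used:
        rec_id = " ".join(key)
        for pos, phone in enumerate(PHRASES[key]):
            inventory.setdefault(phone, []).append((rec_id, pos))
    return inventory


def join_cost(a: Segment, b: Segment) -> float:
    """Assumed cost: free if b directly follows a in the same recording, else 1."""
    return 0.0 if a[0] == b[0] and b[1] == a[1] + 1 else 1.0


def select_segments(phones: List[str], inventory: Dict[str, List[Segment]]) -> Tuple[float, List[Segment]]:
    """Dynamic programming search for the cheapest segment sequence."""
    best = {seg: (0.0, [seg]) for seg in inventory[phones[0]]}
    for phone in phones[1:]:
        nxt = {}
        for seg in inventory[phone]:
            cost, path = min(
                ((c + join_cost(prev, seg), p) for prev, (c, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[seg] = (cost, path + [seg])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


phones, used = text_to_phones("thank you bob".split())
cost, path = select_segments(phones, augment(BASE_INVENTORY, used))
```

Under these assumptions the search keeps the recorded phrase intact, because consecutive segments from one recording join at zero cost, and switches to the base inventory only for the word that no phrase covers.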
Pages: 373-376
Page count: 4