Pause prediction from text for speech synthesis with user-definable pause insertion likelihood threshold

Cited by: 0
Authors
Braunschweiler, Norbert [1]
Maia, Ranniery [1]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
pause prediction; phrasing; prosody; speech synthesis; machine learning;
DOI
10.21437/Interspeech.2016-752
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Predicting the location of pauses from text is an important task for speech synthesizers. The accuracy of pause prediction can significantly influence both naturalness and intelligibility. Pauses that help listeners parse the synthesized speech into meaningful units are deemed to increase naturalness and intelligibility ratings, while pauses in unexpected or incorrect locations can reduce these ratings and cause confusion. This paper presents a multi-stage pause prediction approach: prosodic chunk prediction first, followed by a feature scoring algorithm, and finally a pause sequence evaluation module. Preference tests showed that the new method outperformed a pauses-at-punctuation baseline but did not yet match human performance. In addition, the approach offers two further functionalities: (1) a user-specifiable pause insertion rate and (2) multiple output formats: binary pauses, multi-level pauses, or a score reflecting pause strength.
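The two functionalities named at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each word juncture already carries a pause score in [0, 1], and the function name `format_pauses`, the `threshold` and `levels` parameters, and the equal-width binning for multi-level output are all hypothetical choices made for illustration.

```python
from typing import List, Union


def format_pauses(
    scores: List[float],
    threshold: float = 0.5,
    mode: str = "binary",
    levels: int = 3,
) -> List[Union[int, float]]:
    """Turn per-juncture pause scores into one of three output formats.

    Hypothetical helper: assumes scores lie in [0, 1] and that a lower
    threshold means more pauses are inserted.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    if levels < 1:
        raise ValueError("levels must be at least 1")

    if mode == "score":
        # Raw pause-strength scores, unchanged.
        return list(scores)

    if mode == "binary":
        # 1 = insert a pause, 0 = no pause.
        return [1 if s >= threshold else 0 for s in scores]

    if mode == "multilevel":
        # 0 = no pause; 1..levels = increasingly strong pause.
        # Assumed scheme: equal-width bins between threshold and 1.0.
        span = 1.0 - threshold
        result = []
        for s in scores:
            if s < threshold:
                result.append(0)
            elif span == 0.0:
                result.append(levels)
            else:
                frac = (s - threshold) / span
                result.append(min(levels, 1 + int(frac * levels)))
        return result

    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, lowering `threshold` raises the pause insertion rate: `format_pauses([0.1, 0.5, 0.75, 1.0], threshold=0.05)` marks every juncture as a pause, while the default of 0.5 leaves the first juncture unpaused.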
Pages: 3191 / +
Number of pages: 3