Pause prediction from text for speech synthesis with user-definable pause insertion likelihood threshold

Cited by: 0
Authors
Braunschweiler, Norbert [1]
Maia, Ranniery [1]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
pause prediction; phrasing; prosody; speech synthesis; machine learning
DOI
10.21437/Interspeech.2016-752
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Predicting the location of pauses from text is an important task for speech synthesizers, and the accuracy of pause prediction can significantly influence both naturalness and intelligibility. Pauses that help listeners parse the synthesized speech into meaningful units are expected to increase naturalness and intelligibility ratings, while pauses in unexpected or incorrect locations can reduce these ratings and cause confusion. This paper presents a multi-stage pause prediction approach consisting of prosodic chunk prediction, followed by a feature scoring algorithm and, finally, a pause sequence evaluation module. Preference tests showed that the new method outperformed a pauses-at-punctuation baseline, although it did not yet match human performance. In addition, the approach offers two further features: (1) a user-specifiable pause insertion rate and (2) multiple output formats: binary pauses, multi-level pauses, or a score reflecting pause strength.
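The abstract does not say how pause scores are computed or how the user-defined threshold is applied, so the following is only an illustrative sketch of the idea of a thresholded pause decision with several output formats. It assumes each word juncture already carries a pause score in [0, 1]; the function name `format_pauses`, its parameters, and the linear mapping to pause levels are all hypothetical and are not taken from the paper.

```python
from typing import List, Union


def format_pauses(
    scores: List[float],
    threshold: float = 0.5,
    output: str = "binary",
    levels: int = 3,
) -> List[Union[int, float]]:
    """Turn per-juncture pause scores into one of three output formats.

    Assumptions (not from the paper): scores lie in [0, 1], and a pause is
    inserted wherever the score reaches the user-chosen threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    if levels < 1:
        raise ValueError("levels must be at least 1")

    if output == "score":
        # Raw pause strength, unchanged by the threshold.
        return list(scores)

    if output == "binary":
        # 1 = insert a pause at this juncture, 0 = no pause.
        return [1 if s >= threshold else 0 for s in scores]

    if output == "multilevel":
        result = []
        span = 1.0 - threshold
        for s in scores:
            if s < threshold:
                result.append(0)  # below threshold: no pause
                continue
            # Map [threshold, 1] linearly onto levels 1..levels.
            frac = (s - threshold) / span if span > 0 else 1.0
            result.append(1 + min(levels - 1, int(frac * levels)))
        return result

    raise ValueError(f"unknown output format: {output!r}")


if __name__ == "__main__":
    junction_scores = [0.1, 0.6, 0.8, 1.0]
    print(format_pauses(junction_scores, threshold=0.5, output="binary"))
    print(format_pauses(junction_scores, threshold=0.5, output="multilevel"))
```

In this sketch, lowering `threshold` raises the pause insertion rate and raising it makes pauses sparser, which is one plausible way to expose a user-definable insertion likelihood.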
Pages: 3191 / +
Page count: 3