INTONATIONAL PHRASE BREAK PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS USING DEPENDENCY RELATIONS

Authors
Mishra, Taniya [1]
Kim, Yeon-jun [1]
Bangalore, Srinivas [1]
Affiliations
[1] Interactions, 31 Hayward St, Franklin, MA 02038 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
Intonational phrase; phrase breaks; IP prediction; prosody; text-analysis;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches to IP break prediction rely on linguistic rules or, more recently, lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81, with the added benefit of being more portable across domains. We also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag based model results in only a 2% drop in recall compared to a model that uses both left and right context, which suggests the viability of such a model for an incremental text-to-speech system.
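As a rough illustration of the left-context setup and the F-score metric mentioned in the abstract, the following Python sketch builds POS-only features for a candidate break position and scores binary break predictions. It is not the authors' implementation: the function names, the padding symbol, the feature keys, and the choice to count the current token separately from the three preceding tokens are all assumptions made for this example.

```python
from typing import Dict, List

PAD = "<s>"  # hypothetical padding symbol for positions before the sentence start


def left_context_features(pos_tags: List[str], i: int, width: int = 3) -> Dict[str, str]:
    """POS features for the juncture after token i.

    Only the current tag and the `width` tags to its left are used, so the
    features are available before any following word has been seen
    (the property an incremental system would need).
    """
    feats = {"pos_0": pos_tags[i]}
    for k in range(1, width + 1):
        j = i - k
        feats[f"pos_-{k}"] = pos_tags[j] if j >= 0 else PAD
    return feats


def f_score(gold: List[int], pred: List[int]) -> float:
    """Balanced F-score over binary labels (1 = break, 0 = no break)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A feature dictionary of this kind could be passed to any standard classifier; the abstract does not say which learner was used, so none is assumed here.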
Pages: 4919-4923
Number of pages: 5