INTONATIONAL PHRASE BREAK PREDICTION FOR TEXT-TO-SPEECH SYNTHESIS USING DEPENDENCY RELATIONS

Cited by: 0
Authors
Mishra, Taniya [1]
Kim, Yeon-jun [1]
Bangalore, Srinivas [1]
Affiliation
[1] Interactions, 31 Hayward St, Franklin, MA 02038 USA
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
Intonational phrase; phrase breaks; IP prediction; prosody; text-analysis;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Intonational phrase (IP) break prediction is an important part of front-end analysis in a text-to-speech system. Standard approaches to IP break prediction rely on linguistic rules or, more recently, on lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, in this paper we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40,000 words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech (POS) tags and dependency relations achieves an F-score of 0.81, with the added benefit of being more portable across domains. We also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag-based model results in only a 2% drop in recall compared with a model that uses both left and right context, which suggests that such a model is viable for an incremental text-to-speech system.
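The abstract contrasts a non-lexicalized model (POS tags plus dependency relations, no word identity) with left-only versus left-and-right context windows, scored by F-score. The following is a minimal sketch of that kind of feature extraction and scoring, not the authors' implementation: the `Token` class, the `pos[k]`/`rel[k]` feature names, the `<PAD>` symbol, and the toy annotated sentence are all hypothetical, and the abstract does not say which classifier consumes the features.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Token:
    word: str    # kept only for readability; never emitted as a feature
    pos: str     # part-of-speech tag
    deprel: str  # dependency relation label to the token's head


def delexicalized_features(
    tokens: Sequence[Token], i: int, left: int = 3, right: int = 3
) -> Dict[str, str]:
    """Hypothetical features for deciding whether an IP break follows token i.

    Only POS tags and dependency labels are used (no word identity).
    right=0 gives a left-context-only model usable incrementally.
    """
    feats: Dict[str, str] = {}
    for offset in range(-left, right + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            pos, rel = tokens[j].pos, tokens[j].deprel
        else:
            pos, rel = "<PAD>", "<PAD>"  # outside the sentence
        feats[f"pos[{offset}]"] = pos
        feats[f"rel[{offset}]"] = rel
    return feats


def f_score(gold: List[int], pred: List[int]) -> float:
    """F1 on the break class (1 = IP break after the token)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy sentence with hypothetical tags: "When the rain stopped we left"
tokens = [
    Token("When", "WRB", "advmod"),
    Token("the", "DT", "det"),
    Token("rain", "NN", "nsubj"),
    Token("stopped", "VBD", "advcl"),
    Token("we", "PRP", "nsubj"),
    Token("left", "VBD", "root"),
]

# Candidate break after "stopped" (index 3).
incremental = delexicalized_features(tokens, 3, left=3, right=0)
full = delexicalized_features(tokens, 3, left=3, right=3)

gold = [0, 0, 0, 1, 0, 1]
pred = [0, 0, 0, 1, 0, 0]
print(f_score(gold, pred))  # precision 1.0, recall 0.5
```

Setting `right=0` mirrors the incremental setting in the abstract: the model can decide on a break as soon as the current token arrives, at the cost of losing the right-hand context features. Because no `word` values enter the feature dictionary, such a model does not depend on a domain's vocabulary, which is the portability argument the abstract makes.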
Pages: 4919-4923
Number of pages: 5