A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis

Cited by: 9
Author
Bellegarda, Jerome R. [1]
Affiliation
[1] Apple Comp Inc, Speech & Language Technol, Cupertino, CA 95014 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 06
Keywords
Candidate ranking; concatenation-specific cost weighting; concatenative speech synthesis; multiple information streams; unit selection
DOI
10.1109/TASL.2009.2035209
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Unit selection text-to-speech synthesis relies on multiple cost criteria, each encapsulating a different aspect of acoustic and prosodic context at any given concatenation point. Constraints are normally invoked on diverse characteristics such as inter-unit discontinuity, overall pitch contour, local duration profile, etc., leading to costs often too heterogeneous for a direct quantitative comparison. In order to rank available candidate units, this complexity must be reduced to a single number, and the relative importance of each information stream becomes highly critical. Yet this influence is typically determined in an empirical manner (e.g., based on a limited amount of synthesized data), yielding global weights that are thus applied to broad classes of concatenations indiscriminately. This paper proposes an alternative approach, dynamic cost weighting, based on a data-driven framework separately optimized for each concatenation considered. Specifically, the cost distribution in every stream is dynamically leveraged on a per concatenation basis to locally shift weight towards those characteristics that offer a high discrimination between candidate units, and away from those characteristics that are intrinsically less discriminative. An illustrative case study demonstrates the potential benefits of this solution, and listening evidence suggests that it does indeed entail higher perceived TTS quality.
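The idea of shifting weight per concatenation toward the more discriminative streams can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not give the weighting rule, so the code assumes, purely for illustration, that a stream's discriminative power is measured by the coefficient of variation of its costs across the candidate units, and that costs are made comparable by dividing by the per-stream mean. The function names `dynamic_weights` and `rank_candidates` and the example cost values are hypothetical.

```python
from statistics import mean, pstdev


def dynamic_weights(costs):
    """Return one weight per stream for a single concatenation.

    costs: list of rows, one row per candidate unit, one cost per stream.
    ASSUMPTION (not from the paper): discrimination is measured by the
    coefficient of variation of each stream's costs over the candidates.
    """
    n_streams = len(costs[0])
    spreads = []
    for k in range(n_streams):
        column = [row[k] for row in costs]
        mu = mean(column)
        # Scale-free spread, so heterogeneous cost units can be compared.
        spreads.append(pstdev(column) / mu if mu > 0 else 0.0)
    total = sum(spreads)
    if total == 0:
        # No stream discriminates: fall back to uniform weights.
        return [1.0 / n_streams] * n_streams
    return [s / total for s in spreads]


def rank_candidates(costs):
    """Rank candidates (best first) by a weighted sum of mean-normalized costs."""
    weights = dynamic_weights(costs)
    n_streams = len(weights)
    means = [mean([row[k] for row in costs]) for k in range(n_streams)]
    scores = [
        sum(w * (c / m if m > 0 else 0.0) for w, c, m in zip(weights, row, means))
        for row in costs
    ]
    order = sorted(range(len(costs)), key=scores.__getitem__)
    return order, weights


# Hypothetical costs for three candidates: [discontinuity, pitch mismatch].
# The discontinuity costs are nearly identical, the pitch costs differ widely.
example_costs = [[0.50, 10.0], [0.52, 30.0], [0.49, 20.0]]
order, weights = rank_candidates(example_costs)
```

In this example the discontinuity stream barely separates the candidates, so most of the weight moves to the pitch stream and the ranking follows the pitch costs. A fixed global weighting would apply the same proportions here as at any other concatenation, regardless of which stream actually tells the candidates apart.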
Pages: 1455-1463
Page count: 9
Related papers
50 items in total
  • [31] Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System
    Fernandez, Raul
    Rendel, Asaf
    Ramabhadran, Bhuvana
    Hoory, Ron
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1606 - 1610
  • [32] Emotional Intelligence in Text-To-Speech Synthesis in Pali Language Using Fuzzy Logic
    Mache, Suhas
    Dabhade, Siddharth
    JOURNAL OF ADVANCED APPLIED SCIENTIFIC RESEARCH, 2024, 6 (03): : 179 - 192
  • [33] A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions
    Adiga, Nagaraj
    Prasanna, S. R. Mahadeva
    2014 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2014,
  • [34] Building Text Corpus for Unit Selection Synthesis
    Kasparaitis, Pijus
    Anbinderis, Tomas
    INFORMATICA, 2014, 25 (04) : 551 - 562
  • [35] Assessing a Speaker for Fast Speech in Unit Selection Speech Synthesis
    Moers, Donata
    Wagner, Petra
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2015 - +
  • [36] Unit Selection Model in Arabic Speech Synthesis
    Al-Saiyd, Nedhal A.
    Hijjawi, Mohammad
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2018, 18 (04): : 126 - 131
  • [37] Optimal Utterance Selection for Unit Selection Speech Synthesis Databases
    Alan W. Black
    Kevin Lenzo
    International Journal of Speech Technology, 2003, 6 (4) : 357 - 363
  • [38] Unit selection based speech synthesis for converting short text message into voice message in mobile phones
    Bharthi, B.
    Kavitha, S.
    Kotwal, Nekshan Percy
    Parasaram, Nivedita
    Piriyanga, J.
    2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,
  • [39] A classifier-based target cost for unit selection speech synthesis trained on perceptual data
    Strom, Volker
    King, Simon
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 150 - 153
  • [40] One-Class Classification for Spectral Join Cost Calculation in Unit Selection Speech Synthesis
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    IEEE SIGNAL PROCESSING LETTERS, 2010, 17 (08) : 746 - 749