A Dynamic Cost Weighting Framework for Unit Selection Text-to-Speech Synthesis

Cited by: 9
Author
Bellegarda, Jerome R. [1]
Affiliation
[1] Apple Comp Inc, Speech & Language Technol, Cupertino, CA 95014 USA
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, No. 6
Keywords
Candidate ranking; concatenation-specific cost weighting; concatenative speech synthesis; multiple information streams; unit selection
DOI
10.1109/TASL.2009.2035209
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Unit selection text-to-speech synthesis relies on multiple cost criteria, each encapsulating a different aspect of acoustic and prosodic context at any given concatenation point. Constraints are normally invoked on diverse characteristics such as inter-unit discontinuity, overall pitch contour, local duration profile, etc., leading to costs often too heterogeneous for a direct quantitative comparison. In order to rank available candidate units, this complexity must be reduced to a single number, and the relative importance of each information stream becomes highly critical. Yet this influence is typically determined in an empirical manner (e.g., based on a limited amount of synthesized data), yielding global weights that are thus applied to broad classes of concatenations indiscriminately. This paper proposes an alternative approach, dynamic cost weighting, based on a data-driven framework separately optimized for each concatenation considered. Specifically, the cost distribution in every stream is dynamically leveraged on a per-concatenation basis to locally shift weight towards those characteristics that offer a high discrimination between candidate units, and away from those characteristics that are intrinsically less discriminative. An illustrative case study demonstrates the potential benefits of this solution, and listening evidence suggests that it does indeed entail higher perceived TTS quality.
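
The per-concatenation weighting idea described in the abstract can be illustrated with a small sketch. This is not the paper's actual formulation, which the abstract does not spell out. It assumes, purely for illustration, that the "discrimination" of a cost stream is measured by the coefficient of variation of its costs across the candidate units, and that each stream is scaled by its mean before being combined. The function names `dynamic_weights` and `rank_candidates` and the stream names are hypothetical.

```python
from statistics import mean, pstdev


def dynamic_weights(stream_costs):
    """Compute one weight per cost stream for a single concatenation.

    stream_costs maps a stream name to a list of non-negative costs,
    one per candidate unit. ASSUMPTION: discrimination is approximated
    by the coefficient of variation (std / mean) of each stream.
    """
    spread = {}
    for name, costs in stream_costs.items():
        m = mean(costs)
        # A stream whose costs are all equal cannot separate candidates.
        spread[name] = pstdev(costs) / m if m > 0 else 0.0
    total = sum(spread.values())
    if total == 0:
        # No stream discriminates: fall back to uniform weights.
        return {name: 1.0 / len(spread) for name in spread}
    return {name: s / total for name, s in spread.items()}


def rank_candidates(stream_costs):
    """Return candidate indices sorted from lowest to highest combined cost."""
    weights = dynamic_weights(stream_costs)
    n = len(next(iter(stream_costs.values())))
    combined = [0.0] * n
    for name, costs in stream_costs.items():
        m = mean(costs)
        for i, c in enumerate(costs):
            # Scale by the stream mean so heterogeneous units are comparable.
            combined[i] += weights[name] * (c / m if m > 0 else 0.0)
    return sorted(range(n), key=lambda i: combined[i])


if __name__ == "__main__":
    costs = {
        "pitch": [1.0, 1.0, 1.0],     # identical costs: no discrimination
        "duration": [1.0, 3.0, 2.0],  # clearly separates the candidates
    }
    print(dynamic_weights(costs))
    print(rank_candidates(costs))
```

In this toy input the pitch stream is identical for all three candidates, so all of the weight shifts to the duration stream, which is the behavior the abstract describes qualitatively. A global weighting scheme would instead apply the same fixed weights to every concatenation regardless of how the costs are distributed.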
Pages: 1455-1463
Page count: 9
Related papers
50 in total
  • [41] Polish unit selection speech synthesis with BOSS: extensions and speech corpora
    Demenko, Grazyna
    Klessa, Katarzyna
    Szymanski, Marcin
    Breuer, Stefan
    Hess, Wolfgang
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2010, 13 (02) : 85 - 99
  • [42] Expressive Prosody for Unit-selection Speech Synthesis
    Strom, Volker
    Clark, Robert
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
  • [43] On the Impact of Labialization Contexts on Unit Selection Speech Synthesis
    Tihelka, Daniel
    Hanzlicek, Zdenek
    Machac, Pavel
    Skarnitzl, Radek
    Matousek, Jindrich
    2012 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2012, : 187 - 192
  • [44] Joint Prosodic and Segmental Unit Selection Speech Synthesis
    Clark, Robert A. J.
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1312 - 1315
  • [45] Development and Evaluation of Polish Speech Corpus for Unit Selection Speech Synthesis Systems
    Demenko, G.
    Bachan, J.
    Moebius, B.
    Klessa, K.
    Szymanski, M.
    Grocholewski, S.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1650 - +
  • [46] Minimum unit selection error training for HMM-based unit selection speech synthesis system
    Ling, Zhen-Hua
    Wang, Ren-Hua
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3949 - 3952
  • [47] PREDICTING SPECTRAL AND PROSODIC PARAMETERS FOR UNIT SELECTION IN SPEECH SYNTHESIS
    Dong, Minghui
    Li, Haizhou
    2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008, : 133 - 136
  • [48] Unit Selection based Speech Synthesis for Poor Channel Condition
    Cen, Ling
    Dong, Minghui
    Chan, Paul
    Li, Haizhou
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2035 - 2038
  • [49] Phone-Level Embeddings for Unit Selection Speech Synthesis
    Perquin, Antoine
    Lecorve, Gwenole
    Lolive, Damien
    Amsaleg, Laurent
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2018, 2018, 11171 : 21 - 31
  • [50] On the Impact of Annotation Errors on Unit-Selection Speech Synthesis
    Matousek, Jindrich
    Tihelka, Daniel
    Smidl, Lubos
    TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 : 456 - 463