Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer

Cited by: 27
Authors
Gonzalvo, Xavi [1 ]
Tazari, Siamak [1 ]
Chan, Chun-an [1 ]
Becker, Markus [1 ]
Gutkin, Alexander [1 ]
Silen, Hanna [1 ]
Affiliations
[1] Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speech synthesis; hybrid approaches; real-time; unit selection
DOI
10.21437/Interspeech.2016-264
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents advances in Google's hidden Markov model (HMM)-driven unit selection speech synthesis system. We describe several improvements to the run-time system, including minimal latency, high quality, and a fast refresh cycle for new voices. Traditionally, unit selection synthesizers are limited in the amount of data they can handle and in the real applications they are built for. This limitation is even more critical for real-life, large-scale applications, where high quality is expected and low latency is required given the available computational resources. In this paper we present an optimized engine that handles a large database at runtime and a composite unit search approach that combines diphones and phrase-based units. In addition, we present a new voice building strategy that handles big databases while keeping building times low.
Pages: 2238-2242
Number of pages: 5
Related papers
17 in total
  • [1] Towards high-quality next-generation text-to-speech synthesis: A multidomain approach by automatic domain classification
    Alias, Francesc
    Sevillano, Xavier
    Socoro, Joan Claudi
    Gonzalvo, Xavier
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (07): : 1340 - 1354
  • [2] [Anonymous], 1999, P EUROSPEECH
  • [3] [Anonymous], 2004, P SSW5
  • [4] Conversational speech synthesis and the need for some laughter
    Campbell, Nick
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04): : 1171 - 1178
  • [5] Phrase splicing and variable substitution using the IBM trainable speech synthesis system
    Donovan, RE
    Franz, M
    Sorensen, JS
    Roukos, S
    [J]. ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 373 - 376
  • [6] Gonzalvo X., 2009, P ICSLP BRIGHT UK, P416
  • [7] Hunt AJ, 1996, INT CONF ACOUST SPEE, P373, DOI 10.1109/ICASSP.1996.541110
  • [8] Klabbers E.A.M., 1997, P COST WORKSH SPEECH, P85
  • [9] Lamel L. F., 1993, P ESCA NATO WORKSH A, P207
  • [10] Ling Z.-H., 2007, P BLIZZ CHALL WORKSH