A Covariance-Tying Technique for HMM-Based Speech Synthesis

被引:10
作者
Oura, Keiichiro [1 ]
Zen, Heiga [1 ]
Nankaku, Yoshihiko [1 ]
Lee, Akinobu [1 ]
Tokuda, Keiichi [1 ]
机构
[1] Nagoya Inst Technol, Dept Comp Sci & Engn, Nagoya, Aichi 4668555, Japan
来源
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010年 / E93D卷 / 03期
关键词
HMM; speech synthesis; decision tree; context-clustering; MDL criterion; embedded device;
D O I
10.1587/transinf.E93.D.595
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
引用
收藏
页码:595 / 601
页数:7
相关论文
共 50 条
  • [1] Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems
    Oura, Keiichiro
    Zen, Heiga
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1723 - 1726
  • [2] FULL COVARIANCE STATE DURATION MODELING FOR HMM-BASED SPEECH SYNTHESIS
    Lu, Heng
    Wu, Yi-Jian
    Tokuda, Keiichi
    Dai, Li-Rong
    Wang, Ren-Hua
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4033 - +
  • [3] Robustness of HMM-based Speech Synthesis
    Yamagishi, Junichi
    Ling, Zhenhua
    King, Simon
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 581 - 584
  • [4] Arabic HMM-based Speech Synthesis
    Khalil, Krichi Mohamed
    Adnan, Cherif
    2013 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND SOFTWARE APPLICATIONS (ICEESA), 2013, : 450 - 454
  • [5] AN OPTIMIZATION ALGORITHM OF INDEPENDENT MEAN AND VARIANCE PARAMETER TYING STRUCTURES FOR HMM-BASED SPEECH SYNTHESIS
    Takaki, Shinji
    Oura, Keiichiro
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4700 - 4703
  • [6] HMM-Based Speech Synthesis for the Greek Language
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Chalamandaris, Aimilios
    Raptis, Spyros
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 349 - 356
  • [7] Analysis of HMM-Based Lombard Speech Synthesis
    Raitio, Tuomo
    Suni, Antti
    Vainio, Martti
    Alku, Paavo
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2792 - +
  • [8] Implementation and evaluation of an HMM-based Korean speech synthesis system
    Kim, SJ
    Kim, JJ
    Hahn, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03): : 1116 - 1119
  • [9] A trainable excitation model for HMM-based speech synthesis
    Maia, R.
    Toda, T.
    Zen, H.
    Nankaku, Y.
    Tokuda, K.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1125 - +
  • [10] Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis
    Andersson, Sebastian
    Yamagishi, Junichi
    Clark, Robert A. J.
    SPEECH COMMUNICATION, 2012, 54 (02) : 175 - 188