Fast concatenative speech synthesis using pre-fused speech units based on the plural unit selection and fusion method

Cited by: 1
Authors
Tamura, Masatsune [1]
Mizutani, Tatsuya
Kagoshima, Takehiko
Affiliations
[1] Toshiba Co Ltd, Ctr Corp Res & Dev, Multimedia Lab, Kawasaki, Kanagawa 2128582, Japan
[2] Toshiba Co Ltd, Semicond Co, Ome 1988710, Japan
Keywords
concatenative speech synthesis; unit selection; unit fusion; offline unit fusion; frequency-weighted VQ
DOI
10.1093/ietisy/e90-d.2.544
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We have previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable and human-like speech. In this method, plural speech units for each speech segment are selected using a cost function and fused by averaging pitch-cycle waveforms. The method has a large computational cost, whereas some platforms require a speech synthesis system that works within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to build a pre-fused speech unit database. At synthesis time, one speech unit for each segment is selected from the pre-fused speech unit database, and the speech waveform is synthesized by applying prosodic modification and concatenation, without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of synthetic speech from the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with the 93 MB JP database and slightly lower than that with the 390 MB US database, while the computational time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
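The difference between online and offline fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `fuse_units`, `build_prefused_db`, and `select_unit` are hypothetical, units are assumed to be already aligned to the same number of pitch cycles and the same cycle length, and the grouping of units to be fused together (for which the paper compares several algorithms, including a frequency-weighted VQ-based one) is simply taken as input.

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]   # one pitch-cycle waveform (samples)
Unit = List[Waveform]    # a speech unit as a sequence of pitch-cycle waveforms


def fuse_units(units: Sequence[Unit]) -> Unit:
    """Fuse several units by averaging their pitch-cycle waveforms.

    Assumption: every unit has the same number of pitch cycles and every
    cycle has the same length (alignment is outside this sketch).
    """
    if not units:
        raise ValueError("need at least one unit to fuse")
    fused: Unit = []
    for c in range(len(units[0])):
        cycles = [u[c] for u in units]
        # Sample-wise mean over the c-th cycle of every unit.
        fused.append([sum(s) / len(cycles) for s in zip(*cycles)])
    return fused


def build_prefused_db(groups: Dict[str, List[List[Unit]]]) -> Dict[str, List[Unit]]:
    """Offline step: fuse each group of units once and store the results.

    `groups` maps a segment label to a list of unit groups; how the groups
    are formed is assumed to be decided elsewhere.
    """
    return {seg: [fuse_units(g) for g in gs] for seg, gs in groups.items()}


def select_unit(candidates: Sequence[Unit], cost: Callable[[Unit], float]) -> Unit:
    """Synthesis-time step: pick the single pre-fused unit with the lowest cost."""
    return min(candidates, key=cost)


# Toy data: two units for segment "a", each with one two-sample pitch cycle.
unit_1 = [[1.0, 3.0]]
unit_2 = [[3.0, 5.0]]
db = build_prefused_db({"a": [[unit_1, unit_2], [unit_2]]})
best = select_unit(db["a"], cost=lambda u: abs(u[0][0] - 2.0))
```

In this arrangement the averaging runs once per group when the database is built, so synthesis only performs a selection per segment followed by prosodic modification and concatenation, which are not shown here.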
Pages: 544-553
Number of pages: 10