Fast concatenative speech synthesis using pre-fused speech units based on the plural unit selection and fusion method

Cited by: 1
Authors
Tamura, Masatsune [1]
Mizutani, Tatsuya
Kagoshima, Takehiko
Affiliations
[1] Toshiba Co Ltd, Ctr Corp Res & Dev, Multimedia Lab, Kawasaki, Kanagawa 2128582, Japan
[2] Toshiba Co Ltd, Semicond Co, Ome 1988710, Japan
Keywords
concatenative speech synthesis; unit selection; unit fusion; offline unit fusion; frequency-weighted VQ
DOI
10.1093/ietisy/e90-d.2.544
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable, human-like speech. In this method, plural speech units are selected for each speech segment using a cost function and are fused by averaging their pitch-cycle waveforms. The method is computationally expensive, yet some platforms require a speech synthesis system that works within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to build a pre-fused speech unit database. At synthesis time, one speech unit per segment is selected from the pre-fused speech unit database, and the speech waveform is synthesized by prosodic modification and concatenation, without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of speech synthesized by the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with the 93 MB JP database and slightly lower than that of the 390 MB US database, while the computational time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
Pages: 544-553
Page count: 10
Related papers
39 in total
  • [21] Maximum Likelihood Unit Selection for Corpus-based Speech Synthesis
    Gamboa Rosales, Abubeker
    Rosales, Hamurabi Gamboa
    Hoffmann, Ruediger
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 748 - +
  • [22] Evaluation of Finnish Unit Selection and HMM-based Speech Synthesis
    Silen, Hanna
    Helander, Elina
    Nurminen, Jani
    Gabbouj, Moncef
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1853 - +
  • [23] CUTE: A CONCATENATIVE METHOD FOR VOICE CONVERSION USING EXEMPLAR-BASED UNIT SELECTION
    Jin, Zeyu
    Finkelstein, Adam
    DiVerdi, Stephen
    Lu, Jingwan
    Mysore, Gautham J.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5660 - 5664
  • [24] Minimum unit selection error training for HMM-based unit selection speech synthesis system
    Ling, Zhen-Hua
    Wang, Ren-Hua
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3949 - 3952
  • [25] High quality Arabic text-to-speech synthesis using unit selection
    Abdelmalek, Raja
    Mnasri, Zied
    2016 13TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2016, : 1 - 5
  • [26] A NOVEL UNIT SELECTION METHOD FOR CONCATENATION SPEECH SYSTEM USING SIMILARITY MEASURE
    Zhang, Ran
    Tao, Jianhua
    Li, Ya
    Wen, Zhengqi
    2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2013
  • [27] Unit-Selection Speech Synthesis Adjustments for Audiobook-Based Voices
    Vit, Jakub
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 335 - 342
  • [28] Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
    Zhou, Xiao
    Ling, Zhen-Hua
    Zhou, Zhi-Ping
    Dai, Li-Rong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2509 - 2513
  • [29] Learning and Modeling Unit Embeddings Using Deep Neural Networks for Unit-Selection-Based Mandarin Speech Synthesis
    Zhou, Xiao
    Ling, Zhen-Hua
    Dai, Li-Rong
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (03)
  • [30] Optimal weight tuning method for unit selection cost functions in syllable based text-to-speech synthesis
    Narendra, N. P.
    Rao, K. Sreenivasa
    APPLIED SOFT COMPUTING, 2013, 13 (02) : 773 - 781