A Comparison of Speech Synthesis Systems Based on GPR, HMM, and DNN with a Small Amount of Training Data

Cited by: 0
Authors
Koriyama, Tomoki [1]
Kobayashi, Takao [1]
Affiliations
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Tokyo, Japan
Keywords
statistical parametric speech synthesis; Gaussian process regression (GPR); HMM; deep neural network (DNN)
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we evaluate a framework for statistical parametric speech synthesis based on Gaussian process regression (GPR) and compare it with frameworks based on hidden Markov models (HMMs) and deep neural networks (DNNs). Recently, novel frameworks using deep architectures have been proposed to improve on the performance of HMM-based speech synthesis and have been shown to be effective. GPR-based speech synthesis is another alternative to the HMM-based framework; as in the DNN-based framework, frame-level acoustic features are predicted from frame-level linguistic features. First, we examine the clustering level of speech segments, such as state, phone, mora, and accent phrase, used for GPR-based synthesis. Then we compare the modeling architecture and performance of GPR with those of DNN and HMM for statistical parametric speech synthesis. Experimental results show that the GPR-based speech synthesis system outperforms both the HMM- and DNN-based systems when trained on a relatively small amount of data, around 40 minutes.
Pages: 3496-3500
Page count: 5
Related papers
50 in total
  • [1] Performance Evaluation of HMM-Based Style Classification with a Small Amount of Training Data
    Tachibana, Makoto
    Kawashima, Keigo
    Yamagishi, Junichi
    Kobayashi, Takao
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 569 - 572
  • [2] An HMM/DNN Comparison for Synchronized Text-to-speech and Tongue Motion Synthesis
    Le Maguer, Sebastien
    Steiner, Ingmar
    Hewer, Alexander
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 239 - 243
  • [3] DNN-Based Speech Synthesis: Importance of Input Features and Training Data
    Lazaridis, Alexandros
    Potard, Blaise
    Garner, Philip N.
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 193 - 200
  • [4] Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data
    Meng, Fanbo
    Wu, Zhiyong
    Meng, Helen
    Jia, Jia
    Cai, Lianhong
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 466 - 469
  • [5] Design and Implementation of Burmese Speech Synthesis System Based on HMM-DNN
    Liu, Mengyuan
    Yang, Jian
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 79 - 83
  • [6] Effects of Training Strategies and the Amount of Speech Data on the Quality of Speech Synthesis
    Vladar, Lukas
    Matousek, Jindrich
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 94 - 104
  • [7] Contaminated speech training methods for robust DNN-HMM distant speech recognition
    Ravanelli, Mirco
    Omologo, Maurizio
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 756 - 760
  • [8] A Bilingual Speech Synthesis System of Standard Malay and Indonesian Based on HMM-DNN
    Chen, Feng
    Yang, Jian
    Zhao, Lixuan
    2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 181 - 186
  • [9] Mismatched Training Data Enhancement for Automatic Recognition of Children's Speech using DNN-HMM
    Qian, Mengjie
    McLoughlin, Ian
    Guo, Wu
    Dai, Lirong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [10] Comparison of syllable-based and phoneme-based DNN-HMM in Japanese Speech Recognition
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Nakagawa, Seiichi
    2014 INTERNATIONAL CONFERENCE OF ADVANCED INFORMATICS: CONCEPT, THEORY AND APPLICATION (ICAICTA), 2014, : 249 - 254