WAVELET-BASED DECOMPOSITION OF F0 AS A SECONDARY TASK FOR DNN-BASED SPEECH SYNTHESIS WITH MULTI-TASK LEARNING

Cited by: 0
Authors
Ribeiro, Manuel Sam [1]
Watts, Oliver [1]
Yamagishi, Junichi [1,2]
Clark, Robert A. J. [1,3]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
[2] Natl Inst Informat, Tokyo, Japan
[3] Google, Mountain View, CA USA
Source
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016
Keywords
speech synthesis; f0; modelling; deep neural network; multi-task learning; continuous wavelet transform;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
We investigate two wavelet-based strategies for decomposing the f0 signal and their usefulness as a secondary task for speech synthesis using multi-task deep neural networks (MTL-DNNs). The first strategy uses a static set of scales for all utterances in the training data. We propose a second strategy, in which the scale of the mother wavelet is dynamically adjusted to the rate of each utterance. This approach can capture f0 variations related to the syllable, word, clitic-group, and phrase units. It also constrains the wavelet components to lie within the frequency range that previous experiments have shown to be more natural. Both strategies are evaluated as a secondary task in MTL-DNNs. Results indicate that, on an expressive dataset, there is a strong preference for the systems using multi-task learning over the baseline system.
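The contrast between the two scale strategies can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the mother wavelet, the number of scales, or how the utterance rate is measured, so the Mexican hat wavelet, the dyadic scale spacing, the use of mean syllable length as the base scale, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def mexican_hat(t, scale):
    """Mexican hat (Ricker) wavelet at times t, dilated by `scale` (assumed mother wavelet)."""
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2) / np.sqrt(scale)

def cwt_components(signal, scales):
    """Return one wavelet component per scale (one row each) by direct convolution."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the kernel at 5 scales, and never let it exceed the signal length.
        half = int(min(5 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1)
        out[i] = np.convolve(signal, mexican_hat(t, s), mode="same")
    return out

def static_scales(n_scales=5, base=2.0):
    """Hypothetical static strategy: the same dyadic scales (in frames) for every utterance."""
    return base * 2.0 ** np.arange(n_scales)

def dynamic_scales(n_frames, n_syllables, n_scales=5):
    """Hypothetical dynamic strategy: base scale set to this utterance's mean syllable length."""
    base = n_frames / n_syllables
    return base * 2.0 ** np.arange(n_scales)

# Synthetic stand-in for an interpolated, normalised log-f0 contour (200 frames).
lf0 = 5.0 + 0.2 * np.sin(2 * np.pi * np.arange(200) / 50.0)
lf0 = (lf0 - lf0.mean()) / lf0.std()

static = cwt_components(lf0, static_scales())
dynamic = cwt_components(lf0, dynamic_scales(len(lf0), n_syllables=20))
```

In a multi-task setup, the rows of `static` or `dynamic` would serve as additional per-frame regression targets next to the main acoustic features. How the paper weights or selects those targets is not stated in the abstract.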
Pages: 5525-5529
Number of pages: 5