Low-dimensional representation of spectral envelope using deep auto-encoder for speech synthesis

Cited by: 0
Authors
Choi, Heejin [1]
Kim, Jaeseok [1]
Park, Jinuk [1]
Kim, Juntae [1]
Hahn, Minsoo [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, 291 Daehak Ro, Daejeon, South Korea
Source
ICMSCE 2018: PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON MECHATRONICS SYSTEMS AND CONTROL ENGINEERING | 2015
Keywords
Statistical parametric speech synthesis; Deep auto-encoder; Spectral envelope; Vocoder; NEURAL-NETWORKS
DOI
10.1145/3185066.3185088
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes a deep auto-encoder structure to extract robust spectral features for statistical parametric speech synthesis. The technique compresses the high-dimensional spectral envelope of full-band speech into low-dimensional features in a data-driven way, without degradation. We carried out a subjective evaluation and identified the optimum auto-encoder architecture. Experimental results showed that analysis-by-synthesis using the proposed auto-encoder has a lower spectral-envelope reconstruction error than conventional mel-cepstral analysis for narrow-band as well as full-band speech. Our results confirm that the proposed method improves the quality of synthesized speech in text-to-speech experiments.
Pages: 107-111
Number of pages: 5