A Pitch-Synchronous Speech Analysis and Synthesis Method for DNN-SPSS System
Cited by: 0
Authors:
Kim, Jin-Seob [1]
Joo, Young-Sun [1]
Kang, Hong-Goo [1]
Jang, Inseon [2]
Ahn, ChungHyun [2]
Seo, Jeongil [2]
Affiliations:
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
[2] Elect & Telecommun Res Inst, Realist Broadcasting Media Res Dept, Daejeon, South Korea
Source:
2016 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2016
Keywords:
Deep neural network (DNN);
statistical parametric speech synthesis (SPSS);
pitch-synchronous;
glottal closure instants (GCIs);
DEEP NEURAL-NETWORKS;
DOI:
Not available
Chinese Library Classification:
TM [Electrical engineering];
TN [Electronic technology, communication technology];
Subject classification codes:
0808 ;
0809 ;
Abstract:
This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Speech parameters are extracted from pitch-synchronous frames defined by the locations of glottal closure instants (GCIs), which significantly reduces coupling effects between the vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems, which have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that trains and generates speech parameters pitch-synchronously. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
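As an illustration of the framing idea in the abstract, the following minimal sketch cuts one frame per GCI, spanning from the previous GCI to the next one. It is not the authors' implementation: the function name `pitch_synchronous_frames`, the Hann window, the two-period frame span, and the toy GCI positions are all assumptions made for this example. The resulting frames have different lengths, which hints at why such frames do not fit directly into a pipeline that expects a fixed frame rate and a fixed feature dimension.

```python
import numpy as np


def pitch_synchronous_frames(signal, gcis):
    """Cut one Hann-windowed frame per interior GCI (hypothetical helper).

    Each frame spans from the previous GCI to the next GCI, i.e. roughly
    two pitch periods. The symmetric window is a simplification: it is
    only centred on the GCI when both neighbouring periods are equal.
    """
    frames = []
    for prev, nxt in zip(gcis[:-2], gcis[2:]):
        segment = signal[prev:nxt + 1]
        frames.append(segment * np.hanning(len(segment)))
    return frames


# Toy input: a 100 Hz sine sampled at 16 kHz (assumed values).
fs = 16000
t = np.arange(fs // 10) / fs
signal = np.sin(2 * np.pi * 100 * t)

# Hypothetical GCI sample indices with slightly varying pitch periods.
gcis = np.array([0, 160, 320, 470, 620, 800])

frames = pitch_synchronous_frames(signal, gcis)
frame_lengths = [len(f) for f in frames]
print(frame_lengths)
```

In this toy case the frame count follows the number of GCIs rather than a fixed hop size, and the frame lengths track the local pitch period.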