Improved voicing decision using glottal activity features for statistical parametric speech synthesis

Cited by: 6
Authors
Adiga, Nagaraj [1]
Khonglah, Banriskhem K. [1]
Prasanna, S. R. Mahadeva [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Glottal activity features; Statistical parametric speech synthesis; Voicing decision; Support vector machine; EPOCH EXTRACTION; F0; CLASSIFICATION;
DOI
10.1016/j.dsp.2017.09.007
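Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code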
0808 ; 0809 ;
Abstract
A method to improve the voicing decision using glottal activity features is proposed for statistical parametric speech synthesis. In existing methods, the voicing decision relies mostly on the fundamental frequency F0, which may result in errors when the prediction is inaccurate. Even though F0 is a glottal activity feature, other features that characterize this activity may help in improving the voicing decision. The glottal activity features used in this work are the strength of excitation (SoE), normalized autocorrelation peak strength (NAPS), and higher-order statistics (HOS). These features are obtained from approximated source signals such as the zero-frequency filtered signal and the integrated linear prediction residual. To improve the voicing decision and to avoid a heuristic threshold for classification, classifiers are trained on the glottal activity features using different statistical learning methods: the k-nearest neighbor, support vector machine (SVM), and deep belief network. The voicing decision works best with the SVM classifier, and its effectiveness is tested in statistical parametric speech synthesis. The glottal activity features SoE, NAPS, and HOS are modeled along with F0 and Mel-cepstral coefficients in hidden Markov model and deep neural network systems to obtain the voicing decision. The objective and subjective evaluations demonstrate that the proposed method improves the naturalness of synthetic speech. © 2017 Elsevier Inc. All rights reserved.
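As an illustration of one of the features named in the abstract, the following is a minimal sketch of a normalized autocorrelation peak strength computation. It is not the authors' implementation: the function name `naps`, the 8 kHz sampling rate, the 50 ms frame, and the 50 to 400 Hz pitch search range are all assumptions, and the feature is computed directly on a synthetic waveform frame rather than on the zero-frequency filtered signal or integrated linear prediction residual that the abstract describes.

```python
import math
import random


def naps(frame, min_lag, max_lag):
    """Peak of the autocorrelation, normalized by the zero-lag energy,
    searched over lags min_lag..max_lag (hypothetical helper)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no periodicity evidence
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


fs = 8000          # assumed sampling rate in Hz
frame_len = 400    # assumed 50 ms analysis frame

# Periodic (voiced-like) frame: a 100 Hz sinusoid.
voiced = [math.sin(2 * math.pi * 100 * t / fs) for t in range(frame_len)]

# Aperiodic (unvoiced-like) frame: seeded white Gaussian noise.
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]

# Assumed pitch search range of 400 Hz down to 50 Hz, expressed as lags.
min_lag, max_lag = fs // 400, fs // 50

naps_voiced = naps(voiced, min_lag, max_lag)
naps_unvoiced = naps(unvoiced, min_lag, max_lag)
print("NAPS voiced-like frame:  ", round(naps_voiced, 3))
print("NAPS unvoiced-like frame:", round(naps_unvoiced, 3))
```

The periodic frame yields a markedly higher value than the noise frame. In the work summarized above, such features are passed to a trained classifier (the SVM performed best) rather than compared against a hand-picked threshold.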
Pages: 131-143
Page count: 13
Related papers
47 entries in total
[1]   Detection of Glottal Activity Using Different Attributes of Source Information [J].
Adiga, Nagaraj ;
Prasanna, S. R. M. .
IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (11) :2107-2111
[2]  
Ananthapadmanabha T.V., 1984, STLQPSR23 ROY I TECH
[3]  
[Anonymous], 2004, 5 ISCA WORKSH SPEECH
[4]  
[Anonymous], 1995, Speech coding and synthesis
[5]  
[Anonymous], 2011, INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association
[6]  
[Anonymous], 2000, P 6 INT C SPOK LANG
[7]  
Arifianto D, 2007, INT CONF ACOUST SPEE, P749
[8]   PATTERN-RECOGNITION APPROACH TO VOICED UNVOICED SILENCE CLASSIFICATION WITH APPLICATIONS TO SPEECH RECOGNITION [J].
ATAL, BS ;
RABINER, LR .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1976, 24 (03) :201-212
[9]  
Bagshaw P.C., ENHANCED PITCH TRACK
[10]  
Boersma P., 2018, Praat: doing phonetics by computer (Version 5.3) Computer software