Fusion of Spectral and Prosodic Information using Combined Error Optimization for Keyword Spotting

Cited by: 0

Authors:

Pandey, Laxmi [1]

Chaudhary, Kuldeep [1]

Hegde, Rajesh M. [1]

Affiliations:

[1] Indian Inst Technol, Dept Elect Engn, Kanpur, Uttar Pradesh, India

Source:

2017 TWENTY-THIRD NATIONAL CONFERENCE ON COMMUNICATIONS (NCC) | 2017

Keywords:

HIDDEN MARKOV-MODELS; SPEECH RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Incorporating prosodic information with spectral information at the feature level is challenging. In this paper, a method for feature-level fusion of spectral and prosodic information is proposed. A pitch contour is first extracted from the frame-blocked segments of the speech signal. These segments are then labeled as high-pitch or low-pitch segments. Both spectral and prosodic features are extracted from each segment class. An integrated feature set is obtained by concatenating the spectral and prosodic features from each of these classes. In the next stage of fusion, the high-pitch and low-pitch labeled features are further combined using a joint error optimization approach. This optimization approach assumes that the mean of the high-pitch segments can be obtained by an affine transformation of the mean of the low-pitch segments. The parameters of the affine transformation are obtained by gradient descent. The final integrated feature set is obtained after normalizing both sets of features thus obtained. This integrated feature set is used in a Hidden Markov Model (HMM) framework along with a novel sliding syllable protocol for keyword spotting. Keyword spotting experiments are conducted on a Hindi language database developed for this purpose. Experiments on keyword recognition and keyword spotting are conducted to evaluate the performance of the proposed fusion method. Experimental results in terms of word error rate (WER) and receiver operating characteristics indicate a reasonable improvement over the use of a single feature set such as MFCC.
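
The affine-mean assumption in the abstract can be illustrated with a small sketch. The abstract does not state the exact objective, so the following assumes a plain squared-error loss between transformed low-pitch means and the paired high-pitch means, fitted by batch gradient descent. The names `fit_affine`, `apply_affine`, `mean_squared_error`, the learning rate, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import random

def apply_affine(A, b, x):
    """Return A x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * v for a, v in zip(row, x)) + b_i for row, b_i in zip(A, b)]

def mean_squared_error(A, b, low_means, high_means):
    """Average squared distance between A*low + b and the paired high mean."""
    total = 0.0
    for x, y in zip(low_means, high_means):
        pred = apply_affine(A, b, x)
        total += sum((p - t) ** 2 for p, t in zip(pred, y))
    return total / len(low_means)

def fit_affine(low_means, high_means, lr=0.05, steps=2000):
    """Fit A and b by batch gradient descent on the squared error.

    Assumption (not from the paper): each low-pitch mean vector is paired
    with one high-pitch mean vector, e.g. one pair per utterance.
    """
    n, d = len(low_means), len(low_means[0])
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity start
    b = [0.0] * d
    for _ in range(steps):
        grad_A = [[0.0] * d for _ in range(d)]
        grad_b = [0.0] * d
        for x, y in zip(low_means, high_means):
            pred = apply_affine(A, b, x)
            for i in range(d):
                err = pred[i] - y[i]
                grad_b[i] += 2.0 * err / n
                for j in range(d):
                    grad_A[i][j] += 2.0 * err * x[j] / n
        for i in range(d):
            b[i] -= lr * grad_b[i]
            for j in range(d):
                A[i][j] -= lr * grad_A[i][j]
    return A, b

# Synthetic 2-D example: generate "high" means from "low" means with a known
# affine map, then check that gradient descent recovers it.
random.seed(0)
A_true = [[1.2, -0.3], [0.4, 0.9]]
b_true = [0.5, -0.2]
low_means = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(50)]
high_means = [apply_affine(A_true, b_true, x) for x in low_means]

A_hat, b_hat = fit_affine(low_means, high_means)
final_error = mean_squared_error(A_hat, b_hat, low_means, high_means)
```

On this noise-free toy data the fitted `A_hat` and `b_hat` should approach `A_true` and `b_true`; with real spectral and prosodic features the fit would only be approximate, and the paper's joint error criterion may differ from the loss assumed here.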


Pages: 6
