Fusion of Spectral and Prosodic Information using Combined Error Optimization for Keyword Spotting

Times cited: 0
Authors
Pandey, Laxmi [1]
Chaudhary, Kuldeep [1]
Hegde, Rajesh M. [1]
Affiliation
[1] Indian Institute of Technology, Department of Electrical Engineering, Kanpur, Uttar Pradesh, India
Source
2017 Twenty-Third National Conference on Communications (NCC), 2017
Keywords
HIDDEN MARKOV-MODELS; SPEECH RECOGNITION
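DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809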
Abstract
Incorporating prosodic information with spectral information at the feature level is challenging. In this paper, a method for feature-level fusion of spectral and prosodic information is proposed. A pitch contour is first extracted from the frame-blocked segments of the speech signal, and these segments are labeled as high-pitch or low-pitch segments. Both spectral and prosodic features are extracted from each segment class, and an integrated feature set is obtained for each class by concatenating its spectral and prosodic features. In the next stage of fusion, the high-pitch and low-pitch labeled features are further combined using a joint error optimization approach. This approach assumes that the mean of the high-pitch segments can be obtained by an affine transformation of the mean of the low-pitch segments; the parameters of the affine transformation are estimated using gradient descent. The final integrated feature set is obtained by normalizing both of the feature sets thus obtained. This integrated feature set is used in a hidden Markov model (HMM) framework, together with a novel sliding syllable protocol, for keyword spotting. Keyword recognition and keyword spotting experiments are conducted on a Hindi-language database developed for this purpose to evaluate the proposed fusion method. Experimental results, reported in terms of word error rate (WER) and receiver operating characteristic (ROC) curves, indicate a reasonable improvement over using a single feature set such as mel-frequency cepstral coefficients (MFCCs).
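The abstract does not state the exact cost function, so the following is only an illustrative sketch of the affine-mean step under stated assumptions, not the authors' implementation. It assumes (hypothetically) that several paired class means are available for the low-pitch and high-pitch segments, that the affine transform is diagonal (one scale and one offset per feature dimension), and that the error being minimized is the mean squared difference between mapped low-pitch means and high-pitch means. The final per-dimension zero-mean, unit-variance normalization, and the pooling of the two sets before it, are likewise assumed stand-ins for the unspecified normalization step. The names `fit_affine_means` and `normalize` are hypothetical.

```python
import numpy as np

# Hypothetical sketch: the paper's exact objective is not given in the abstract.
# Assumed model, per feature dimension d and class k:
#     mu_high[k, d] ~= a[d] * mu_low[k, d] + b[d]


def fit_affine_means(mu_low, mu_high, lr=0.1, n_iters=2000):
    """Fit a diagonal affine map from low-pitch to high-pitch class means.

    mu_low, mu_high: arrays of shape (n_classes, n_dims) holding paired means.
    Minimizes the mean squared error over classes by plain gradient descent.
    Returns (a, b), each of shape (n_dims,).
    """
    mu_low = np.asarray(mu_low, dtype=float)
    mu_high = np.asarray(mu_high, dtype=float)
    n_dims = mu_low.shape[1]
    a = np.ones(n_dims)   # start from the identity map
    b = np.zeros(n_dims)
    for _ in range(n_iters):
        residual = mu_low * a + b - mu_high          # (n_classes, n_dims)
        # Gradients of J = mean_k residual**2 with respect to a and b.
        grad_a = 2.0 * np.mean(residual * mu_low, axis=0)
        grad_b = 2.0 * np.mean(residual, axis=0)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def normalize(features):
    """Assumed final step: per-dimension zero-mean, unit-variance scaling."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Leave constant dimensions unscaled instead of dividing by zero.
    return (features - mean) / np.where(std > 0, std, 1.0)


if __name__ == "__main__":
    # Synthetic check: recover a known diagonal affine map from 20 class means.
    rng = np.random.default_rng(0)
    true_a = np.array([1.5, 0.8, -0.5])
    true_b = np.array([0.2, -1.0, 0.5])
    mu_low = rng.standard_normal((20, 3))
    mu_high = mu_low * true_a + true_b
    a, b = fit_affine_means(mu_low, mu_high)
    print("a =", np.round(a, 3), "b =", np.round(b, 3))
    # Assumed fusion: map low-pitch means into the high-pitch space,
    # pool both sets, then normalize each dimension.
    fused = normalize(np.vstack([mu_low * a + b, mu_high]))
    print(fused.shape)
```

Restricting the transform to one scale and one offset per dimension keeps the fit well posed even with few class means; a full matrix transform would need more paired means than feature dimensions. The learning rate of 0.1 assumes roughly unit-scale features; raw MFCC or pitch values would likely need a smaller step or prior scaling.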
Pages: 6