A novel Wake-Up-Word speech recognition system, Wake-Up-Word recognition task, technology and evaluation

Cited by: 26
Authors
Kepuska, Z. [1]
Klein, T. B. [1]
Affiliation
[1] Florida Inst Technol, Dept Elect & Comp Engn, Melbourne, FL 32901 USA
Funding
National Science Foundation (USA)
Keywords
Wake-Up-Word; Speech recognition; Hidden Markov Models; Support Vector Machines; Mel-scale cepstral coefficients; Linear prediction spectrum; Enhanced spectrum; HTK; Microsoft SAPI;
DOI
10.1016/j.na.2009.06.089
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Wake-Up-Word (WUW) is a new paradigm in speech recognition (SR) that is not yet widely recognized. This paper defines and investigates WUW speech recognition and describes the details of this novel solution and the technology that implements it. WUW SR is defined as detection of a single word or phrase when spoken in the alerting context of requesting attention, while rejecting all other words, phrases, sounds, noises and other acoustic events, as well as the same word or phrase spoken in a non-alerting context, with virtually 100% accuracy. To achieve this accuracy, the following innovations were introduced: (1) Hidden Markov Model triple scoring with Support Vector Machine classification, (2) a combination of multiple speech feature streams: Mel-scale Filtered Cepstral Coefficients (MFCCs), Linear Prediction Coefficients (LPC)-smoothed MFCCs, and Enhanced MFCCs, and (3) an improved Voice Activity Detector with Support Vector Machines. WUW detection and recognition performance is 2514%, or 26 times, better than HTK for the same training and testing data, and 2271%, or 24 times, better than the Microsoft SAPI 5.1 recognizer. The out-of-vocabulary rejection performance is over 65,233%, or 653 times, better than HTK, and 5900% to 42,900%, or 60 to 430 times, better than the Microsoft SAPI 5.1 recognizer. This solution, which uses a new recognition paradigm, applies not only to the WUW task but also to general speech recognition tasks. © 2009 Elsevier Ltd. All rights reserved.
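The abstract reports each improvement twice, once as a percentage and once as a "times better" factor. The short sketch below shows one reading under which the two forms agree: it assumes a factor of 1 + p/100 for an improvement of p%. That convention is an assumption made here for illustration; the abstract does not state how the figures were derived. The function name `percent_better_to_factor` and the dictionary labels are likewise hypothetical.

```python
def percent_better_to_factor(percent_better: float) -> float:
    """Convert 'p% better' to a 'times better' factor.

    Assumption (not stated in the abstract): factor = 1 + p / 100.
    """
    return 1.0 + percent_better / 100.0


# Percentages as quoted in the abstract; labels are illustrative only.
reported_percentages = {
    "WUW recognition vs. HTK": 2514,
    "WUW recognition vs. Microsoft SAPI 5.1": 2271,
    "OOV rejection vs. HTK": 65233,
    "OOV rejection vs. Microsoft SAPI 5.1 (low)": 5900,
    "OOV rejection vs. Microsoft SAPI 5.1 (high)": 42900,
}

for label, percent in reported_percentages.items():
    factor = percent_better_to_factor(percent)
    print(f"{label}: {percent}% -> about {factor:.2f} times")
```

Under this assumption, the computed factors round to the 26, 24, 653, 60 and 430 quoted in the abstract.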
Pages: E2772-E2789
Page count: 18