Deep Neural Network Frontend for Continuous EMG-based Speech Recognition

Cited by: 22
Authors
Wand, Michael [1]
Schmidhuber, Jürgen
Affiliations
[1] USI, Ist Dalle Molle Studi Intelligenza Artificiale, Swiss AI Lab IDSIA, Manno Lugano, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
Silent Speech interface; Deep Neural Networks; Electromyography; EMG-based Speech Recognition;
DOI
10.21437/Interspeech.2016-340
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
We report on a Deep Neural Network frontend for a continuous speech recognizer based on Surface Electromyography (EMG). Speech data is obtained by facial electrodes capturing the electric activity generated by the articulatory muscles, thus allowing speech processing without making use of the acoustic signal. The electromyographic signal is preprocessed and fed into the neural network, which is trained on framewise targets; the output layer activations are further processed by a Hidden Markov sequence classifier. We show that such a neural network frontend can be trained on EMG data and yields substantial improvements over previous systems, despite the fact that the available amount of data is very small, just amounting to a few tens of sentences: on the EMG-UKA corpus, we obtain average evaluation set Word Error Rate improvements of more than 32% relative on context-independent phone models and 13% relative on versatile Bundled Phonetic feature (BDPF) models, compared to a conventional system using Gaussian Mixture Models. In particular, on simple context-independent phone models, the new system yields results which are almost as good as with BDPF models, which were specifically designed to cope with small amounts of training data.
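The abstract states its gains as relative Word Error Rate (WER) improvements. As a minimal sketch of how such a figure is usually computed, the snippet below divides the WER reduction by the baseline WER. The function name and the input values are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER improvement as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values in percent (illustration only, not from the paper).
baseline = 50.0  # e.g. a conventional GMM-based system
improved = 34.0  # e.g. a system with a neural network frontend

print(f"{relative_wer_improvement(baseline, improved):.0%} relative")  # prints "32% relative"
```

A relative improvement is distinct from an absolute one: in this illustration the absolute reduction is 16 percentage points, while the relative reduction is 32%.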
Pages: 3032-3036
Number of pages: 5