EMG-based speech recognition using dimensionality reduction methods

Cited by: 13
Authors
Ratnovsky, Anat [1]
Malayev, Sarit [1, 2]
Ratnovsky, Shahar [3, 4]
Naftali, Sara [1]
Rabin, Neta [5]
Affiliations
[1] Afeka Tel Aviv Acad Coll Engn, Sch Med Engn, Tel Aviv, Israel
[2] Tel Aviv Univ, Sch Neurosci, Tel Aviv, Israel
[3] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
[4] Tel Aviv Univ, Sch Comp Sci, Tel Aviv, Israel
[5] Tel Aviv Univ, Dept Ind Engn, Tel Aviv, Israel
Keywords
Electromyography; Speech recognition; Automatic speech recognition; Machine learning algorithms; Feature extraction; Principal component analysis; PERFORMANCE; SIGNALS
DOI
10.1007/s12652-021-03315-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic speech recognition is the main form of man-machine communication. Recently, several studies have shown that speech can be recognized automatically from electromyography (EMG) signals of the facial muscles using machine learning methods. The objective of this study was to use machine learning methods to identify speech automatically from EMG signals. EMG signals from three facial muscles were recorded from four healthy female subjects while they pronounced seven different words 50 times. Short-time Fourier transform features were extracted from the EMG data. Principal component analysis (PCA) and locally linear embedding (LLE) were applied and compared for reducing the dimensionality of the EMG data. K-nearest-neighbors was used to examine the ability to identify different word sets of a subject based on that subject's own dataset, and to identify words of one subject based on another subject's dataset, using an affine transformation to align the reduced feature spaces of the two subjects. PCA and LLE achieved an average recognition rate of 81% for five-word sets in the single-subject approach. In the multi-subject classification approach, the best average recognition rates for three-word and five-word sets were 88.8% and 74.6%, respectively. Both PCA and LLE achieved satisfactory classification rates in the single-subject and multi-subject approaches. The multi-subject classification approach enables robust classification of words recorded from a new subject based on another subject's dataset, and may therefore be applicable to people who have lost the ability to speak.
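The pipeline summarized in the abstract (short-time Fourier transform features, dimensionality reduction, K-nearest-neighbors, and an affine map between reduced feature spaces) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: synthetic noisy sinusoids stand in for EMG recordings, only PCA is shown (LLE is omitted), and every function name, window size, component count, and value of k below is an assumption chosen for illustration.

```python
import numpy as np

def stft_features(signal, win=64, hop=32):
    """Flattened STFT magnitudes of a 1-D signal (Hann window)."""
    window = np.hanning(win)
    starts = range(0, len(signal) - win + 1, hop)
    frames = np.array([signal[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).ravel()

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def knn_predict(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

def fit_affine(src, dst):
    """Least-squares affine map (A, b) such that src @ A + b ~ dst."""
    augmented = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(augmented, dst, rcond=None)
    return params[:-1], params[-1]

rng = np.random.default_rng(0)

def synthetic_word(label, n_reps=20, length=512):
    """Hypothetical stand-in for EMG: one noisy tone per word label."""
    t = np.arange(length)
    tone = np.sin(2 * np.pi * (0.05 + 0.04 * label) * t)
    return tone + 0.3 * rng.standard_normal((n_reps, length))

# Single-subject setting: three "words", 20 repetitions each.
labels = np.repeat(np.arange(3), 20)
signals = np.vstack([synthetic_word(w) for w in range(3)])
features = np.array([stft_features(s) for s in signals])

# Random train/test split, PCA fitted on the training part only.
order = rng.permutation(len(labels))
train, test = order[:45], order[45:]
mean, components = pca_fit(features[train], n_components=5)
reduced_train = (features[train] - mean) @ components.T
reduced_test = (features[test] - mean) @ components.T

predicted = knn_predict(reduced_train, labels[train], reduced_test)
accuracy = float((predicted == labels[test]).mean())
print(f"single-subject accuracy on synthetic data: {accuracy:.2f}")

# Affine alignment between two reduced spaces, shown on toy 2-D points
# whose true map is known, so the fit can be checked directly.
true_A = np.array([[1.2, -0.3], [0.4, 0.9]])
true_b = np.array([0.5, -1.0])
source_points = rng.standard_normal((10, 2))
target_points = source_points @ true_A + true_b
est_A, est_b = fit_affine(source_points, target_points)
```

In a multi-subject setting, `fit_affine` would be fitted on corresponding points from the two subjects' reduced spaces, and the new subject's points would be mapped with `points @ est_A + est_b` before calling `knn_predict`. How those correspondences are chosen is not stated in the abstract and is left open here.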
Pages: 597-607
Number of pages: 11