Non-Invasive Silent Phoneme Recognition Using Microwave Signals

Cited by: 16
Authors
Birkholz, Peter [1 ]
Stone, Simon [1 ]
Wolf, Klaus [2 ]
Plettemeier, Dirk [2 ]
Affiliations
[1] Tech Univ Dresden, Inst Acoust & Speech Commun, D-01062 Dresden, Germany
[2] Tech Univ Dresden, Inst Commun Technol, D-01062 Dresden, Germany
Keywords
Silent-speech interface; Speech recognition; Sensor data; Communication; Interface; Brain
DOI
10.1109/TASLP.2018.2865609
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Besides the recognition of audible speech, there is currently an increasing interest in the recognition of silent speech, which has a range of novel applications. A major obstacle to the widespread adoption of silent-speech technology is the lack of measurement methods for speech movements that are simultaneously convenient, non-invasive, portable, and robust. Therefore, as an alternative to established methods, we examined to what extent different phonemes can be discriminated from the electromagnetic transmission and reflection properties of the vocal tract. To this end, we attached two Vivaldi antennas on the cheek and below the chin of two subjects. While the subjects produced 25 phonemes in multiple phonetic contexts each, we measured the electromagnetic transmission spectra from one antenna to the other, and the reflection spectra for each antenna (radar), in a frequency band from 2 to 12 GHz. Two classification methods (k-nearest neighbors and linear discriminant analysis) were trained to predict the phoneme identity from the spectral data. With linear discriminant analysis, cross-validated phoneme recognition rates of 93% and 85% were achieved for the two subjects. Although these results are speaker- and session-dependent, they suggest that electromagnetic transmission and reflection measurements of the vocal tract have great potential for future silent-speech interfaces.
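The classification setup described in the abstract, predicting a phoneme label from a measured spectrum via linear discriminant analysis with cross-validation, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the random "spectra", the 64-bin resolution, and the 5-fold split are all assumptions made for the example.

```python
# Hypothetical sketch: cross-validated LDA phoneme classification
# on synthetic stand-ins for transmission/reflection spectra.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_phonemes, reps, n_bins = 25, 20, 64  # 25 phonemes, as in the study

# Each phoneme gets a distinct mean spectrum; repetitions add noise,
# loosely mimicking productions in different phonetic contexts.
means = rng.normal(size=(n_phonemes, n_bins))
X = np.vstack([m + 0.3 * rng.normal(size=(reps, n_bins)) for m in means])
y = np.repeat(np.arange(n_phonemes), reps)

# Cross-validated recognition rate, analogous to the reported 93%/85%.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")
```

In the actual study the feature vectors would be the measured 2–12 GHz transmission and reflection spectra rather than random draws, but the train/validate structure is the same.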
Pages: 2404-2411
Page count: 8
    [J]. IEEE ANTENNAS AND WIRELESS PROPAGATION LETTERS, 2009, 8 : 1414 - 1417