A pilot study on augmented speech communication based on Electro-Magnetic Articulography

Cited by: 4
Authors
Heracleous, Panikos [1 ]
Badin, Pierre [2 ]
Bailly, Gerard [2 ]
Hagita, Norihiro [1 ]
Affiliations
[1] ATR Intelligent Robotics & Communication Labs, Kyoto 619-0288, Japan
[2] Univ Grenoble, CNRS, GIPSA-Lab, Speech & Cognition Dept, UMR 5216, F-38402 Saint-Martin-d'Hères, France
Keywords
Augmented speech; Electro-Magnetic Articulography (EMA); Automatic speech recognition; Hidden Markov models (HMMs); Fusion; Noise robustness
DOI
10.1016/j.patrec.2011.02.009
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Speech is the most natural form of communication for human beings. However, in situations where audio speech is not available because of disability or adverse environmental conditions, people may resort to alternative methods such as augmented speech, that is, audio speech supplemented or replaced by other modalities such as audiovisual speech or Cued Speech. This article introduces augmented speech communication based on Electro-Magnetic Articulography (EMA). Movements of the tongue, lips, and jaw are tracked by EMA and used as features to create hidden Markov models (HMMs). In addition, automatic phoneme recognition experiments are conducted to examine the possibility of recognizing speech from articulation alone, that is, without any audio information. The results obtained are promising and confirm that phonetic features characterizing articulation are as discriminative as those characterizing acoustics (except for voicing). This article also describes experiments conducted in noisy environments using fused audio and EMA parameters. When EMA parameters are fused with noisy audio speech, the recognition rate increases significantly compared with using noisy audio speech alone. (C) 2011 Elsevier B.V. All rights reserved.
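To make the approach concrete, below is a minimal sketch of the kind of pipeline the abstract describes: per-phoneme Gaussian HMMs trained on frame-wise fusion (concatenation) of audio and EMA feature streams. It uses the third-party hmmlearn library; synthetic random data stands in for real MFCC and EMA coil trajectories, and the feature dimensions, the 3-state topology, and all helper names are illustrative assumptions rather than the authors' actual configuration.

```python
# Hypothetical sketch of HMM-based phoneme recognition with fused
# audio + EMA features (hmmlearn); synthetic data replaces real recordings.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

N_EMA = 12   # assumed: x/y coordinates of 6 EMA coils (tongue, lips, jaw)
N_MFCC = 13  # assumed: typical audio cepstral feature dimension

def synthetic_tokens(n_tokens, n_frames, dim, offset):
    """Stand-in for feature extraction: one (frames x dim) matrix per token."""
    return [offset + rng.standard_normal((n_frames, dim)) for _ in range(n_tokens)]

def fuse(audio, ema):
    """Frame-wise concatenation of the audio and EMA feature streams."""
    return [np.hstack([a, e]) for a, e in zip(audio, ema)]

phonemes = ["p", "b", "a"]  # toy inventory
train = {ph: fuse(synthetic_tokens(20, 30, N_MFCC, i),
                  synthetic_tokens(20, 30, N_EMA, i))
         for i, ph in enumerate(phonemes)}

# One Gaussian HMM per phoneme, trained on all tokens of that phoneme.
models = {}
for ph, tokens in train.items():
    X = np.vstack(tokens)                 # stacked frames of all tokens
    lengths = [len(t) for t in tokens]    # token boundaries for hmmlearn
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    m.fit(X, lengths)
    models[ph] = m

# Recognition: the phoneme whose model gives the highest log-likelihood wins.
test = fuse(synthetic_tokens(1, 30, N_MFCC, 2),
            synthetic_tokens(1, 30, N_EMA, 2))[0]
scores = {ph: m.score(test) for ph, m in models.items()}
print(max(scores, key=scores.get))  # expected: "a"
```

Under this scheme, an articulation-only recognizer is the same pipeline with the audio stream omitted, and the noisy-audio experiments correspond to degrading the MFCC stream while keeping the EMA stream intact.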
Pages: 1119-1125
Number of pages: 7