Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

Cited by: 14
Authors
Ibrahim, M. Z. [1]
Mulvaney, D. J. [2]
Affiliations
[1] Univ Malaysia Pahang, Fac Elect & Elect Engn, Pahang 26300, Malaysia
[2] Univ Loughborough, Sch Elect Elect & Syst Engn, Loughborough LE11 3TU, Leics, England
Keywords
Lip reading; Lip geometry; Mouth detection; Skin segmentation; Convex hull; Multi dimension dynamic time warping; Template probabilistic; OpenCV; SPEECH; RECOGNITION; EXTRACTION; FACE; INFORMATION; FEATURES; EYE;
DOI
10.1016/j.jvcir.2015.04.013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Identifying lip movements and characterizing their association with speech sounds can improve the performance of speech recognition systems, particularly in noisy environments. In this paper, we present a geometrical-based automatic lip-reading system. It locates the lip region in images using conventional techniques, and it extracts the lip contour itself using a novel combination of border-following and convex hull approaches. Classification combines two techniques. The first is an enhanced dynamic time warping technique that operates in multiple dimensions. The second is a template probability technique that compensates for differences in the way words are uttered in the training set. The performance of the new system was assessed on recognition of the English digits 0 to 9 from the CUAVE database. The new approach compared favorably with existing lip-reading approaches and achieved a word recognition accuracy of up to 71%, using visual information derived from estimates of lip height, lip width and their ratio. (C) 2015 Elsevier Inc. All rights reserved.
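The abstract names the building blocks but not their exact definitions. The following is a rough sketch under stated assumptions, not the authors' implementation. It assumes lip height and width are the vertical and horizontal extent of the extracted contour points, for example the convex hull vertices. It also assumes that multi-dimensional DTW (MD-DTW) means a single warping path shared by all features, with Euclidean distance between whole per-frame feature vectors as the local cost. The template probability technique is not reproduced. A plain nearest-template rule stands in for it. The function names `lip_geometry`, `md_dtw` and `classify` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Frame = tuple[float, float, float]  # (height, width, height/width) for one video frame


def lip_geometry(contour: Sequence[Point]) -> Frame:
    """Height, width and height/width ratio of a lip contour.

    Assumption: height and width are the vertical and horizontal extent of the
    contour points (e.g. convex hull vertices); the paper's landmarks may differ.
    """
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    ratio = height / width if width > 0 else 0.0  # guard against a degenerate contour
    return (height, width, ratio)


def md_dtw(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Multi-dimensional DTW: one warping path for all feature dimensions.

    The local cost is the Euclidean distance between whole feature vectors.
    In practice the features would usually be normalized first, because pixel
    distances and the dimensionless ratio have very different scales.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify(sequence: Sequence[Frame], templates: dict[str, list[list[Frame]]]) -> str:
    """Return the word whose closest template has the lowest MD-DTW distance.

    This is plain nearest-template matching, a stand-in for the paper's
    template probability technique, which is not reproduced here.
    """
    return min(
        templates,
        key=lambda word: min(md_dtw(sequence, t) for t in templates[word]),
    )


if __name__ == "__main__":
    # Hypothetical toy templates: one "open mouth" and one "closed mouth" word.
    templates = {
        "open": [[(2.0, 4.0, 0.5), (3.0, 4.0, 0.75)]],
        "closed": [[(0.5, 4.0, 0.125), (0.5, 4.0, 0.125)]],
    }
    # A time-stretched version of the "open" template still matches it exactly.
    utterance = [(2.0, 4.0, 0.5), (2.0, 4.0, 0.5), (3.0, 4.0, 0.75)]
    print(classify(utterance, templates))  # open
```

Sharing one warping path across height, width and ratio keeps the three features aligned in time. Warping each feature independently could align lip height and lip width to different frames.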
Pages: 219-233
Page count: 15