Visual Speech Recognition Using Weighted Dynamic Time Warping

Cited by: 4
Authors
Lee, Kyungsun [1]
Keum, Minseok [1]
Han, David K. [2]
Ko, Hanseok [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul, South Korea
[2] Off Naval Res, Arlington, VA 22217 USA
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2015 / Vol. E98D / No. 07
Keywords
visual speech recognition; lip reading
DOI
10.1587/transinf.2015EDL8002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
It is unclear whether Hidden Markov Model (HMM) or Dynamic Time Warping (DTW) mapping is more appropriate for visual speech recognition when only a small number of data samples is available. In this letter, the two approaches are compared in terms of sensitivity to the number of training samples and computing time, with the objective of determining the tipping point. The limited training data problem is addressed by exploiting straightforward template matching via weighted DTW (WDTW). The proposed framework refines DTW by adjusting the warping paths with judiciously injected weights to ensure a smooth diagonal path for accurate alignment without added computational load. The proposed WDTW is evaluated on three databases (two in the public domain and one developed in-house) for visual recognition performance. The experiments indicate that the proposed WDTW significantly enhances the recognition rate compared to the DTW- and HMM-based algorithms, especially when data samples are limited.
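The template-matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen, so the logistic penalty on the phase difference |i - j| used here is an assumption borrowed from the general weighted-DTW literature, and the names `logistic_weights`, `wdtw_distance`, `classify`, `g`, and `w_max` are hypothetical. The sketch only shows how a weight can discourage warping paths that stray far from the diagonal while keeping the same O(nm) dynamic program as plain DTW.

```python
import numpy as np


def logistic_weights(length, g=0.1, w_max=1.0):
    """Weight for each phase difference 0..length-1 (assumed logistic form)."""
    phase = np.arange(length)
    # Larger |i - j| gets a larger weight, so off-diagonal matches cost more.
    return w_max / (1.0 + np.exp(-g * (phase - length / 2.0)))


def _as_frames(x):
    """Return the sequence as a (T, D) array; 1-D input becomes (T, 1)."""
    x = np.asarray(x, dtype=float)
    return x.reshape(-1, 1) if x.ndim == 1 else x


def wdtw_distance(a, b, g=0.1, w_max=1.0):
    """Weighted DTW cost between two feature sequences."""
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    w = logistic_weights(max(n, m), g, w_max)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local frame distance, scaled by the phase-difference weight.
            local = w[abs(i - j)] * np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # step in a only
                cost[i, j - 1],      # step in b only
                cost[i - 1, j - 1],  # diagonal step
            )
    return cost[n, m]


def classify(query, templates):
    """Nearest-template rule; templates is a list of (label, sequence) pairs."""
    return min(templates, key=lambda t: wdtw_distance(query, t[1]))[0]
```

With one stored template per word, `classify(query, templates)` returns the label of the closest template, which is why this kind of matcher needs no training stage and can operate with very few samples. The steepness `g` controls how strongly off-diagonal alignments are penalized; setting it to 0 gives every cell the same weight and reduces the sketch to plain DTW up to a constant factor.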
Pages: 1430-1433
Page count: 4