A ROBUST AND REAL-TIME VISUAL SPEECH RECOGNITION FOR SMARTPHONE APPLICATION

Cited by: 0
Authors
Song, Min Gyu [1]
Tariquzzamani, Md [1]
Kim, Jin Young [1]
Hwang, Seong Taek [2]
Chi, Seung Ho [3]
Affiliations
[1] Chonnam Natl Univ, Sch Elect & Comp Engn, Kwangju 500757, South Korea
[2] Samsung Elect, Multimedia Lab, IT Ctr, Commun Res Ctr, Suwon 442600, South Korea
[3] Dongshin Univ, Informat Ctr, Dept Comp Sci, Naju 520714, Chonnam, South Korea
Source
INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL | 2012 / Vol. 8 / No. 04
Keywords
Visual speech recognition; Lip localization; K-means clustering; Histogram matching; Lip folding; RASTA filter; FEATURE-EXTRACTION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual speech recognition (VSR) is a promising complementary approach to speech recognition in very noisy environments, especially in mobile phone settings. When implementing visual speech recognition on a smartphone, the two main requirements, real-time responsiveness and robustness, conflict with each other. In this paper, we propose and implement a robust visual speech recognition system that runs in real time. First, we devise a robust and fast lip detection method based on eye detection, which is not vulnerable to changes in illumination. The pair of eyes is determined by image binarization and a coupled-eye validation method. The lip region is then estimated by geometric lip candidate detection and k-means clustering. Second, to cope with the dependence of visual speech recognition performance on lighting, we combine the previous methods of lip folding and RASTA filtering and introduce a modified histogram equalization, in which a mapping function is calculated for the first frame image and held fixed for the following frames. Third, the visual speech recognition system, with a vocabulary of 32 control words, is implemented on a smartphone with code optimization. It is shown to work in real time with promising results.
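The fixed-mapping idea in the abstract can be illustrated with a minimal sketch. It assumes plain histogram equalization as the mapping, since the abstract does not specify how the authors' variant is modified. The function names (`equalization_mapping`, `apply_mapping`, `equalize_sequence`) and the list-of-rows frame representation are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def equalization_mapping(frame, levels=256):
    """Build a histogram-equalization lookup table from one grayscale frame.

    `frame` is a list of rows of integer intensities in [0, levels - 1].
    Assumption: standard equalization; the paper's modification is not
    described in the abstract.
    """
    hist = [0] * levels
    for row in frame:
        for value in row:
            hist[value] += 1

    # Cumulative count of pixels at or below each intensity.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to spread, fall back to the identity.
        return list(range(levels))

    scale = (levels - 1) / (total - cdf_min)
    return [max(0, round((c - cdf_min) * scale)) for c in cdf]


def apply_mapping(frame, lut):
    """Remap every pixel of a frame through a precomputed lookup table."""
    return [[lut[value] for value in row] for row in frame]


def equalize_sequence(frames, levels=256):
    """Compute the mapping on the first frame only and reuse it afterwards."""
    lut = equalization_mapping(frames[0], levels)
    return [apply_mapping(frame, lut) for frame in frames]
```

Computing the lookup table once and reusing it keeps the per-frame cost to a single table lookup per pixel, and it applies the same intensity transform to every frame of an utterance. That matches the real-time motivation stated in the abstract, although the actual design rationale is only inferred here.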
Pages: 2837-2853
Number of pages: 17
Related papers
26 in total
[1]   EFFECTIVENESS OF LINEAR PREDICTION CHARACTERISTICS OF SPEECH WAVE FOR AUTOMATIC SPEAKER IDENTIFICATION AND VERIFICATION [J].
ATAL, BS .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1974, 55 (06) :1304-1312
[2]   Graphical model architectures for speech recognition [J].
Bilmes, JA ;
Bartels, C .
IEEE SIGNAL PROCESSING MAGAZINE, 2005, 22 (05) :89-100
[3]   A maximum A posteriori approach to speaker adaptation using the trended hidden Markov model [J].
Chengalvarayan, R ;
Deng, L .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (05) :549-557
[4]   A review of speech-based bimodal recognition [J].
Chibelushi, CC ;
Deravi, F ;
Mason, JSD .
IEEE TRANSACTIONS ON MULTIMEDIA, 2002, 4 (01) :23-37
[5]   Audio-Visual Speech Modeling for Continuous Speech Recognition [J].
Dupont, Stephane ;
Luettin, Juergen .
IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) :141-151
[6]  
Eyeno N., 1993, P IEEE INT C AC SPEE, P557
[7]   Robust distributed speech recognition using speech enhancement [J].
Flynn, Ronan ;
Jones, Edward .
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) :1267-1273
[8]   CEPSTRAL ANALYSIS TECHNIQUE FOR AUTOMATIC SPEAKER VERIFICATION [J].
FURUI, S .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1981, 29 (02) :254-272
[9]  
Gonzalez R. C., 1992, DIGITAL IMAGE PROCES, V2nd
[10]  
Gowdy JN, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS, P993