Audio-visual person authentication using lip-motion from orientation maps

Cited by: 29
Authors
Faraj, Maycel-Isaac [1]
Bigun, Josef [1]
Affiliations
[1] Halmstad Univ, Sch Informat Sci Comp & Elect Engn, SE-30118 Halmstad, Sweden
Funding
National Research Foundation, Singapore
Keywords
audio-visual recognition; biometrics; biometric recognition; speaker verification; speaker authentication; person identification; lip-movements; motion; structure tensor; orientation; optical flow; hidden Markov model; Gaussian Markov model
DOI
10.1016/j.patrec.2007.02.017
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a new identity authentication technique based on the synergetic use of lip-motion and speech. The lip-motion is defined as the distribution of apparent velocities in the movement of brightness patterns in an image, and is estimated by computing the velocity components of the structure tensor by 1D processing in 2D manifolds. Since the velocities are computed without extracting the speaker's lip-contours, more robust visual features can be obtained in comparison to motion features extracted from lip-contours. The motion estimations are performed in a rectangular lip-region, which affords increased computational efficiency. A person authentication implementation based on lip-movements and speech is presented along with experiments exhibiting a recognition rate of 98%. Besides its value in authentication, the technique can be used naturally to evaluate the "liveness" of someone speaking, as it can be used in text-prompted dialogue. The XM2VTS database was used for performance quantification as it is currently the largest publicly available database (300 persons) containing both lip-motion and speech. Comparisons with other techniques are presented. © 2007 Elsevier B.V. All rights reserved.
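The idea of estimating velocity from a structure tensor over a rectangular region, without tracking contours, can be illustrated with a minimal sketch. It is not the paper's 1D-processing method: it substitutes the generic least-squares brightness-constancy estimate built from the spatiotemporal structure tensor, and the function name `region_velocity` and the `(T, H, W)` input layout are assumptions made only for this illustration.

```python
import numpy as np

def region_velocity(frames):
    """Estimate a single (vx, vy) velocity for a stack of grayscale frames.

    frames: array of shape (T, H, W) covering a rectangular region.
    Illustrative only: a generic structure-tensor least-squares estimate,
    not the 1D-processing scheme described in the abstract.
    """
    f = np.asarray(frames, dtype=float)
    # Finite-difference partial derivatives along t, y, x (array axis order).
    ft, fy, fx = np.gradient(f)
    # Structure tensor entries, summed over the whole region and all frames.
    jxx = np.sum(fx * fx)
    jxy = np.sum(fx * fy)
    jyy = np.sum(fy * fy)
    jxt = np.sum(fx * ft)
    jyt = np.sum(fy * ft)
    A = np.array([[jxx, jxy], [jxy, jyy]])
    b = -np.array([jxt, jyt])
    # lstsq returns a minimum-norm solution even when A is rank deficient
    # (the aperture problem, e.g. a region containing a single edge).
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy) in pixels per frame

# Synthetic check: a smooth pattern translating by (0.5, -0.3) pixels/frame.
t, y, x = np.meshgrid(np.arange(9), np.arange(40), np.arange(40), indexing="ij")
demo = np.sin(0.2 * (x - 0.5 * t)) + np.cos(0.15 * (y + 0.3 * t))
print(region_velocity(demo))
```

Because the derivatives are pooled over the whole rectangle, no lip-contour has to be located first, which is the property the abstract credits for the robustness of the visual features.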
Pages: 1368-1382
Number of pages: 15