Local Ordinal Contrast Pattern Histograms for Spatiotemporal, Lip-Based Speaker Authentication

Cited by: 43
Authors
Chan, Chi Ho [1]
Goswami, Budhaditya [1]
Kittler, Josef [1]
Christmas, William [1]
Affiliation
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Keywords
Biometrics; dynamic texture; lip; ordinal contrast; spatiotemporal; speaker verification; texture descriptor; motion features; texture
DOI
10.1109/TIFS.2011.2175920
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Lip region deformation during speech contains biometric information and is termed visual speech. This biometric information can be interpreted as genetic or behavioral, depending on whether static or dynamic features are extracted. In this paper, we use a texture descriptor called local ordinal contrast pattern (LOCP) together with a dynamic texture representation called three orthogonal planes to represent both the appearance and the dynamics features observed in visual speech. This feature representation, when used in standard speaker verification engines, is shown to improve the performance of the lip-biometric trait compared with the state of the art. The best baseline state-of-the-art performance was a half total error rate (HTER) of 13.35% on the XM2VTS database; we obtained an HTER of less than 1%. The resilience of the LOCP texture descriptor to random image noise is also investigated. Finally, an analysis of how the amount of video information affects speaker verification performance suggests that, with the proposed approach, speaker identity can be verified with a much shorter biometric trait record than the length normally required for voice-based biometrics. In summary, the performance obtained is remarkable and suggests that there is enough discriminative information in the mouth region to enable its use as a primary biometric trait.
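As a rough illustration of the HTER figures quoted in the abstract: HTER is conventionally the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a chosen decision threshold. The sketch below assumes that higher scores indicate a genuine speaker; the function name `hter` and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def hter(genuine_scores, impostor_scores, threshold):
    """Half total error rate at a fixed threshold (illustrative helper).

    Assumption: a claim is accepted when its score is >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # FRR: fraction of genuine claims that fall below the threshold.
    frr = np.mean(genuine < threshold)
    # FAR: fraction of impostor claims that reach the threshold.
    far = np.mean(impostor >= threshold)
    return 0.5 * (far + frr)

# Made-up scores, for illustration only.
example = hter([0.9, 0.8, 0.4, 0.7], [0.1, 0.2, 0.6, 0.3], threshold=0.5)
print(f"HTER = {example:.2%}")
```

In evaluation protocols of this kind the threshold is normally fixed on a separate evaluation set and then applied unchanged to the test set; the sketch leaves that choice to the caller.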
Pages: 602-612
Page count: 11