Some observations on computer lip-reading: moving from the dream to the reality

Cited by: 2
Authors
Bear, Helen L. [1]
Owen, Gari [2]
Harvey, Richard [1]
Theobald, Barry-John [1]
Affiliations
[1] University of East Anglia, Norwich NR4 7TJ, Norfolk, England
[2] Annwvyn Solutions, Kent BR1 3DW, England
Source
Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X; and Optical Materials and Biomaterials in Security and Defence Systems Technology XI | 2014 / Vol. 9253
Keywords
Lip-reading; speech recognition; pattern recognition; visual intelligibility; perception
DOI
10.1117/12.2067464
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In the quest for greater computer lip-reading performance, a number of tacit assumptions are made, either in the datasets (high resolution, for example) or in the methods (recognition of spoken visual units called "visemes", for example). Here we review these and other assumptions and show the surprising result that computer lip-reading is not heavily constrained by video resolution, pose, lighting and other practical factors. However, the working assumption that visemes, the visual equivalent of phonemes, are the best unit for recognition does need further examination. We conclude that visemes, which were defined over a century ago, are unlikely to be optimal for a modern computer lip-reading system.
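Because a viseme set maps several phonemes onto one visual unit, some acoustically distinct words become indistinguishable on the lips. The following is a minimal Python sketch of that many-to-one mapping, assuming a hypothetical and deliberately tiny phoneme-to-viseme table. The groupings (bilabials /p b m/, labiodentals /f v/, alveolars /t d n/) are illustrative textbook examples, not the viseme set examined in this paper, and the names `PHONEME_TO_VISEME`, `to_visemes` and `homophene_groups` are invented for illustration.

```python
# Hypothetical, illustrative phoneme-to-viseme table (NOT the paper's viseme set).
# Several phonemes share one viseme label: the mapping is many-to-one.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
    "ae": "V_open_vowel",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the corresponding viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def homophene_groups(lexicon):
    """Group words whose phoneme strings collapse to the same viseme string."""
    groups = {}
    for word, phones in lexicon.items():
        groups.setdefault(to_visemes(phones), []).append(word)
    return groups


# Toy lexicon: "pat", "bat" and "mat" sound different but look alike.
words = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}

if __name__ == "__main__":
    for visemes, ws in homophene_groups(words).items():
        print(visemes, ws)
```

In this toy example three of the four words fall into one viseme group, so a recognizer whose output units are visemes cannot separate them without further context; this is one reason the choice of recognition unit matters.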
Pages: 10