HMM-based Automatic Visual Speech Segmentation Using Facial Data

Times cited: 0
Authors
Musti, Utpala [1 ]
Toutios, Asterios [1 ]
Ouni, Slim [1 ]
Colotte, Vincent [1 ]
Wrobel-Dautcourt, Brigitte [1 ]
Berger, Marie-Odile [1 ]
Affiliation
[1] Univ Nancy 2, LORIA, UMR 7503, F-54506 Vandoeuvre Les Nancy, France
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
facial speech; speech segmentation; forced alignment; coarticulation; MODELS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. The segmentation is performed using an HMM-based forced alignment mechanism widely used in automatic speech recognition. The approach rests on the assumption that training on visual speech data alone might capture what is unique to the facial component of speech articulation, the asynchrony (time lags) between visual and acoustic speech segments, and significant coarticulation effects. This should provide valuable information on the extent to which a phoneme may visually affect its surrounding phonemes, and should help in labeling visual speech segments according to their dominant coarticulatory contexts.
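The forced-alignment idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one state per phoneme, uniform transition costs, and per-frame log-likelihoods that are already available (in a real system these would come from trained HMMs over the facial features). The function name `forced_align` and the input layout are hypothetical.

```python
import math


def forced_align(loglik):
    """Viterbi-style forced alignment (simplified sketch).

    loglik[t][n] is the log-likelihood of frame t under the n-th unit of a
    known transcript (assumed input). Returns one (start_frame, end_frame)
    pair per unit, with end_frame inclusive.
    """
    num_frames = len(loglik)
    num_units = len(loglik[0])
    if num_frames < num_units:
        raise ValueError("need at least one frame per unit")

    neg_inf = -math.inf
    score = [[neg_inf] * num_units for _ in range(num_frames)]
    advanced = [[False] * num_units for _ in range(num_frames)]
    score[0][0] = loglik[0][0]  # the path must start in the first unit

    for t in range(1, num_frames):
        for n in range(num_units):
            stay = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else neg_inf
            best = stay
            if advance > stay:
                best = advance
                advanced[t][n] = True
            if best > neg_inf:
                score[t][n] = best + loglik[t][n]

    # Backtrack from the last unit at the last frame.
    path = [0] * num_frames
    n = num_units - 1
    for t in range(num_frames - 1, -1, -1):
        path[t] = n
        if advanced[t][n]:
            n -= 1

    # Collapse the frame-level path into segment boundaries.
    segments = []
    start = 0
    for t in range(1, num_frames + 1):
        if t == num_frames or path[t] != path[t - 1]:
            segments.append((start, t - 1))
            start = t
    return segments
```

Because the transcript order is fixed, the search only decides where each boundary falls. This is what makes the alignment "forced", as opposed to open recognition.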
Pages: 1401-1404
Number of pages: 4