Reconstructing Tongue Movements from Audio and Video

Cited by: 0
Authors
Kjellstrom, Hedvig [1]
Engwall, Olov [2]
Balter, Olle [3]
Affiliations
[1] Swedish Def Res Agcy, Div Sensor Technol, SE-16490 Stockholm, Sweden
[2] KTH, CSC, Ctr Speech Technol CTT, SE-10044 Stockholm, Sweden
[3] KTH, CSC, Human Comp Interact Grp, SE-10044 Stockholm, Sweden
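Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Funding
Swedish Research Council;
Keywords
audio-visual to articulatory inversion;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract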
This paper presents an approach to articulatory inversion using audio and video of the user's face, requiring no special markers. The video is stabilized with respect to the face, and the mouth region is cropped out. The mouth image is projected into a learned independent component subspace to obtain a low-dimensional representation of the mouth appearance. The inversion problem is treated as one of regression: a non-linear regressor using relevance vector machines is trained on a dataset of simultaneous images of a subject's face, acoustic features, and positions of magnetic coils glued to the subject's tongue. The results show the benefit of using both cues for inversion. We envisage the inversion method as part of a pronunciation training system with articulatory feedback.
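The pipeline described in the abstract (linear projection of the mouth image, fusion with acoustic features, relevance vector regression onto coil positions) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the random `unmixing` matrix merely stands in for a learned independent component basis, the Gaussian kernel and its width are arbitrary choices, and only one coil coordinate is regressed. The update rules follow the standard sparse Bayesian learning formulation of the relevance vector machine and may differ from the authors' implementation.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Bias column plus one Gaussian kernel per training point (assumed kernel choice)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / (2.0 * width ** 2))])

def fit_rvm(Phi, t, n_iter=100, prune_at=1e6):
    """Relevance vector regression: re-estimate weight precisions (alpha)
    and noise precision (beta), pruning basis functions whose alpha diverges."""
    n, m = Phi.shape
    keep = np.arange(m)                 # indices of basis functions still active
    alpha = np.ones(m)
    beta = 1.0 / (0.1 * t.var())
    mu = np.zeros(m)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(np.diag(alpha) + beta * P.T @ P)   # posterior covariance
        mu = beta * Sigma @ P.T @ t                              # posterior mean weights
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 0.0, 1.0)
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, None)
        resid = t - P @ mu
        beta = max(n - gamma.sum(), 1e-6) / (resid @ resid + 1e-12)
        active = alpha < prune_at
        keep, alpha, mu = keep[active], alpha[active], mu[active]
    return keep, mu

# Synthetic stand-ins for the real recordings (all hypothetical).
rng = np.random.default_rng(0)
n_frames, n_pixels, n_ic, n_acoustic = 120, 400, 2, 2
mouth_images = rng.normal(size=(n_frames, n_pixels))               # flattened mouth crops
unmixing = rng.normal(size=(n_ic, n_pixels)) / np.sqrt(n_pixels)   # stand-in for a learned ICA basis
acoustic = rng.normal(size=(n_frames, n_acoustic))                 # stand-in acoustic features

# Low-dimensional appearance features, then audio-visual fusion by concatenation.
visual = (mouth_images - mouth_images.mean(axis=0)) @ unmixing.T
features = np.hstack([visual, acoustic])

# Toy target: one coordinate of one tongue coil, depending on both cues.
coil_x = np.sin(features[:, 0]) + 0.5 * features[:, -1] + 0.05 * rng.normal(size=n_frames)

Phi = rbf_design(features, features, width=2.0)
keep, mu = fit_rvm(Phi, coil_x)
pred = Phi[:, keep] @ mu
rmse = float(np.sqrt(np.mean((pred - coil_x) ** 2)))
print(f"relevance vectors kept: {keep.size} of {Phi.shape[1]}, training RMSE: {rmse:.3f}")
```

In a fuller version, one such regressor would be trained per coil coordinate, and comparing models trained on `visual` alone, `acoustic` alone, and the fused `features` would mirror the audio-versus-video comparison that the abstract reports.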
Pages: 2238 / +
Number of pages: 2
Related papers
23 in total
[1] BAILLY G, 2002, INT C SPOK LANG PROC, P1913
[2] BESKOW J, 2003, ICPHS, P431
[3] BREGLER C, 1994, INT CONF ACOUST SPEE, P669, DOI 10.1109/ICASSP.1994.389567
[4] Audio-Visual Speech Modeling for Continuous Speech Recognition [J]. Dupont, Stephane; Luettin, Juergen. IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03): 141-151
[5] Combining MRI, EMA and EPG measurements in a three-dimensional tongue model [J]. Engwall, O. SPEECH COMMUNICATION, 2003, 41 (2-3): 303-329
[6] ENGWALL O, 2003, EUROSPEECH, P49
[7] ENGWALL O, 2005, INTERSPEECH, P3205
[8] ENGWALL O, BEHAV INFOR IN PRESS
[9] Hyvärinen A, 2001, INDEPENDENT COMPONENT ANALYSIS: PRINCIPLES AND PRACTICE, P71
[10] JIANG J, 2000, ICSLP