Audio-Visual Emotion Recognition Based on a DBN Model with Constrained Asynchrony

Cited by: 3
Authors
Chen, Danqi [1 ]
Jiang, Dongmei [1 ]
Ravyse, Ilse [2 ]
Sahli, Hichem [2 ]
Affiliations
[1] Northwestern Polytech Univ, VUB NPU Joint Res Grp AVSP, Xian 710072, Peoples R China
[2] Vrije Univ Brussel, VUB NPU Joint Res Grp AVSP, Interdisciplinary Inst Broadband Technol IBBT, ETRO Dept, B-1050 Brussels, Belgium
Source
PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON IMAGE AND GRAPHICS (ICIG 2009) | 2009
Funding
National Natural Science Foundation of China;
Keywords
audio-visual multi-stream; asynchronous DBN model
DOI
10.1109/ICIG.2009.120
CLC Classification
TP301 [Theory, Methods];
Subject Classification
081202;
Abstract
This paper presents an audio-visual multi-stream DBN model (Asy_DBN) for emotion recognition with constrained asynchrony, in which the audio and visual states transition independently within their respective streams, but each transition is constrained by an allowed maximum audio-visual asynchrony. Emotion recognition experiments with Asy_DBN under different asynchrony constraints are carried out on an audio-visual speech database of four emotions, and the results are compared with single-stream HMMs, the state-synchronous HMM (Syn_HMM), the state-synchronous DBN model, and the state-asynchronous DBN model without an asynchrony constraint. The results show that, with an appropriate maximum asynchrony constraint between the audio and visual streams, the proposed asynchronous audio-visual DBN model achieves the highest emotion recognition performance, an improvement of 15% over Syn_HMM.
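The asynchrony constraint described in the abstract can be illustrated with a minimal sketch (not the authors' implementation; the names transition_allowed, valid_joint_states and max_async are hypothetical): a joint (audio, visual) state pair is admissible only while the two stream state indices differ by at most the allowed maximum asynchrony, and setting that maximum to zero reduces the model to a state-synchronous one.

# Minimal sketch, assuming per-stream state indices and a hypothetical
# maximum-asynchrony parameter max_async; for illustration only.

def transition_allowed(audio_state: int, visual_state: int, max_async: int) -> bool:
    # A joint (audio, visual) state pair is valid only if the two streams
    # have not drifted apart by more than max_async states.
    return abs(audio_state - visual_state) <= max_async

def valid_joint_states(n_audio_states: int, n_visual_states: int, max_async: int):
    # Enumerate the joint state space a constrained-asynchrony model would
    # search over; max_async = 0 keeps only the synchronous diagonal.
    return [
        (a, v)
        for a in range(n_audio_states)
        for v in range(n_visual_states)
        if transition_allowed(a, v, max_async)
    ]

if __name__ == "__main__":
    # With 5 states per stream and a maximum asynchrony of 1 state,
    # only near-diagonal (audio, visual) pairs remain reachable.
    print(valid_joint_states(5, 5, max_async=1))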
Pages: 912-916
Number of pages: 5