Spatial-Temporal Recurrent Neural Network for Emotion Recognition

Cited by: 283
Authors
Zhang, Tong [1 ,2 ]
Zheng, Wenming [3 ]
Cui, Zhen [4 ]
Zong, Yuan [3 ]
Li, Yang [1 ,2 ]
Affiliations
[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Dept Informat Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[3] Southeast Univ, Res Ctr Learning Sci, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Jiangsu, Peoples R China
[4] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Electroencephalogram (EEG) emotion recognition; emotion recognition; facial expression recognition; spatial-temporal recurrent neural network (STRNN);
DOI
10.1109/TCYB.2017.2788081
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
In this paper, we propose a novel deep learning framework, called spatial-temporal recurrent neural network (STRNN), to integrate the feature learning from both spatial and temporal information of signal sources into a unified spatial-temporal dependency model. In STRNN, to capture the spatially co-occurrent variations of human emotions, a multidirectional recurrent neural network (RNN) layer is employed to capture long-range contextual cues by traversing the spatial regions of each temporal slice along different directions. Then a bi-directional temporal RNN layer is further used to learn discriminative features characterizing the temporal dependencies of the sequences produced by the spatial RNN layer. To further select the salient regions with more discriminative ability for emotion recognition, we impose sparse projection onto the hidden states of the spatial and temporal domains to improve the model's discriminant ability. Consequently, the proposed two-layer RNN model provides an effective way to exploit both spatial and temporal dependencies of the input signals for emotion recognition. Experimental results on public electroencephalogram and facial expression emotion datasets demonstrate that the proposed STRNN method is competitive with state-of-the-art methods.
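The two-layer structure described in the abstract can be sketched in a minimal NumPy form: a spatial RNN traverses the regions of each temporal slice in several directions, and a bi-directional temporal RNN then runs over the resulting per-slice features. All dimensions, the plain tanh cells, and the two-direction spatial scan below are illustrative assumptions, not the paper's exact formulation (which uses multidirectional 2-D scans and a sparse projection on the hidden states).

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_pass(X, Wx, Wh, b):
    """Simple tanh RNN over the first axis of X; returns the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for x in X:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

T, R, d, k = 5, 9, 8, 16  # time slices, spatial regions per slice, input dim, hidden dim
signal = rng.standard_normal((T, R, d))  # stand-in for an EEG/expression sequence

# Spatial layer: traverse the regions of each slice in two orders
# (a stand-in for the paper's multidirectional spatial scans).
Wx_s = 0.1 * rng.standard_normal((k, d))
Wh_s = 0.1 * rng.standard_normal((k, k))
b_s = np.zeros(k)
spatial_seq = np.stack([
    np.concatenate([rnn_pass(slice_, Wx_s, Wh_s, b_s),
                    rnn_pass(slice_[::-1], Wx_s, Wh_s, b_s)])
    for slice_ in signal
])  # shape (T, 2k): one feature vector per temporal slice

# Temporal layer: bi-directional RNN over the sequence of spatial features.
Wx_t = 0.1 * rng.standard_normal((k, 2 * k))
Wh_t = 0.1 * rng.standard_normal((k, k))
b_t = np.zeros(k)
feat = np.concatenate([rnn_pass(spatial_seq, Wx_t, Wh_t, b_t),
                       rnn_pass(spatial_seq[::-1], Wx_t, Wh_t, b_t)])
print(feat.shape)  # (32,): spatial-temporal feature fed to a classifier
```

In the paper this feature would additionally pass through sparse projections that down-weight non-salient regions before classification; the sketch omits that step for brevity.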
Pages: 839-847 (9 pages)