Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition

Cited by: 20
Authors
Du, Zhengyin [1 ]
Wu, Suowei [2 ]
Huang, Di [1 ]
Li, Weixin [3 ]
Wang, Yunhong [3 ]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sino French Engineer Sch, Beijing 100191, Peoples R China
[3] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Convolution; Decoding; Feature extraction; Videos; Visualization; Task analysis; Dimensional emotion recognition; spatio-temporal fully convolutional network; temporal hourglass CNN; temporal intermediate supervision; EXPRESSION RECOGNITION;
DOI
10.1109/TAFFC.2019.2940224
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based dimensional emotion recognition aims to map human affect into the dimensional emotion space based on visual signals, which is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design with the cascaded 2D convolution based spatial encoder and 1D convolution based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependency, our temporal model, referred to as Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationship through integrating both low-level encoded and high-level decoded clues. Temporal Intermediate Supervision (TIS) is then introduced to enhance affective representations generated by TH-CNN under a multi-resolution strategy, which guides TH-CNN to learn macroscopic long-term trend and refined short-term fluctuations progressively. Furthermore, thanks to TH-CNN and TIS, knowledge learnt from the intermediate layers also makes it possible to offer customized solutions to different applications by adjusting the decoder depth. Extensive experiments are conducted on three benchmark databases (RECOLA, SEWA and OMG) and superior results are shown compared to state-of-the-art methods, which indicates the effectiveness of the proposed approach.
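The temporal hourglass idea and the multi-resolution supervision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it has no learned weights, average pooling and nearest-neighbour upsampling stand in for the 1D convolutional encoder and decoder, fusing the skip connection by averaging is an assumption, and every function name here (`avg_pool`, `upsample`, `hourglass`, `multi_resolution_targets`, `mse`) is hypothetical. It only shows how a per-frame sequence is compressed, expanded again with encoder clues, and compared against labels at each resolution.

```python
def avg_pool(seq, factor=2):
    """Downsample a 1D sequence by averaging non-overlapping windows."""
    return [sum(seq[i:i + factor]) / factor
            for i in range(0, len(seq) - factor + 1, factor)]


def upsample(seq, factor=2):
    """Nearest-neighbour upsampling: repeat each value `factor` times."""
    return [v for v in seq for _ in range(factor)]


def hourglass(seq, depth):
    """Toy temporal encoder-decoder (assumption: pooling replaces convolutions).

    Returns the decoder outputs ordered from coarsest to finest resolution.
    """
    skips = []
    x = list(seq)
    for _ in range(depth):          # encoder: shrink the time axis
        skips.append(x)
        x = avg_pool(x)
    outputs = [x]                   # bottleneck: long-term trend only
    for skip in reversed(skips):    # decoder: restore the time axis
        up = upsample(x)
        # Fuse high-level decoded values with low-level encoded values.
        x = [(u + s) / 2 for u, s in zip(up, skip)]
        outputs.append(x)
    return outputs


def multi_resolution_targets(labels, depth):
    """Pool the labels so each decoder stage has a target of equal length."""
    targets = [list(labels)]
    for _ in range(depth):
        targets.append(avg_pool(targets[-1]))
    return targets[::-1]            # coarsest to finest


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical per-frame features and arousal labels for 8 frames.
features = [0.1, 0.1, 0.5, 0.7, 0.9, 0.5, 0.3, 0.3]
labels = [0.0, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2]

outputs = hourglass(features, depth=2)
targets = multi_resolution_targets(labels, depth=2)
# One loss per decoder stage: the intermediate-supervision idea in miniature.
losses = [mse(o, t) for o, t in zip(outputs, targets)]
```

In this toy setting, keeping only the first entries of `outputs` corresponds to using a shallower decoder, which mirrors the abstract's remark about adjusting the decoder depth for different applications.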
Pages: 565 - 578
Page count: 14
Related papers
50 in total
  • [41] Sequential sEMG Recognition With Knowledge Transfer and Dynamic Graph Network Based on Spatio-Temporal Feature Extraction Network
    Li, Zhilin
    Chen, Xianghe
    Li, Jie
    Bai, Zhongfei
    Ji, Hongfei
    Liu, Lingyu
    Jin, Lingjing
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2025, 29 (02) : 887 - 899
  • [42] Spatio-temporal convolutional emotional attention network for spotting macro- and micro-expression intervals in long video sequences
    Pan, Hang
    Xie, Lun
    Wang, Zhiliang
    PATTERN RECOGNITION LETTERS, 2022, 162 : 89 - 96
  • [43] Channel attention convolutional aggregation network based on video-level features for EEG emotion recognition
    Feng, Xin
    Cong, Ping
    Dong, Lin
    Xin, Yongxian
    Miao, Fengbo
    Xin, Ruihao
    COGNITIVE NEURODYNAMICS, 2024, 18 (04) : 1689 - 1707
  • [44] STGATE: Spatial-temporal graph attention network with a transformer encoder for EEG-based emotion recognition
    Li, Jingcong
    Pan, Weijian
    Huang, Haiyun
    Pan, Jiahui
    Wang, Fei
    FRONTIERS IN HUMAN NEUROSCIENCE, 2023, 17
  • [45] Three-dimensional feature maps and convolutional neural network-based emotion recognition
    Zheng, Xiangwei
    Yu, Xiaomei
    Yin, Yongqiang
    Li, Tiantian
    Yan, Xiaoyan
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (11) : 6312 - 6336
  • [46] An adversarial discriminative temporal convolutional network for EEG-based cross-domain emotion recognition
    He, Zhipeng
    Zhong, Yongshi
    Pan, Jiahui
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 141
  • [47] Spatio-Temporal Transformer with Kolmogorov-Arnold Network for Skeleton-Based Hand Gesture Recognition
    Han, Pengcheng
    He, Xin
    Matsumaru, Takafumi
    Dutta, Vibekananda
    SENSORS, 2025, 25 (03)
  • [48] Human Emotion Recognition Based on Spatio-Temporal Facial Features Using HOG-HOF and VGG-LSTM
    Chouhayebi, Hajar
    Mahraz, Mohamed Adnane
    Riffi, Jamal
    Tairi, Hamid
    Alioua, Nawal
    COMPUTERS, 2024, 13 (04)
  • [49] Multi-Channel EEG Based Emotion Recognition Using Temporal Convolutional Network and Broad Learning System
    Jia, Xue
    Zhang, Tong
    Chen, C. L. Philip
    Liu, Zhulin
    Chen, Long
    Wen, Guihua
    Hu, Bin
    2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, : 2452 - 2457
  • [50] STSNet: a novel spatio-temporal-spectral network for subject-independent EEG-based emotion recognition
    Li, Rui
    Ren, Chao
    Zhang, Sipo
    Yang, Yikun
    Zhao, Qiqi
    Hou, Kechen
    Yuan, Wenjie
    Zhang, Xiaowei
    Hu, Bin
    HEALTH INFORMATION SCIENCE AND SYSTEMS, 2023, 11 (01)