Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition

Cited by: 20
Authors
Du, Zhengyin [1]
Wu, Suowei [2]
Huang, Di [1]
Li, Weixin [3]
Wang, Yunhong [3]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sino French Engineer Sch, Beijing 100191, Peoples R China
[3] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Convolution; Decoding; Feature extraction; Videos; Visualization; Task analysis; Dimensional emotion recognition; spatio-temporal fully convolutional network; temporal hourglass CNN; temporal intermediate supervision; EXPRESSION RECOGNITION;
DOI
10.1109/TAFFC.2019.2940224
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video-based dimensional emotion recognition aims to map human affect into the dimensional emotion space based on visual signals, which is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design with the cascaded 2D convolution based spatial encoder and 1D convolution based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependency, our temporal model, referred to as Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationship through integrating both low-level encoded and high-level decoded clues. Temporal Intermediate Supervision (TIS) is then introduced to enhance affective representations generated by TH-CNN under a multi-resolution strategy, which guides TH-CNN to learn macroscopic long-term trend and refined short-term fluctuations progressively. Furthermore, thanks to TH-CNN and TIS, knowledge learnt from the intermediate layers also makes it possible to offer customized solutions to different applications by adjusting the decoder depth. Extensive experiments are conducted on three benchmark databases (RECOLA, SEWA and OMG) and superior results are shown compared to state-of-the-art methods, which indicates the effectiveness of the proposed approach.
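The temporal part of the design described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the layer sizes, average pooling, nearest-neighbour upsampling, additive skip connections, mean-squared-error loss, and all function names (`temporal_hourglass`, `tis_loss`, `make_params`) are assumptions chosen for illustration, and the 2D spatial encoder is replaced by random per-frame feature vectors. It only shows the general idea of a 1D convolutional encoder-decoder over time whose intermediate decoder stages each emit a prediction at their own temporal resolution, supervised against correspondingly downsampled labels.

```python
import numpy as np


def conv1d_same(x, w):
    """Temporal convolution with zero 'same' padding.

    x: (T, C_in) sequence of per-frame features.
    w: (K, C_in, C_out) kernel with odd K.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.einsum("kc,kco->o", xp[t:t + k], w)
                     for t in range(x.shape[0])])


def relu(x):
    return np.maximum(x, 0.0)


def pool2(x):
    """Halve the temporal resolution by averaging adjacent frames."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1]).mean(axis=1)


def upsample2(x):
    """Double the temporal resolution by repeating each step."""
    return np.repeat(x, 2, axis=0)


def make_params(channels, depth, kernel=3, seed=0):
    """Random weights for the sketch (hypothetical layout)."""
    rng = np.random.default_rng(seed)

    def conv():
        return rng.normal(0.0, 0.1, (kernel, channels, channels))

    return {
        "enc": [conv() for _ in range(depth)],
        "mid": conv(),
        "dec": [conv() for _ in range(depth)],
        # One linear head per supervised resolution (coarsest first).
        "head": [rng.normal(0.0, 0.1, (channels, 1)) for _ in range(depth + 1)],
    }


def temporal_hourglass(feats, params, depth):
    """Return one prediction per decoder stage, coarsest first."""
    skips, x = [], feats
    for d in range(depth):                      # encoder: conv, then downsample
        x = relu(conv1d_same(x, params["enc"][d]))
        skips.append(x)
        x = pool2(x)
    x = relu(conv1d_same(x, params["mid"]))
    outputs = [x @ params["head"][0]]
    for d in range(depth):                      # decoder: upsample, fuse skip, conv
        x = upsample2(x) + skips[-(d + 1)]
        x = relu(conv1d_same(x, params["dec"][d]))
        outputs.append(x @ params["head"][d + 1])
    return outputs


def tis_loss(outputs, labels):
    """Sum of MSE terms, each against labels averaged down to that resolution."""
    total = 0.0
    for out in outputs:
        factor = labels.shape[0] // out.shape[0]
        target = labels.reshape(-1, factor).mean(axis=1)
        total += float(np.mean((out[:, 0] - target) ** 2))
    return total


# Usage: 16 frames, 8-dimensional stand-in features, two hourglass levels.
T, C, DEPTH = 16, 8, 2
rng = np.random.default_rng(1)
feats = rng.normal(size=(T, C))          # stand-in for spatial encoder output
labels = np.sin(np.linspace(0.0, 3.0, T))  # toy per-frame affect trace
params = make_params(C, DEPTH)
outputs = temporal_hourglass(feats, params, DEPTH)
loss = tis_loss(outputs, labels)
```

In this sketch the coarse outputs are compared with heavily averaged labels (the long-term trend) and the final output with the full-resolution labels (short-term fluctuations). Keeping only the first few entries of `outputs` corresponds loosely to using a shallower decoder.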
Pages: 565-578
Number of pages: 14