Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13
Authors
Li, Hanhui [1]
Jiang, Xudong [2]
Guan, Boliang [3]
Wang, Ruomei [3]
Thalmann, Nadia Magnenat [4]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;
DOI
10.1109/TIP.2022.3160240
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sketch recognition relies on two types of information: spatial context, such as the local structures in images, and temporal context, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial context and recurrent neural networks (RNNs) to model temporal context. However, most of them combine spatial and temporal features through late fusion or a single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework built on multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), a pseudo-color image that retains the temporal order of strokes, to overcome the difficulty CNNs have in leveraging temporal information. We then construct a dual-branch network in which a CNN branch processes the TEI and an RNN branch processes the stroke array. In the early stages of the network, considering the limited ability of RNNs to capture spatial structures, we use multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit, which adaptively merges features in opposite temporal orders, is proposed to help RNNs handle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
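The abstract does not say how the temporal-enriched image is built, so the following is only a minimal sketch of the general idea of a pseudo-color image that retains stroke order. The function name `temporal_enriched_image`, the color ramp (red rises and blue falls with drawing time, green marks ink), and the assumption that stroke points are normalized to [0, 1] are all illustrative choices, not the authors' method.

```python
import numpy as np


def temporal_enriched_image(strokes, size=64):
    """Rasterize strokes into an RGB image whose color encodes drawing order.

    strokes: list of (N_i, 2) arrays of (x, y) points in [0, 1], listed in
    the order they were drawn. The color ramp is a hypothetical encoding:
    red = normalized time, green = ink present, blue = 1 - normalized time.
    """
    image = np.zeros((size, size, 3), dtype=np.float32)
    total = sum(len(s) for s in strokes)
    drawn = 0  # number of points processed so far, across all strokes
    for stroke in strokes:
        pts = np.asarray(stroke, dtype=np.float32)
        for i in range(len(pts)):
            t = drawn / max(total - 1, 1)  # normalized time in [0, 1]
            color = np.array([t, 1.0, 1.0 - t], dtype=np.float32)
            # Draw the segment from the previous point of this stroke to the
            # current one; the first point of a stroke is drawn as a dot.
            start = pts[i - 1] if i > 0 else pts[i]
            steps = int(np.ceil(np.abs(pts[i] - start).max() * size)) + 1
            for a in np.linspace(0.0, 1.0, steps):
                x, y = (1.0 - a) * start + a * pts[i]
                col = int(np.clip(x, 0.0, 1.0) * (size - 1) + 0.5)
                row = int(np.clip(y, 0.0, 1.0) * (size - 1) + 0.5)
                image[row, col] = color  # later ink overwrites earlier ink
            drawn += 1
    return image
```

An image built this way can be fed to an ordinary image CNN, while the raw stroke array remains available to a sequence model. Under this encoding, earlier strokes appear bluer and later strokes redder.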
Pages: 2683-2694
Page count: 12