Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13
Authors
Li, Hanhui [1]
Jiang, Xudong [2]
Guan, Boliang [3]
Wang, Ruomei [3]
Thalmann, Nadia Magnenat [4]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;
DOI
10.1109/TIP.2022.3160240
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or a single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that aims at multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), which is a pseudo-color image retaining the temporal order of strokes, to overcome the difficulty CNNs have in leveraging temporal information. We then construct a dual-branch network, in which an RNN branch and a CNN branch process the stroke array and the TEI, respectively. In the early stages of our network, considering the limited ability of RNNs to capture spatial structures, we utilize multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage of our network, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit that adaptively merges features in opposite temporal orders is proposed to help RNNs handle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
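The abstract describes the TEI only as a pseudo-color image that retains the temporal order of strokes; it does not give the encoding. The following is a minimal sketch of one way such an image could be rasterized, not the authors' implementation. The function names `time_to_color` and `render_tei`, the blue-to-green-to-red color ramp, the 64-pixel default resolution, and the input convention (a list of strokes, each an array of (x, y) points normalized to [0, 1]) are all assumptions made for illustration.

```python
import numpy as np


def time_to_color(t):
    """Map normalized drawing time t in [0, 1] to an RGB triple.

    Assumed ramp: blue at t=0, green at t=0.5, red at t=1.
    """
    return np.array([t, 1.0 - abs(2.0 * t - 1.0), 1.0 - t], dtype=np.float32)


def render_tei(strokes, size=64):
    """Rasterize strokes into a (size, size, 3) pseudo-color image.

    strokes: list of arrays of shape (N_i, 2) holding (x, y) in [0, 1],
             listed in drawing order (hypothetical input convention).
    Each segment is painted with a color that encodes when it was drawn,
    so later segments overwrite earlier ones where they overlap.
    """
    img = np.zeros((size, size, 3), dtype=np.float32)
    total = sum(len(s) for s in strokes)
    step = 0
    for stroke in strokes:
        pts = np.asarray(stroke, dtype=np.float32)
        for i in range(len(pts)):
            color = time_to_color(step / max(total - 1, 1))
            # The first point of a stroke is drawn as a single pixel;
            # later points are joined to their predecessor by a segment.
            start = pts[i - 1] if i > 0 else pts[i]
            n = int(np.ceil(np.abs(pts[i] - start).max() * size)) + 1
            line = np.linspace(start, pts[i], n)
            cols = np.clip(np.round(line[:, 0] * (size - 1)).astype(int), 0, size - 1)
            rows = np.clip(np.round(line[:, 1] * (size - 1)).astype(int), 0, size - 1)
            img[rows, cols] = color
            step += 1
    return img
```

Under these assumptions, a CNN branch would receive the array returned by `render_tei`, while an RNN branch would consume the raw point sequence. Pen-up gaps between strokes are not drawn, so only the color ramp carries the ordering across strokes.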
Pages: 2683-2694
Number of pages: 12