Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13
Authors
Li, Hanhui [1]
Jiang, Xudong [2]
Guan, Boliang [3]
Wang, Ruomei [3]
Thalmann, Nadia Magnenat [4]
Affiliations
[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland
Funding
National Natural Science Foundation of China;
Keywords
Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;
DOI
10.1109/TIP.2022.3160240
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or a single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that targets multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), a pseudo-color image that retains the temporal order of strokes, to overcome the difficulty CNNs have in leveraging temporal information. We then construct a dual-branch network, in which a CNN branch and an RNN branch process the TEI and the stroke array, respectively. In the early stages of our network, considering the limited ability of RNNs to capture spatial structures, we use multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage of our network, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit that adaptively merges features in opposite temporal orders is proposed to help RNNs tackle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
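The abstract describes the TEI only as a pseudo-color image that retains the temporal order of strokes; it does not give the encoding. The following is a minimal sketch of that general idea, not the authors' method. The function names (`line_pixels`, `order_color`, `render_temporal_image`), the red-to-green color ramp, and the integer point format are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Stroke = List[Point]
Pixel = Tuple[int, int, int]


def line_pixels(p0: Point, p1: Point) -> List[Point]:
    """Integer points on the segment p0 -> p1 (simple linear interpolation)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [p0]
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]


def order_color(index: int, total: int) -> Pixel:
    """Map a stroke's position in the drawing order to RGB (assumed ramp)."""
    t = index / (total - 1) if total > 1 else 0.0
    # Early strokes lean green, late strokes lean red; blue marks "drawn".
    return (round(255 * t), round(255 * (1 - t)), 255)


def render_temporal_image(strokes: List[Stroke], size: int) -> List[List[Pixel]]:
    """Rasterize strokes onto a size x size black canvas, colored by order."""
    image = [[(0, 0, 0) for _ in range(size)] for _ in range(size)]
    for index, stroke in enumerate(strokes):
        if not stroke:
            continue  # skip empty strokes
        color = order_color(index, len(strokes))
        # A single-point stroke is drawn as one pixel.
        if len(stroke) > 1:
            segments = list(zip(stroke, stroke[1:]))
        else:
            segments = [(stroke[0], stroke[0])]
        for p0, p1 in segments:
            for x, y in line_pixels(p0, p1):
                if 0 <= x < size and 0 <= y < size:
                    image[y][x] = color  # later strokes overwrite earlier ones
    return image
```

In this sketch, a CNN that receives the rendered image could in principle distinguish early strokes from late ones by color alone, which is the property the abstract attributes to the TEI. The actual color mapping used in the paper may differ.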
Pages: 2683 - 2694
Number of pages: 12
Related papers
50 items in total
  • [41] Modelling spatio-temporal ageing phenomena with deep Generative Adversarial Networks
    Papadopoulos, Stavros
    Dimitriou, Nikolaos
    Drosou, Anastasios
    Tzovaras, Dimitrios
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 94
  • [42] Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation
    Dang, Jisheng
    Zheng, Huicheng
    Xu, Xiaohao
    Wang, Longguang
    Guo, Yulan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 4853 - 4866
  • [43] Spatio-Temporal Crime Prediction with Temporally Hierarchical Convolutional Neural Networks
    Ilhan, Fatih
    Tekin, Selim F.
    Aksoy, Bilgin
    2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [44] Braille letter reading: A benchmark for spatio-temporal pattern recognition on neuromorphic hardware
    Muller-Cleve, Simon F.
    Fra, Vittorio
    Khacef, Lyes
    Pequeno-Zurro, Alejandro
    Klepatsch, Daniel
    Forno, Evelina
    Ivanovich, Diego G.
    Rastogi, Shavika
    Urgese, Gianvito
    Zenke, Friedemann
    Bartolozzi, Chiara
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [45] Fluxformer: Flow-Guided Duplex Attention Transformer via Spatio-Temporal Clustering for Action Recognition
    Hong, Younggi
    Kim, Min Ju
    Lee, Isack
    Yoo, Seok Bong
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6411 - 6418
  • [46] View Invariant Spatio-Temporal Descriptor for Action Recognition from Skeleton Sequences
    Venkata Subbareddy K.
    Nirmala Devi L.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (06) : 1399 - 1412
  • [47] Integrally Cooperative Spatio-Temporal Feature Representation of Motion Joints for Action Recognition
    Chao, Xin
    Hou, Zhenjie
    Liang, Jiuzhen
    Yang, Tianjin
    SENSORS, 2020, 20 (18) : 1 - 22
  • [48] A Dual Pipeline With Spatio-Temporal Attention Fusion Approach for Human Activity Recognition
    Wang, Xiaodong
    Li, Ying
    Fang, Aiqing
    He, Pei
    Guo, Yangming
    IEEE SENSORS JOURNAL, 2024, 24 (15) : 25150 - 25162
  • [49] Capturing the spatio-temporal continuity for video semantic segmentation
    Chen, Xin
    Wu, Aming
    Han, Yahong
    IET IMAGE PROCESSING, 2019, 13 (14) : 2813 - 2820
  • [50] Robust and Compatible Video Watermarking via Spatio-Temporal Enhancement and Multiscale Pyramid Attention
    Chen, Luan
    Wang, Chengyou
    Zhou, Xiao
    Qin, Zhiliang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1548 - 1561