Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13

Authors:

Li, Hanhui [1]

Jiang, Xudong [2]

Guan, Boliang [3]

Wang, Ruomei [3]

Thalmann, Nadia Magnenat [4]

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China

[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;

DOI:

10.1109/TIP.2022.3160240

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that aims at multi-stage interactions and refinements of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), which is a pseudo-color image retaining the temporal order of strokes, to overcome the difficulty of CNNs in leveraging temporal information. We then construct a dual-branch network, in which an RNN branch and a CNN branch process the stroke array and the TEI, respectively. In the early stages of our network, considering the limited ability of RNNs to capture spatial structures, we utilize multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage of our network, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit that adaptively merges features in opposite temporal orders is proposed to help RNNs tackle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
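
The abstract describes the TEI only as a pseudo-color image that retains stroke order; it does not give the color encoding. The following is a minimal sketch of one possible encoding, not the authors' method. The function name `render_temporal_image`, the red-to-blue time ramp, the constant green channel, and the assumption that points are (x, y) pairs normalized to [0, 1] are all hypothetical choices made for illustration.

```python
def render_temporal_image(strokes, size=64):
    """Rasterize strokes into an RGB grid whose color encodes drawing order.

    strokes: list of strokes in drawing order; each stroke is a list of
             (x, y) points with coordinates assumed to lie in [0, 1].
    Returns a size x size nested list of (r, g, b) tuples.
    Hypothetical encoding: red fades out and blue fades in over time;
    green is fixed at 0.5 so drawn pixels differ from the black background.
    """
    image = [[(0.0, 0.0, 0.0) for _ in range(size)] for _ in range(size)]
    total = sum(len(s) for s in strokes)
    step = 0  # global index of the current point across all strokes
    for stroke in strokes:
        prev = None
        for (x1, y1) in stroke:
            t = step / max(total - 1, 1)  # normalized time in [0, 1]
            color = (1.0 - t, 0.5, t)
            # The first point of a stroke is drawn as a single dot.
            x0, y0 = prev if prev is not None else (x1, y1)
            # Sample the segment densely enough to avoid gaps.
            n = int(max(abs(x1 - x0), abs(y1 - y0)) * size) + 1
            for k in range(n + 1):
                a = k / n
                x = x0 + a * (x1 - x0)
                y = y0 + a * (y1 - y0)
                col = min(max(round(x * (size - 1)), 0), size - 1)
                row = min(max(round(y * (size - 1)), 0), size - 1)
                image[row][col] = color
            prev = (x1, y1)
            step += 1
    return image


# Two horizontal strokes: the top edge is drawn first, the bottom edge second.
strokes = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
img = render_temporal_image(strokes)
```

Under this encoding, pixels from the earlier stroke carry more red and pixels from the later stroke carry more blue, so a CNN reading the image can in principle recover drawing order from color alone.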

Pages: 2683-2694

Page count: 12
