Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13

Authors:

Li, Hanhui [1]

Jiang, Xudong [2]

Guan, Boliang [3]

Wang, Ruomei [3]

Thalmann, Nadia Magnenat [4]

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China

[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Funding:

National Natural Science Foundation of China

Keywords:

Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;

DOI:

10.1109/TIP.2022.3160240

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that aims at the multi-stage interactions and refinements of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), which is a pseudo-color image retaining the temporal order of strokes, to overcome the difficulty of CNNs in leveraging temporal information. We then construct a dual-branch network, in which an RNN branch and a CNN branch are adopted to process the stroke array and the TEI, respectively. In the early stages of our network, considering the limited ability of RNNs in capturing spatial structures, we utilize multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage of our network, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit that adaptively merges features in opposite temporal orders is proposed to help RNNs tackle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
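
Illustrative sketch (not from the paper): the abstract describes the TEI only as a pseudo-color image that retains the temporal order of strokes and does not give the encoding. The Python snippet below shows one possible way to build such an image, under the assumption that drawing time is mapped to hue, from red for the first point to blue for the last. The names `render_temporal_image` and `draw_segment`, the normalized coordinate format, and the hue mapping are all hypothetical choices made for illustration.

```python
import colorsys


def draw_segment(image, p0, p1, color):
    """Color the pixels from p0 to p1, leaving the start pixel p0 untouched."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(1, steps + 1):
        x = round(x0 + (x1 - x0) * i / steps)
        y = round(y0 + (y1 - y0) * i / steps)
        image[y][x] = color


def render_temporal_image(strokes, size=32):
    """Rasterize strokes into a size x size RGB image (nested lists of tuples).

    strokes: list of strokes; each stroke is a list of (x, y) points with
    coordinates in [0, 1], listed in drawing order.
    Assumption (not from the paper): time is encoded as hue, running from
    red (first point) to blue (last point) on a black background.
    """
    image = [[(0, 0, 0) for _ in range(size)] for _ in range(size)]
    total = sum(len(stroke) for stroke in strokes)
    index = 0
    for stroke in strokes:
        prev = None
        for x, y in stroke:
            t = index / max(total - 1, 1)  # normalized drawing time in [0, 1]
            r, g, b = colorsys.hsv_to_rgb(t * 2.0 / 3.0, 1.0, 1.0)
            color = (round(r * 255), round(g * 255), round(b * 255))
            # Map normalized coordinates to pixel indices, clamped to the canvas.
            px = max(0, min(int(x * size), size - 1))
            py = max(0, min(int(y * size), size - 1))
            if prev is None:
                image[py][px] = color  # first point of a stroke
            else:
                draw_segment(image, prev, (px, py), color)
            prev = (px, py)
            index += 1
    return image


if __name__ == "__main__":
    # Two horizontal strokes: top edge first, bottom edge second.
    strokes = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
    tei = render_temporal_image(strokes, size=8)
    print(tei[0][0], tei[7][7])
```

In a dual-branch setup like the one the abstract outlines, an image of this kind would go to the CNN branch while the raw stroke array goes to the RNN branch; the hue encoding here is only a stand-in for whatever encoding the authors actually use.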

Pages: 2683-2694

Number of pages: 12
