Multistage Spatio-Temporal Networks for Robust Sketch Recognition

Cited by: 13

Authors:

Li, Hanhui [1]

Jiang, Xudong [2]

Guan, Boliang [3]

Wang, Ruomei [3]

Thalmann, Nadia Magnenat [4]

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[2] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[3] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China

[4] Univ Geneva, MIRALab, CH-1227 Geneva, Switzerland

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Convolutional neural networks; Image recognition; Stroke (medical condition); Network architecture; Image segmentation; Feature extraction; Recurrent neural networks; Sketch recognition; spatio-temporal feature; multi-modal networks; feature fusion; DEEP; FUSION;

DOI:

10.1109/TIP.2022.3160240

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Sketch recognition relies on two types of information: spatial contexts, such as the local structures in images, and temporal contexts, such as the order of strokes. Existing methods usually adopt convolutional neural networks (CNNs) to model spatial contexts and recurrent neural networks (RNNs) to model temporal contexts. However, most of them combine spatial and temporal features with late fusion or single-stage transformation, which is prone to losing the informative details in sketches. To tackle this problem, we propose a novel framework that aims at multi-stage interaction and refinement of spatial and temporal features. Specifically, given a sketch represented by a stroke array, we first generate a temporal-enriched image (TEI), which is a pseudo-color image retaining the temporal order of strokes, to overcome the difficulty of CNNs in leveraging temporal information. We then construct a dual-branch network, in which an RNN branch and a CNN branch are adopted to process the stroke array and the TEI, respectively. In the early stages of our network, considering the limited ability of RNNs to capture spatial structures, we utilize multiple enhancement modules to enhance the stroke features with the TEI features. In the last stage of our network, we propose a spatio-temporal enhancement module that refines stroke features and TEI features in a joint feature space. Furthermore, a bidirectional temporal-compatible unit that adaptively merges features in opposite temporal orders is proposed to help RNNs tackle abrupt strokes. Comprehensive experimental results on QuickDraw and TU-Berlin demonstrate that the proposed method is a robust and efficient solution for sketch recognition.
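
The abstract describes the TEI only as a pseudo-color image that retains the temporal order of strokes; it does not give the color mapping. The following is a minimal sketch of that general idea, not the authors' construction. The function name make_tei, the input layout (a list of strokes, each a list of (x, y) points in [0, 1]), the red-to-blue color ramp, and the image size are all assumptions made for illustration.

```python
# Hypothetical illustration: rasterize a stroke array into a pseudo-color
# image whose color encodes drawing order. The color ramp is an assumption,
# not the mapping used in the paper.

def make_tei(strokes, size=32):
    """Return a size x size grid of (r, g, b) tuples with values in 0..255.

    strokes: list of strokes; each stroke is a list of (x, y) points in [0, 1],
    given in the order they were drawn.
    """
    image = [[(0, 0, 0) for _ in range(size)] for _ in range(size)]
    total = sum(max(len(s) - 1, 0) for s in strokes)  # number of line segments
    drawn = 0

    def to_pixel(v):
        # Map a coordinate in [0, 1] to a clamped pixel index.
        return min(size - 1, max(0, int(v * (size - 1) + 0.5)))

    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            t = drawn / max(total - 1, 1)  # normalized drawing time in [0, 1]
            # Early segments are red, late segments are blue (assumed ramp).
            color = (round(255 * (1 - t)), 64, round(255 * t))
            steps = max(1, round(size * max(abs(x1 - x0), abs(y1 - y0))))
            for k in range(steps + 1):
                a = k / steps  # interpolation factor along the segment
                col = to_pixel(x0 + a * (x1 - x0))
                row = to_pixel(y0 + a * (y1 - y0))
                image[row][col] = color
            drawn += 1
    return image


if __name__ == "__main__":
    # Two horizontal strokes: the top edge is drawn first, the bottom edge last.
    tei = make_tei([[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]], size=8)
    print(tei[0][0], tei[7][7])
```

In this sketch a CNN would see both where a segment lies (pixel position) and when it was drawn (pixel color), which is the property the abstract attributes to the TEI.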


Pages: 2683-2694

Number of pages: 12

