Joint spatial-temporal attention for action recognition

Cited by: 25

Authors:

Yu, Tingzhao [1,2]

Guo, Chaoxu [1,2]

Wang, Lingfeng [1]

Gu, Huxiang [1]

Xiang, Shiming [1]

Pan, Chunhong [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China

Source:

PATTERN RECOGNITION LETTERS | 2018 / Vol. 112

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; Spatial-Temporal attention; Two-Stage; REPRESENTATION;

DOI:

10.1016/j.patrec.2018.07.034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a novel high-level action representation using a joint spatial-temporal attention model, with application to video-based human action recognition. Specifically, to extract robust motion representations of videos, a new spatial attention module based on 3D convolution is proposed, which attends to the salient parts of the spatial areas. To better handle long-duration videos, a new bidirectional-LSTM-based temporal attention module is introduced, which aims to focus on the key video cubes instead of the key video frames of a given video. The spatial-temporal attention network can be jointly trained via a two-stage strategy, which enables us to simultaneously explore correlations in both the spatial and temporal domains. Experimental results on benchmark action recognition datasets demonstrate the effectiveness of our network. (c) 2018 Elsevier B.V. All rights reserved.
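
The abstract does not give the equations of the temporal attention module, so the following is only a minimal sketch of the general idea of attending over video cubes rather than individual frames. It assumes each cube has already been reduced to one feature vector (in the paper these would presumably come from the 3D-convolutional and bidirectional LSTM stages, which are not modelled here). The names `temporal_attention_pool`, `cube_features`, and `w` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def temporal_attention_pool(cube_features, w):
    """Pool T cube-level feature vectors with softmax attention weights.

    cube_features: (T, D) array, one feature vector per video cube
                   (assumed to be precomputed; hypothetical input).
    w:             (D,) scoring vector (hypothetical; it would be a
                   learned parameter in a trained network).
    Returns the attention weights, shape (T,), and the pooled
    video-level feature, shape (D,).
    """
    scores = cube_features @ w          # one scalar score per cube
    weights = softmax(scores)           # non-negative, sums to 1 over cubes
    pooled = weights @ cube_features    # attention-weighted sum of cubes
    return weights, pooled


# Toy usage with random stand-in features: 6 cubes, 4 dimensions each.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
w = rng.normal(size=4)
weights, pooled = temporal_attention_pool(features, w)
```

With an all-zero scoring vector every cube receives the same weight and the pooled feature reduces to the plain average over cubes, which is the baseline that attention is meant to improve on.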

Pages: 226-233

Number of pages: 8

