Joint spatial-temporal attention for action recognition

Cited by: 25
Authors
Yu, Tingzhao [1, 2]
Guo, Chaoxu [1, 2]
Wang, Lingfeng [1]
Gu, Huxiang [1]
Xiang, Shiming [1]
Pan, Chunhong [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Spatial-Temporal attention; Two-Stage; REPRESENTATION;
DOI
10.1016/j.patrec.2018.07.034
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel high-level action representation using a joint spatial-temporal attention model, with application to video-based human action recognition. Specifically, to extract robust motion representations of videos, a new spatial attention module based on 3D convolution is proposed, which attends to the salient parts of the spatial areas. To better handle long-duration videos, a new bidirectional LSTM-based temporal attention module is introduced, which aims to focus on the key video cubes instead of the key video frames of a given video. The spatial-temporal attention network can be jointly trained via a two-stage strategy, which enables us to simultaneously explore correlations in both the spatial and temporal domains. Experimental results on benchmark action recognition datasets demonstrate the effectiveness of our network. (c) 2018 Elsevier B.V. All rights reserved.
Pages: 226-233
Page count: 8
Related papers
50 in total
  • [31] Improved SSD using deep multi-scale attention spatial-temporal features for action recognition
    Zhou, Shuren
    Qiu, Jia
    Solanki, Arun
    MULTIMEDIA SYSTEMS, 2022, 28 (06) : 2123 - 2131
  • [32] Multi-Branch Spatial-Temporal Network for Action Recognition
    Wang, Yingying
    Li, Wei
    Tao, Ran
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (10) : 1556 - 1560
  • [33] A SPATIAL-TEMPORAL CONSTRAINT-BASED ACTION RECOGNITION METHOD
    Han, Tingting
    Yao, Hongxun
    Zhang, Yanhao
    Xu, Pengfei
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 2767 - 2771
  • [34] Hierarchy Spatial-Temporal Transformer for Action Recognition in Short Videos
    Cai, Guoyong
    Cai, Yumeng
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 760 - 774
  • [35] Spatial-temporal injection network: exploiting auxiliary losses for action recognition with apparent difference and self-attention
    Cao, Haiwen
    Wu, Chunlei
    Lu, Jing
    Wu, Jie
    Wang, Leiquan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (04) : 1173 - 1180
  • [36] Attention module-based spatial-temporal graph convolutional networks for skeleton-based action recognition
    Kong, Yinghui
    Li, Li
    Zhang, Ke
    Ni, Qiang
    Han, Jungong
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (04)
  • [37] Spatial-Temporal Transformer Network for Continuous Action Recognition in Industrial Assembly
    Huang, Jianfeng
    Liu, Xiang
    Hu, Huan
    Tang, Shanghua
    Li, Chenyang
    Zhao, Shaoan
    Lin, Yimin
    Wang, Kai
    Liu, Zhaoxiang
    Lian, Shiguo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT X, ICIC 2024, 2024, 14871 : 114 - 130
  • [38] A Novel Action Recognition Scheme Based on Spatial-Temporal Pyramid Model
    Zhao, Hengying
    Xiang, Xinguang
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 212 - 221
  • [39] Deep Fusion of Skeleton Spatial-Temporal and Dynamic Information for Action Recognition
    Gao, Song
    Zhang, Dingzhuo
    Tang, Zhaoming
    Wang, Hongyan
    SENSORS, 2024, 24 (23)
  • [40] A Channel-Wise Spatial-Temporal Aggregation Network for Action Recognition
    Wang, Huafeng
    Xia, Tao
    Li, Hanlin
    Gu, Xianfeng
    Lv, Weifeng
    Wang, Yuehai
    MATHEMATICS, 2021, 9 (24)