Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Cited by: 11

Authors:

Shi, Zhensheng [1]

Cao, Liangjie [1]

Guan, Cheng [1]

Zheng, Haiyong [1]

Gu, Zhaorui [1]

Yu, Zhibin [1]

Zheng, Bing [1]

Affiliation:

[1] Ocean Univ China, Dept Elect Engn, Qingdao 266100, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; video understanding; spatiotemporal representation; visual attention; 3D-CNN; residual learning;

DOI:

10.1109/ACCESS.2020.2968024

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Learning spatiotemporal features with 3D-CNN (3D Convolutional Neural Network) models is regarded as an effective approach to action recognition. In this paper, we explore the visual attention mechanism for video analysis and propose a novel 3D-CNN model, dubbed AE-I3D (Attention-Enhanced Inflated-3D Network), for learning attention-enhanced spatiotemporal representations. The contribution of AE-I3D is threefold. First, we inflate soft attention to the spatiotemporal scope of 3D video and adopt softmax to generate a probability distribution over attentional features in a feedforward 3D-CNN architecture. Second, we devise an AE-Res (Attention-Enhanced Residual learning) module, which learns attention-enhanced features through two-branch residual learning. The AE-Res module is also lightweight and flexible, so it can easily be embedded into many 3D-CNN architectures. Finally, we embed multiple AE-Res modules into an I3D (Inflated-3D) network, yielding the AE-I3D model, which can be trained end to end at the video level. Unlike previous attention networks, our method inflates residual attention from 2D images to 3D video, performing 3D attention residual learning to enhance the spatiotemporal representation. We evaluate on three benchmarks, UCF101, HMDB51, and Kinetics, using RGB-only video data. The experimental results demonstrate that AE-I3D is effective and achieves competitive performance.
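The combination of spatiotemporal softmax attention and residual learning described above can be illustrated with a small sketch. This is an assumption-based illustration, not the authors' AE-Res implementation. It takes one channel of a T×H×W feature volume, standing in for the trunk branch output. It also takes attention logits, which in a real network would come from a learned mask branch. The logits are normalized with a softmax over all space-time positions and then combined using the residual form y = (1 + A) ⊙ x, which is borrowed from the 2D Residual Attention Network. The function names `spatiotemporal_softmax` and `attention_residual`, and the exact (1 + A) combination, are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch, not the authors' AE-Res code.
# One channel of a clip feature map, stored as a nested list indexed [t][h][w].
Volume = List[List[List[float]]]


def spatiotemporal_softmax(logits: Volume) -> Volume:
    """Softmax over all T*H*W positions of one channel, so the weights sum to 1."""
    flat = [v for frame in logits for row in frame for v in row]
    peak = max(flat)  # subtract the max before exp() for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    probs = iter(e / total for e in exps)
    # Rebuild the original [t][h][w] nesting, consuming values in flatten order.
    return [[[next(probs) for _ in row] for row in frame] for frame in logits]


def attention_residual(features: Volume, logits: Volume) -> Volume:
    """Assumed residual attention form: y = (1 + A) * x, with A a space-time softmax."""
    attn = spatiotemporal_softmax(logits)
    # Identity path (x) plus attention-weighted path (A * x), element by element.
    return [
        [[x * (1.0 + a) for x, a in zip(frow, arow)] for frow, arow in zip(fframe, aframe)]
        for fframe, aframe in zip(features, attn)
    ]


if __name__ == "__main__":
    ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]  # T=2, H=2, W=2
    zeros = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(attention_residual(ones, zeros))
```

With uniform logits on a 2×2×2 volume, every attention weight is 1/8, so a feature map of ones becomes 1.125 everywhere. The identity term keeps the original features intact when attention is weak, which is the usual motivation for residual attention. Note that a softmax over all T·H·W positions gives very small weights at realistic clip sizes, so a practical module would likely need additional scaling or normalization.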


Pages: 16785-16794

Page count: 10
