Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Cited by: 11
Authors
Shi, Zhensheng [1]
Cao, Liangjie [1]
Guan, Cheng [1]
Zheng, Haiyong [1]
Gu, Zhaorui [1]
Yu, Zhibin [1]
Zheng, Bing [1]
Affiliations
[1] Ocean Univ China, Dept Elect Engn, Qingdao 266100, Peoples R China
Source
IEEE ACCESS | 2020, Vol. 8, Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; video understanding; spatiotemporal representation; visual attention; 3D-CNN; residual learning;
DOI
10.1109/ACCESS.2020.2968024
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
Learning spatiotemporal features via 3D-CNN (3D Convolutional Neural Network) models has been regarded as an effective approach for action recognition. In this paper, we explore the visual attention mechanism for video analysis and propose a novel 3D-CNN model, dubbed AE-I3D (Attention-Enhanced Inflated-3D Network), for learning attention-enhanced spatiotemporal representation. The contribution of our AE-I3D is threefold: First, we inflate soft attention to the spatiotemporal scope for 3D videos, and adopt softmax to generate a probability distribution over attentional features in a feedforward 3D-CNN architecture; Second, we devise an AE-Res (Attention-Enhanced Residual learning) module, which learns attention-enhanced features in a two-branch residual learning manner; moreover, the AE-Res module is lightweight and flexible, so that it can be easily embedded into many 3D-CNN architectures; Finally, we embed multiple AE-Res modules into an I3D (Inflated-3D) network, yielding our AE-I3D model, which can be trained in an end-to-end, video-level manner. Different from previous attention networks, our method inflates residual attention from 2D images to 3D videos for 3D attention residual learning to enhance spatiotemporal representation. We use RGB-only video data for evaluation on three benchmarks: UCF101, HMDB51, and Kinetics. The experimental results demonstrate that our AE-I3D is effective with competitive performance.
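The abstract's core idea (an attention branch whose softmax-normalized map reweights a feature branch, combined with the input through a residual shortcut) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 1x1x1 channel-mixing weights `w_attn` and `w_feat`, the toy tensor sizes, and the choice to apply softmax over flattened spatiotemporal positions are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ae_res_block(x, w_attn, w_feat):
    """Toy two-branch attention-enhanced residual block.

    x       : (C, T, H, W) spatiotemporal feature volume.
    w_attn  : (C, C) channel-mixing weights for the attention branch.
    w_feat  : (C, C) channel-mixing weights for the feature branch.

    The attention branch produces a probability distribution over
    the T*H*W spatiotemporal positions (per channel) via softmax;
    it reweights the feature branch, and the identity shortcut is
    added back, so input and output shapes match -- the property
    that lets such a module be dropped into an existing network.
    """
    C, T, H, W = x.shape
    flat = x.reshape(C, -1)                  # (C, T*H*W)
    attn = softmax(w_attn @ flat, axis=-1)   # probabilities over positions
    feat = w_feat @ flat                     # transformed features
    enhanced = attn * feat                   # attention-weighted features
    return x + enhanced.reshape(C, T, H, W)  # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2, 3, 3))        # C=4, T=2, H=W=3
w_attn = rng.standard_normal((4, 4))
w_feat = rng.standard_normal((4, 4))
y = ae_res_block(x, w_attn, w_feat)
```

In the actual AE-I3D model the branches are learned 3D convolutions inside an I3D backbone, but the shape-preserving residual structure sketched here is what makes the module embeddable at multiple depths of the network.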
Pages: 16785-16794 (10 pages)
Related Papers (50 total)
  • [11] HUMAN ACTION REPRESENTATION AND RECOGNITION: AN APPROACH TO A HISTOGRAM OF SPATIOTEMPORAL TEMPLATES
    Ahsan, Sk Md. Masudul
    Tan, Joo Kooi
    Kim, Hyoungseop
    Ishikawa, Seiji
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2015, 11 (06): : 1855 - 1867
  • [12] Better Deep Visual Attention with Reinforcement Learning in Action Recognition
    Wang, Gang
    Wang, Wenmin
    Wang, Jingzhuo
    Bu, Yaohua
    2017 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2017,
  • [13] Learning hierarchical video representation for action recognition
    Li Q.
    Qiu Z.
    Yao T.
    Mei T.
    Rui Y.
    Luo J.
    International Journal of Multimedia Information Retrieval, 2017, 6 (1) : 85 - 98
  • [14] Local motion feature extraction and spatiotemporal attention mechanism for action recognition
    Song, Xiaogang
    Zhang, Dongdong
    Liang, Li
    He, Min
    Hei, Xinhong
    VISUAL COMPUTER, 2024, 40 (11) : 7747 - 7759
  • [15] UNSUPERVISED MOTION REPRESENTATION ENHANCED NETWORK FOR ACTION RECOGNITION
    Yang, Xiaohang
    Kong, Lingtong
    Yang, Jie
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2445 - 2449
  • [16] Learning Spatiotemporal-Selected Representations in Videos for Action Recognition
    Zhang, Jiachao
    Tong, Ying
    Jiao, Liangbao
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (12)
  • [17] Action Recognition Using Visual Attention with Reinforcement Learning
    Li, Hongyang
    Chen, Jun
    Hu, Ruimin
    Yu, Mei
    Chen, Huafeng
    Xu, Zengmin
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 365 - 376
  • [18] Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition
    Xiang, Wangmeng
    Li, Chao
    Wang, Biao
    Wei, Xihan
    Hua, Xian-Sheng
    Zhang, Lei
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 627 - 644
  • [19] Residual attention unit for action recognition
    Liao, Zhongke
    Hu, Haifeng
    Zhang, Junxuan
    Yin, Chang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189
  • [20] An Improved Attention-Based Spatiotemporal-Stream Model for Action Recognition in Videos
    Liu, Dan
    Ji, Yunfeng
    Ye, Mao
    Gan, Yan
    Zhang, Jianwei
    IEEE ACCESS, 2020, 8 : 61462 - 61470