Action-aware Masking Network with Group-based Attention for Temporal Action Localization

Cited by: 3
Authors
Kang, Tae-Kyung [1]
Lee, Gun-Hee [2]
Jin, Kyung-Min [1]
Lee, Seong-Whan [1]
Affiliations
[1] Korea Univ, Dept Artificial Intelligence, Seoul, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea
DOI
10.1109/WACV56688.2023.00600
CLC Classification Number
TP18 (Artificial Intelligence Theory)
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Temporal Action Localization (TAL) is a significant and challenging task that localizes subtle human activities in untrimmed videos. To extract snippet-level video features, existing TAL methods commonly use video encoders pre-trained on short-video classification datasets. However, snippet-level features can be ambiguous across consecutive frames, because each snippet carries only short, impoverished temporal information, which disrupts the precise prediction of action instances. Several methods that incorporate temporal relations have been proposed to mitigate this problem; however, they still rely on the same poor video features. To address this issue, we propose a novel temporal action localization framework called the Action-aware Masking Network (AMNet). Our method simultaneously refines video features using action-aware attention and captures inherent temporal relations using self-attention and cross-attention mechanisms. First, we present an Action Masking Encoder (AME) that generates an action-aware mask representing positive (action-related) characteristics, which is then used to refine the snippet-level features so that they become more salient around actions. Second, we design a Group Attention Module (GAM) that divides the features into two groups, i.e., a long group and a short group, models the relations of their temporal information, and exchanges mutual information between the groups. Extensive experiments and ablation studies on two primary benchmark datasets demonstrate the effectiveness of AMNet, and our method achieves state-of-the-art performance on THUMOS-14 and ActivityNet-1.3.
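The two modules described in the abstract can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' implementation: the class names follow the paper, but every internal detail (the sigmoid 1-D convolutional mask head, the residual refinement, the equal temporal split into two groups, and the shared attention layers) is an illustrative assumption.

import torch
import torch.nn as nn

class ActionMaskingEncoder(nn.Module):
    # AME (sketch): predicts a per-snippet action-aware mask and uses it
    # to refine snippet-level features so they are more salient around actions.
    def __init__(self, dim):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(dim, 1, kernel_size=1),
            nn.Sigmoid(),              # per-snippet actionness in [0, 1]
        )

    def forward(self, x):              # x: (B, C, T) snippet features
        mask = self.mask_head(x)       # (B, 1, T) action-aware mask
        return x + x * mask, mask      # residual refinement keeps context

class GroupAttentionModule(nn.Module):
    # GAM (sketch): splits the snippet sequence into two groups, runs
    # self-attention inside each group, then cross-attention between
    # the groups to exchange mutual information.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):              # x: (B, C, T)
        tok = x.transpose(1, 2)        # (B, T, C) token layout for attention
        half = tok.shape[1] // 2       # equal split is a placeholder rule
        g1, g2 = tok[:, :half], tok[:, half:]   # "long" / "short" groups
        g1 = g1 + self.self_attn(g1, g1, g1)[0]
        g2 = g2 + self.self_attn(g2, g2, g2)[0]
        g1 = g1 + self.cross_attn(g1, g2, g2)[0]
        g2 = g2 + self.cross_attn(g2, g1, g1)[0]
        return torch.cat([g1, g2], dim=1).transpose(1, 2)

# Usage: refine snippet features of shape (batch, channels, T),
# e.g. from an I3D-style encoder.
feats = torch.randn(2, 256, 128)
refined, mask = ActionMaskingEncoder(256)(feats)
out = GroupAttentionModule(256)(refined)        # (2, 256, 128)

In the paper, the grouping reflects long- and short-range temporal information; the even temporal split above merely stands in for whatever grouping rule the authors use.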
Pages: 6047-6056 (10 pages)