Adversarial Attention Networks for Early Action Recognition

Times Cited: 0
Authors
Zhang, Hong-Bo [1 ]
Pan, Wei-Xiang [1 ]
Du, Ji-Xiang [2 ]
Lei, Qing [3 ,4 ]
Chen, Yan [2 ]
Liu, Jing-Hua [3 ,4 ]
Affiliations
[1] Huaqiao Univ, Dept Comp Sci & Technol, Xiamen 361000, Peoples R China
[2] Huaqiao Univ, Fujian Key Lab Big Data Intelligence & Secur, Xiamen 361000, Peoples R China
[3] Huaqiao Univ, Xiamen Key Lab Comp Vis & Pattern Recognit, Xiamen 361000, Peoples R China
[4] Huaqiao Univ, Fujian Prov Univ, Key Lab Comp Vis & Machine Learning, Xiamen 361000, Peoples R China
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024
Funding
National Natural Science Foundation of China;
Keywords
Early action recognition; adversarial attention network; cross attention generator; self attention discriminator; feature fusion module;
DOI
10.1109/TETCI.2024.3437240
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Early action recognition endeavors to deduce the ongoing action by observing partial video, presenting a formidable challenge due to the limited information available in the initial stages. To tackle this challenge, we introduce an innovative adversarial attention network based on generative adversarial networks. This network leverages the characteristics of both the generator and discriminator to generate unobserved action information from partial video input. The proposed method comprises a cross attention generator, self attention discriminator, and feature fusion module. The cross attention generator captures temporal relationships in input action sequences, generating discriminative unobserved action information. The self attention discriminator adds global attention to the input sequence, capturing global context information for accurate evaluation of the consistency of the unobserved features generated by the cross attention generator. Finally, the feature fusion module helps the model obtain richer and more comprehensive feature representations. The proposed method is evaluated through experiments on the HMDB51, UCF101 and Something-Something v2 datasets. Experimental results demonstrate that the proposed approach outperforms existing methods across different observation ratios. Detailed ablation studies confirm the effectiveness of each component in the proposed method.
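The three components described in the abstract can be illustrated with plain scaled dot-product attention. This is a minimal NumPy sketch, not the authors' implementation: all shapes, the learned-query formulation of the generator, and the concatenate-and-pool "fusion" step are hypothetical simplifications chosen only to show how cross attention (queries from one sequence, keys/values from another) differs from self attention (all three from the same sequence).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
T_obs, T_unobs, d = 8, 4, 16
observed = rng.standard_normal((T_obs, d))   # features of observed frames
queries = rng.standard_normal((T_unobs, d))  # hypothetical learned queries for unobserved steps

# "Cross attention generator": queries attend over the observed sequence
# to produce features for the unobserved part of the action.
generated = attention(queries, observed, observed)

# "Self attention discriminator": global attention over the full
# (observed + generated) sequence to capture global context.
full_seq = np.concatenate([observed, generated], axis=0)
context = attention(full_seq, full_seq, full_seq)

# "Feature fusion": concatenation plus mean pooling (a stand-in for the
# paper's fusion module).
fused = np.concatenate([full_seq, context], axis=-1).mean(axis=0)
print(generated.shape, fused.shape)  # (4, 16) (32,)
```

The key distinction shown here is where the queries come from: the generator's queries are separate from the observed sequence it attends over, while the discriminator's queries, keys, and values all come from the same concatenated sequence.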
Pages: 14
Related Papers
14 items in total
  • [1] HIGHER-ORDER RECURRENT NETWORK WITH SPACE-TIME ATTENTION FOR VIDEO EARLY ACTION RECOGNITION
    Tai, Tsung-Ming
    Fiameni, Giuseppe
    Lee, Cheng-Kuang
    Lanz, Oswald
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1631 - 1635
  • [2] Probabilistic selection of frames for early action recognition in videos
    Saremi, Mehrin
    Yaghmaee, Farzin
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2019, 8 (04) : 253 - 257
  • [4] PREDICTABILITY ANALYZING: DEEP REINFORCEMENT LEARNING FOR EARLY ACTION RECOGNITION
    Chen, Xiaokai
    Gao, Ke
    Cao, Juan
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 958 - 963
  • [5] Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition
    Wang, Rui
    Liu, Jun
    Ke, Qiuhong
    Peng, Duo
    Lei, Yinjie
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1175 - 1189
  • [6] Early Action Recognition With Category Exclusion Using Policy-Based Reinforcement Learning
    Weng, Junwu
    Jiang, Xudong
    Zheng, Wei-Long
    Yuan, Junsong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4626 - 4638
  • [7] Stance Detection with a Multi-Target Adversarial Attention Network
    Sun, Qingying
    Xi, Xuefeng
    Sun, Jiajun
    Wang, Zhongqing
    Xu, Huiyan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (02)
  • [8] Global Regularizer and Temporal-Aware Cross-Entropy for Skeleton-Based Early Action Recognition
    Ke, Qiuhong
    Liu, Jun
    Bennamoun, Mohammed
    Rahmani, Hossein
    An, Senjian
    Sohel, Ferdous
    Boussaid, Farid
    COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 729 - 745
  • [10] Improved use of descriptors for early recognition of actions in video
    Saremi, Mehrin
    Yaghmaee, Farzin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (02) : 2617 - 2633