Adversarial Attention Networks for Early Action Recognition

Times Cited: 0
Authors
Zhang, Hong-Bo [1 ]
Pan, Wei-Xiang [1 ]
Du, Ji-Xiang [2 ]
Lei, Qing [3 ,4 ]
Chen, Yan [2 ]
Liu, Jing-Hua [3 ,4 ]
Affiliations
[1] Huaqiao Univ, Dept Comp Sci & Technol, Xiamen 361000, Peoples R China
[2] Huaqiao Univ, Fujian Key Lab Big Data Intelligence & Secur, Xiamen 361000, Peoples R China
[3] Huaqiao Univ, Xiamen Key Lab Comp Vis & Pattern Recognit, Xiamen 361000, Peoples R China
[4] Huaqiao Univ, Fujian Prov Univ, Key Lab Comp Vis & Machine Learning, Xiamen 361000, Peoples R China
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024
Funding
National Natural Science Foundation of China;
Keywords
Early action recognition; adversarial attention network; cross attention generator; self attention discriminator; feature fusion module;
DOI
10.1109/TETCI.2024.3437240
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Early action recognition endeavors to deduce the ongoing action by observing partial video, presenting a formidable challenge due to the limited information available in the initial stages. To tackle this challenge, we introduce an innovative adversarial attention network based on generative adversarial networks. This network leverages the characteristics of both the generator and discriminator to generate unobserved action information from partial video input. The proposed method comprises a cross attention generator, self attention discriminator, and feature fusion module. The cross attention generator captures temporal relationships in input action sequences, generating discriminative unobserved action information. The self attention discriminator adds global attention to the input sequence, capturing global context information for accurate evaluation of the consistency of the unobserved features generated by the cross attention generator. Finally, the feature fusion module helps the model obtain richer and more comprehensive feature representations. The proposed method is evaluated through experiments on the HMDB51, UCF101 and Something-Something v2 datasets. Experimental results demonstrate that the proposed approach outperforms existing methods across different observation ratios. Detailed ablation studies confirm the effectiveness of each component in the proposed method.
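The three components described in the abstract can be illustrated with plain scaled dot-product attention. This is a minimal NumPy sketch, not the authors' implementation: all shapes, the learned-query formulation of the generator, and the concatenate-and-pool "fusion" step are hypothetical simplifications chosen only to show how cross attention (queries from one sequence, keys/values from another) differs from self attention (all three from the same sequence).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

rng = np.random.default_rng(0)
T_obs, T_unobs, d = 8, 4, 16
observed = rng.standard_normal((T_obs, d))   # features of observed frames
queries = rng.standard_normal((T_unobs, d))  # hypothetical learned queries for unobserved steps

# "Cross attention generator": queries attend over the observed sequence
# to produce features for the unobserved part of the action.
generated = attention(queries, observed, observed)

# "Self attention discriminator": global attention over the full
# (observed + generated) sequence to capture global context.
full_seq = np.concatenate([observed, generated], axis=0)
context = attention(full_seq, full_seq, full_seq)

# "Feature fusion": concatenation plus mean pooling (a stand-in for the
# paper's fusion module).
fused = np.concatenate([full_seq, context], axis=-1).mean(axis=0)
print(generated.shape, fused.shape)  # (4, 16) (32,)
```

The key distinction shown here is where the queries come from: the generator's queries are separate from the observed sequence it attends over, while the discriminator's queries, keys, and values all come from the same concatenated sequence.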
Pages: 14
Related Papers
14 items in total
  • [1] HIGHER-ORDER RECURRENT NETWORK WITH SPACE-TIME ATTENTION FOR VIDEO EARLY ACTION RECOGNITION
    Tai, Tsung-Ming
    Fiameni, Giuseppe
    Lee, Cheng-Kuang
    Lanz, Oswald
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1631 - 1635
  • [2] Probabilistic selection of frames for early action recognition in videos
    Saremi, Mehrin
    Yaghmaee, Farzin
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2019, 8 (04) : 253 - 257
  • [4] PREDICTABILITY ANALYZING: DEEP REINFORCEMENT LEARNING FOR EARLY ACTION RECOGNITION
    Chen, Xiaokai
    Gao, Ke
    Cao, Juan
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 958 - 963
  • [5] Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition
    Wang, Rui
    Liu, Jun
    Ke, Qiuhong
    Peng, Duo
    Lei, Yinjie
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1175 - 1189
  • [6] Early Action Recognition With Category Exclusion Using Policy-Based Reinforcement Learning
    Weng, Junwu
    Jiang, Xudong
    Zheng, Wei-Long
    Yuan, Junsong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (12) : 4626 - 4638
  • [7] Stance Detection with a Multi-Target Adversarial Attention Network
    Sun, Qingying
    Xi, Xuefeng
    Sun, Jiajun
    Wang, Zhongqing
    Xu, Huiyan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (02)
  • [8] Global Regularizer and Temporal-Aware Cross-Entropy for Skeleton-Based Early Action Recognition
    Ke, Qiuhong
    Liu, Jun
    Bennamoun, Mohammed
    Rahmani, Hossein
    An, Senjian
    Sohel, Ferdous
    Boussaid, Farid
    COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 729 - 745
  • [10] Improved use of descriptors for early recognition of actions in video
    Saremi, Mehrin
    Yaghmaee, Farzin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (02) : 2617 - 2633