Video action detection by learning graph-based spatio-temporal interactions

Cited by: 13

Authors:

Tomei, Matteo [1]

Baraldi, Lorenzo [1]

Calderara, Simone [1,2]

Bronzin, Simone [2]

Cucchiara, Rita [1]

Affiliations:

[1] Univ Modena & Reggio Emilia, Via Pietro Vivarelli 10, I-41125 Modena, Italy

[2] METALIQUID SRL, Via Giosue Carducci 26, I-20123 Milan, Italy

Source:

COMPUTER VISION AND IMAGE UNDERSTANDING | 2021 / Vol. 206

Keywords:

Video understanding; Action detection; Graph learning

DOI:

10.1016/j.cviu.2021.103187

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Action detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, more attention has been devoted to relationship modeling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure that can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone-independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates state-of-the-art results and consistent improvements over baselines built with different backbones. Code is publicly available at https://github.com/aimagelab/STAGE_action_detection.
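The mechanism summarized in the abstract, self-attention restricted to a graph whose edges link entities from the same or consecutive clips, can be illustrated with a minimal sketch. This is not the authors' released implementation: the function names `build_adjacency` and `masked_self_attention`, the `max_gap` parameter, and the toy feature vectors are all hypothetical, and the learned query/key/value projections of a real attention layer are replaced by identity maps to keep the example short.

```python
import math

def build_adjacency(clip_ids, max_gap=1):
    """Connect entities whose clips are at most max_gap apart (assumed rule).

    Self-loops are included, so every node has at least one neighbour.
    """
    n = len(clip_ids)
    return [[abs(clip_ids[i] - clip_ids[j]) <= max_gap for j in range(n)]
            for i in range(n)]

def masked_self_attention(features, adjacency):
    """One attention layer in which each node attends only to its neighbours.

    Simplification: queries, keys and values are the raw features
    (no learned projections), with scaled dot-product scoring.
    """
    dim = len(features[0])
    output = []
    for i, query in enumerate(features):
        neighbours = [j for j, linked in enumerate(adjacency[i]) if linked]
        scores = [sum(q * k for q, k in zip(query, features[j])) / math.sqrt(dim)
                  for j in neighbours]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        output.append([
            sum(w / total * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ])
    return output

# Toy input: four detected entities (people or objects) from three clips.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
clip_ids = [0, 0, 1, 2]

adjacency = build_adjacency(clip_ids)
layer1 = masked_self_attention(features, adjacency)
layer2 = masked_self_attention(layer1, adjacency)  # second graph layer
print(layer2)
```

In this sketch the entity in clip 0 is never directly linked to the entity in clip 2, yet after two stacked layers its representation already depends on it through the clip 1 entity, which is one way a multi-layer graph can reach beyond consecutive clips.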


Pages: 9

50 items in total

[1] Spatio-temporal graph-based self-labeling for video anomaly detection
Xing, Meng
Feng, Zhiyong
Su, Yong
Zhang, Yiming
Oh, Changjae
Gribova, Valeriya
Filaretoy, Vladimir Fedorovich
Huang, Deshuang
NEUROCOMPUTING, 2025, 627
[2] Spatio-Temporal Graph-based Semantic Compositional Network for Video Captioning
Li, Shun
Zhang, Ze-Fan
Ji, Yi
Li, Ying
Liu, Chun-Ping
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
[3] STEP: Spatio-Temporal Progressive Learning for Video Action Detection
Yang, Xitong
Yang, Xiaodong
Liu, Ming-Yu
Xiao, Fanyi
Davis, Larry
Kautz, Jan
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 264 - 272
[4] Video Relation Detection with Spatio-Temporal Graph
Qian, Xufeng
Zhuang, Yueting
Li, Yimeng
Xiao, Shaoning
Pu, Shiliang
Xiao, Jun
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 84 - 93
[5] Graph-based spatio-temporal region extraction
Galmar, Eric
Huet, Benoit
IMAGE ANALYSIS AND RECOGNITION, PT 1, 2006, 4141 : 236 - 247
[6] Graph-Based Spatio-Temporal Feature Learning for Neuromorphic Vision Sensing
Bi, Yin
Chadha, Aaron
Abbas, Alhabib
Bourtsoulatze, Eirina
Andreopoulos, Yiannis
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 9084 - 9098
[7] Graph-based approach for human action recognition using spatio-temporal features
Ben Aoun, Najib
Mejdoub, Mahmoud
Ben Amar, Chokri
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2014, 25 (02) : 329 - 338
[8] Urban Event Detection from Spatio-temporal IoT Sensor Data Using Graph-Based Machine Learning
Park, Dae-Young
Ko, In-Young
2022 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (IEEE BIGCOMP 2022), 2022, : 234 - 241
[9] Spatio-temporal graph-based CNNs for anomaly detection in weakly-labeled videos
Mu, Huiyu
Sun, Ruizhi
Wang, Miao
Chen, Zeqiu
INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (04)
[10] ENHANCED ACTION TUBELET DETECTOR FOR SPATIO-TEMPORAL VIDEO ACTION DETECTION
Wu, Yutang
Wang, Hanli
Wang, Shuheng
Li, Qinyu
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2388 - 2392
