A survey on deep learning-based spatio-temporal action detection

Cited by: 1
Authors
Wang, Peng [1 ]
Zeng, Fanwei [2 ]
Qian, Yuntao [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310007, Zhejiang, Peoples R China
[2] Ant Grp, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Computer vision; deep learning; spatio-temporal action detection; SEARCH;
DOI
10.1142/S0219691323500662
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Code
081202; 0835;
Abstract
Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in both space and time. It has become a particularly active area of research in computer vision because of its rapidly emerging real-world applications, such as autonomous driving, visual surveillance and entertainment. In recent years, many efforts have been devoted to building robust and effective frameworks for STAD. This paper provides a comprehensive review of state-of-the-art deep learning-based methods for STAD. First, a taxonomy is developed to organize these methods. Next, the linking algorithms, which associate frame- or clip-level detection results into action tubes, are reviewed. Then, the commonly used benchmark datasets and evaluation metrics are introduced, and the performance of state-of-the-art models is compared. Finally, the paper concludes with a discussion of potential research directions for STAD.
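To illustrate the linking step mentioned in the abstract, the sketch below shows one minimal greedy strategy for chaining per-frame detections into an action tube: at each frame, pick the candidate box that best combines overlap (IoU) with the previous frame's box and its own confidence score. This snippet is not drawn from the survey; the function names (`iou`, `link_tube`) and the linking criterion are illustrative assumptions, and published linking algorithms (e.g., dynamic-programming formulations over whole videos) are typically more elaborate.

```python
# Illustrative sketch (not from the survey): greedy linking of per-frame
# action detections into a single action tube for one action class.
# All names and the scoring criterion here are hypothetical.

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_tube(frame_detections, iou_weight=1.0, score_weight=1.0):
    """Greedily grow one action tube across frames.

    frame_detections: list over frames; each frame is a list of
    (box, score) pairs for a single action class.
    Returns the linked tube as a list of (box, score), one per frame.
    """
    # Start from the highest-scoring detection in the first frame.
    tube = [max(frame_detections[0], key=lambda d: d[1])]
    for detections in frame_detections[1:]:
        prev_box = tube[-1][0]
        # Choose the detection that best balances overlap with the
        # previous box and its own confidence.
        best = max(
            detections,
            key=lambda d: iou_weight * iou(prev_box, d[0]) + score_weight * d[1],
        )
        tube.append(best)
    return tube

# Toy usage: three frames, two candidate boxes per frame.
frames = [
    [((10, 10, 50, 80), 0.9), ((200, 40, 240, 90), 0.3)],
    [((12, 11, 52, 82), 0.8), ((198, 42, 242, 92), 0.4)],
    [((15, 12, 55, 84), 0.85), ((195, 45, 245, 95), 0.2)],
]
print(link_tube(frames))
```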
Pages: 35