A survey on deep learning-based spatio-temporal action detection

Cited by: 2
Authors
Wang, Peng [1]
Zeng, Fanwei [2]
Qian, Yuntao [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, Hangzhou 310007, Zhejiang, Peoples R China
[2] Ant Grp, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Computer vision; deep learning; spatio-temporal action detection; SEARCH;
DOI
10.1142/S0219691323500662
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202 ; 0835 ;
Abstract
Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time. It has become a particularly active area of research in computer vision because of its rapidly emerging real-world applications, such as autonomous driving, visual surveillance and entertainment. Many efforts have been devoted in recent years to building a robust and effective framework for STAD. This paper provides a comprehensive review of the state-of-the-art deep learning-based methods for STAD. First, a taxonomy is developed to organize these methods. Next, the linking algorithms, which associate frame- or clip-level detection results to form action tubes, are reviewed. Then, the commonly used benchmark datasets and evaluation metrics are introduced, and the performance of state-of-the-art models is compared. Finally, the paper concludes with a discussion of potential research directions for STAD.
Pages: 35