A Review and Comparative Study of Explainable Deep Learning Models Applied on Action Recognition in Real Time

Cited by: 14
Authors
Mahmoudi, Sidi Ahmed [1 ]
Amel, Otmane [1 ]
Stassin, Sedrick [1 ]
Liagre, Margot [1 ]
Benkedadra, Mohamed [1 ,2 ]
Mancas, Matei [2 ]
Affiliations
[1] Univ Mons, Fac Engn, ILIA Lab, B-7000 Mons, Belgium
[2] Univ Mons, Fac Engn, ISIA Lab, B-7000 Mons, Belgium
Keywords
action recognition; computer vision; deep learning; explainable artificial intelligence; depth maps; tracking
DOI
10.3390/electronics12092027
Chinese Library Classification (CLC): TP [automation technology, computer technology]
Subject Classification Code: 0812
Abstract
Video surveillance and image acquisition systems are among the most active research topics in computer vision and smart-city applications. Growing concern for public and worker safety has led to a significant increase in the use of surveillance cameras that provide high-definition images, and even depth maps when 3D cameras are available. Consequently, the need for automatic behavior analysis and action recognition techniques is also increasing for applications such as dangerous action detection in railway stations or on construction sites, event detection in crowd videos, behavior analysis, and optimization in industrial sites. In this context, several computer vision and deep learning solutions have recently been proposed; deep neural networks provide the most accurate results but fall short in explainability and flexibility, since they remain adapted to specific situations only. Moreover, the complexity of deep neural architectures requires high computing resources to deliver fast, real-time computation. In this paper, we propose a review and a comparative analysis of deep learning solutions in terms of precision, explainability, computation time, memory size, and flexibility. Experiments are conducted on simulated and real-world dangerous actions at railway construction sites. Based on this comparative analysis and evaluation, we propose a personalized approach to dangerous action recognition that depends on the type of collected data (image) and on users' requirements.
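Perturbation-based attribution is one family of explainability techniques a comparison like this typically covers: occlude part of the input and measure how much the model's class score drops. A minimal single-frame sketch, assuming a grayscale frame and a scalar scoring function (`occlusion_map` and `toy_score` are illustrative names, not from the paper):

```python
import numpy as np

def occlusion_map(frame, score_fn, patch=4):
    """Occlusion sensitivity: slide a zeroed patch over the frame and
    record, per location, how much the class score drops."""
    base = score_fn(frame)
    h, w = frame.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = frame.copy()
            occluded[i:i + patch, j:j + patch] = 0.0  # mask this patch
            heat[i // patch, j // patch] = base - score_fn(occluded)
    return heat

# Toy "model": the score is the mean intensity of the top-left 8x8
# quadrant, so only occlusions inside that quadrant lower the score.
def toy_score(frame):
    return frame[:8, :8].mean()

frame = np.ones((16, 16))
heat = occlusion_map(frame, toy_score)
# heat is largest over the top-left quadrant and zero elsewhere.
```

For video action recognition the same idea extends to spatio-temporal patches (occluding cubes across frames), at a proportionally higher computational cost, which is one reason the trade-off between explainability and computation time matters here.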
Pages: 19
References (84 in total)
[1]  
An JH, 2020, Journal of Physics: Conference Series, V1607, P012116, DOI 10.1088/1742-6596/1607/1/012116
[2]   The computation of optical flow [J].
Beauchemin, SS ;
Barron, JL .
ACM COMPUTING SURVEYS, 1995, 27 (03) :433-467
[3]  
Benabbas Yassine, 2010, Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR 2010), P4295, DOI 10.1109/ICPR.2010.1044
[4]   Motion Pattern Extraction and Event Detection for Automatic Visual Surveillance [J].
Benabbas, Yassine ;
Ihaddadene, Nacim ;
Djeraba, Chaabane .
EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2011,
[5]  
Bertasius G, 2021, PR MACH LEARN RES, V139
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]   Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].
Carreira, Joao ;
Zisserman, Andrew .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733
[8]   Real-time human action recognition based on depth motion maps [J].
Chen, Chen ;
Liu, Kui ;
Kehtarnavaz, Nasser .
JOURNAL OF REAL-TIME IMAGE PROCESSING, 2016, 12 (01) :155-163
[9]  
MMAction2 Contributors, 2020, OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
[10]   MARS: Motion-Augmented RGB Stream for Action Recognition [J].
Crasto, Nieves ;
Weinzaepfel, Philippe ;
Alahari, Karteek ;
Schmid, Cordelia .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :7874-7883