Violence Detection in Video Using Statistical Features of the Optical Flow and 2D Convolutional Neural Network

Times cited: 0

Authors:

Mahmoodi, Javad [1,2]

Nezamabadi-Pour, Hossein [1]

Affiliations:

[1] Shahid Bahonar Univ Kerman, Dept Elect Engn, Kerman, Iran

[2] Islamic Azad Univ, Dept Elect Engn, Kerman Branch, Kerman, Iran

Source:

COMPUTATIONAL INTELLIGENCE | 2025 / Vol. 41 / No. 02

Keywords:

2D convolutional neural network; deep learning; feature extraction; optical flow; violence detection; RECOGNITION; HISTOGRAMS; SYSTEMS

DOI:

10.1111/coin.70034

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The rapid growth of video data has increased the need for surveillance and violence detection systems. Although violent events occur less frequently than normal activities, automated video surveillance systems for violence detection have become essential to minimize wasted labor and time. Detecting violent activity in videos is challenging because violent behavior is highly variable and diverse: it can involve a wide range of actions, motions, and interactions between people and objects. Researchers currently use deep learning models to detect violent behavior, and many of these approaches extract spatio-temporal information from a video with a 3D Convolutional Neural Network (CNN). Despite their success, 3D CNNs require many more parameters than 2D CNNs and have high computational complexity. We therefore focus on using a 2D CNN to encode spatio-temporal information. To give a 2D CNN this ability, we use statistical features of the changes in optical flow, which are designed to focus attention on the regions of a video clip with the most motion. First, the optical flow of the input video is computed. To identify meaningful changes in the optical flow, the optical flow magnitude of each frame is compared with that of its predecessor. Statistical features of these changes are then extracted to summarize the video clip as a 2D template, which is fed to a 2D CNN. Experimental results on four benchmark datasets show that the proposed strategy outperforms the baseline methods. In particular, condensing a video clip into a 2D template yields a better estimate of its spatio-temporal features.
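The template-construction steps in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how two flow magnitudes are compared or which statistics are extracted, so the absolute frame-to-frame difference and the per-pixel mean, standard deviation, and maximum over time used here are illustrative choices. `flow_change_template` is a hypothetical name. The dense flow fields are assumed to be precomputed, for example with one OpenCV `cv2.calcOpticalFlowFarneback` call per pair of consecutive grayscale frames.

```python
import numpy as np


def flow_change_template(flows: np.ndarray) -> np.ndarray:
    """Summarize a clip's optical flow as a 2D, multi-channel template.

    Hypothetical helper (illustrative sketch, not the paper's code).
    flows: array of shape (T, H, W, 2) holding per-frame (dx, dy) flow fields,
           assumed to be precomputed by any dense optical-flow method.
    Returns an (H, W, 3) float32 array of per-pixel statistics of the
    frame-to-frame change in flow magnitude.
    """
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("expected flows of shape (T, H, W, 2)")
    if flows.shape[0] < 2:
        raise ValueError("need at least two flow fields to measure change")

    # Flow magnitude per frame and pixel: sqrt(dx^2 + dy^2), shape (T, H, W).
    magnitude = np.linalg.norm(flows, axis=-1)

    # Compare each frame's magnitude with its predecessor.
    # Assumption: absolute difference; result has shape (T - 1, H, W).
    change = np.abs(np.diff(magnitude, axis=0))

    # Collapse the time axis with simple statistics.
    # Assumption: mean, standard deviation, and maximum, one channel each.
    template = np.stack(
        [change.mean(axis=0), change.std(axis=0), change.max(axis=0)],
        axis=-1,
    )
    return template.astype(np.float32)


if __name__ == "__main__":
    # Random flow fields stand in for a real 16-frame, 64x64 clip.
    rng = np.random.default_rng(0)
    flows = rng.normal(size=(16, 64, 64, 2))
    print(flow_change_template(flows).shape)  # (64, 64, 3)
```

Because the output has the shape of an H × W × 3 image, a standard 2D CNN can consume it much like an RGB frame, with the time axis already collapsed; under these assumptions, no 3D convolution over time is needed.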

Pages: 15

References (52 in total):

[1] Abdali Al-Maamoon R., 2019, 2019 2nd Scientific Conference of Computer Sciences (SCCS), P104, DOI 10.1109/SCCS.2019.8852616

[2] Akti S., 2019, Vision-Based Fight Detection from Surveillance Cameras, DOI 10.1109/IPTA.2019.8936070

[3] [Anonymous], 2012, 2012 IEEE COMP SOC C, DOI 10.1109/CVPRW.2012.6239348

[4] Bellamine I, 2015, 2015 INTELLIGENT SYSTEMS AND COMPUTER VISION (ISCV)

[5] Ben Mabrouk, Amira; Zagrouba, Ezzeddine, 2018, Abnormal behavior recognition for intelligent video surveillance systems: A review, EXPERT SYSTEMS WITH APPLICATIONS, V91, P480-491

[6] Ben Mabrouk, Amira; Zagrouba, Ezzeddine, 2017, Spatio-temporal feature using optical flow based distribution for violence detection, PATTERN RECOGNITION LETTERS, V92, P62-67

[7] Nievas EB, 2011, LECT NOTES COMPUT SC, V6855, P332, DOI 10.1007/978-3-642-23678-5_39

[8] Chaudhry R, 2009, PROC CVPR IEEE, P1932, DOI 10.1109/CVPRW.2009.5206821

[9] Chen Ming-Yu., MoSIFT: Recognizing Human Actions in Surveillance Videos

[10] Cheng Wen-Huang, 2003, P 5 ACM SIGMM INT WO, P109, DOI 10.1145/973264.973282
