Human Violence Recognition in Video Surveillance in Real-Time

Times cited: 5

Authors:

Huillcen Baca, Herwin Alayn [1]

Palomino Valdivia, Flor de Luz [1]

Soria Solis, Ivan [1]

Aquino Cruz, Mario [2]

Gutierrez Caceres, Juan Carlos [3]

Affiliations:

[1] Jose Maria Arguedas National University, Apurimac, Peru

[2] Micaela Bastidas University, Apurimac, Peru

[3] San Agustin National University, Arequipa, Peru

Source:

ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Volume 652

Keywords:

Human violence recognition; Video surveillance; Real-time; Frame difference; Channel average; Real scenario

DOI:

10.1007/978-3-031-28073-3_52

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The automatic detection of human violence in video surveillance has received great attention due to its applications in security, monitoring, and prevention systems. Detecting violence in real time could prevent criminal acts and even save lives. There are many investigations and proposals for detecting violence in video surveillance; however, most of them focus on effectiveness rather than efficiency: they aim to surpass the accuracy of other proposals rather than to be applicable in real scenarios and in real time. In this work, we propose an efficient deep-learning model for recognizing human violence in real time, composed of two modules: a spatial attention module (SA) and a temporal attention module (TA). SA extracts spatial features and regions of interest using the frame difference of two consecutive frames followed by morphological dilation. TA extracts temporal features by averaging the three RGB channels of each frame into a single channel, so that three frames together form the input to a 2D CNN backbone. The proposal was evaluated in terms of efficiency, accuracy, and real-time performance. The results showed that our model has the best efficiency compared to other proposals, its accuracy was very close to that of the best proposal, and its latency was very close to real time. Therefore, our model can be applied in real scenarios and in real time.
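The two preprocessing ideas named in the abstract (SA: frame difference plus morphological dilation; TA: per-frame channel averaging so that three frames fill the three input channels of a 2D CNN) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The binarization threshold (`threshold=25`), the 3x3 square structuring element, averaging the frame difference over RGB, masking the current frame with the dilated region, and feeding four consecutive frames to obtain three SA outputs are all hypothetical choices, and the helper names `dilate`, `spatial_attention`, `temporal_attention`, and `preprocess` are illustrative only.

```python
import numpy as np
from typing import Sequence


def dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # A pixel is set if any pixel in its square neighbourhood is set.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def spatial_attention(prev: np.ndarray, curr: np.ndarray,
                      threshold: int = 25, radius: int = 1) -> np.ndarray:
    """SA sketch: keep only the moving regions of `curr` (H x W x 3, uint8)."""
    # Frame difference: absolute per-pixel change, averaged over RGB (assumption).
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).mean(axis=2)
    # Binarize with a hypothetical threshold, then dilate to grow the regions of interest.
    mask = dilate(diff > threshold, radius)
    # Zero out everything outside the dilated motion mask.
    return curr * mask[..., None].astype(curr.dtype)


def temporal_attention(frames: Sequence[np.ndarray]) -> np.ndarray:
    """TA sketch: collapse three RGB frames into one H x W x 3 float32 array."""
    if len(frames) != 3:
        raise ValueError("temporal_attention expects exactly three frames")
    # Channel average: each RGB frame becomes a single grey channel.
    channels = [f.astype(np.float32).mean(axis=2) for f in frames]
    # The three single-channel frames become the three input channels.
    return np.stack(channels, axis=2)


def preprocess(clip: Sequence[np.ndarray]) -> np.ndarray:
    """Hypothetical pipeline: four consecutive RGB frames -> one 2D-CNN input."""
    if len(clip) != 4:
        raise ValueError("preprocess expects four consecutive frames")
    # Three consecutive frame pairs give three SA-filtered frames.
    attended = [spatial_attention(clip[i], clip[i + 1]) for i in range(3)]
    return temporal_attention(attended)
```

The returned H x W x 3 array has the same shape as an ordinary RGB image, which is presumably what lets an off-the-shelf 2D CNN backbone see short-range motion without 3D convolutions. In practice, the hand-written `dilate` could be replaced by `scipy.ndimage.binary_dilation`.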


Pages: 783-795

Number of pages: 13

Number of references: 31

