CrimeNet: Neural Structured Learning using Vision Transformer for violence detection

Times cited: 21
Authors
Rendon-Segador, Fernando J. [1]
Alvarez-Garcia, Juan A. [1]
Salazar-Gonzalez, Jose L. [1]
Tommasi, Tatiana [2]
Affiliations
[1] Univ Seville, Dept Lenguajes & Sistemas Informat, Seville, Spain
[2] Politecn Torino & Italian Inst Technol, Turin, Italy
Keywords
Deep learning; Neural Structured Learning; Vision Transformer; Violence detection; Adversarial Learning; Video
DOI
10.1016/j.neunet.2023.01.048
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The state of the art in violence detection in videos has improved in recent years thanks to deep learning models, but average precision is still below 90% on the most complex datasets. This may lead to frequent false alarms in video surveillance environments, which may in turn cause security guards to disable the artificial intelligence system. In this study, we propose a new neural network based on Vision Transformer (ViT) and Neural Structured Learning (NSL) with adversarial training. This network, called CrimeNet, outperforms previous works by a large margin and reduces false positives to practically zero. Our tests on the four most challenging violence-related datasets (binary and multi-class) show the effectiveness of CrimeNet, which improves on the state of the art by between 9.4 and 22.17 percentage points in ROC AUC, depending on the dataset. In addition, we present a generalisation study in which the model is trained and tested on different datasets. The results show that CrimeNet improves over competing methods by between 12.39 and 25.22 percentage points, demonstrating remarkable robustness.
© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
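The abstract names Neural Structured Learning with adversarial training as one of CrimeNet's two ingredients. As a rough illustration of that general idea only (not the authors' code, and not the TensorFlow NSL API), the NumPy sketch below trains a classifier on the clean loss plus a weighted loss on adversarially perturbed copies of each input. Assumptions made for this sketch: a logistic-regression model replaces the ViT, the synthetic features are invented, and the names `multiplier` and `step_size` and the L2 gradient normalisation are hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: a logistic-regression classifier stands in for CrimeNet's ViT,
# and the inputs stand in for precomputed clip features (both are assumptions).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def adversarial_neighbors(x, y, w, b, step_size):
    """Move each input a distance step_size along the L2-normalised loss gradient."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # dBCE/dx for logistic regression
    norms = np.linalg.norm(grad_x, axis=1, keepdims=True)
    return x + step_size * grad_x / np.maximum(norms, 1e-12)

def train_adv_regularized(x, y, multiplier=0.2, step_size=0.05, lr=0.1, epochs=500, seed=0):
    """Gradient descent on clean loss + multiplier * loss on adversarial neighbours."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=x.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        # Neighbours are regenerated every step and treated as constants.
        x_adv = adversarial_neighbors(x, y, w, b, step_size)
        p, p_adv = sigmoid(x @ w + b), sigmoid(x_adv @ w + b)
        grad_w = (x.T @ (p - y) + multiplier * x_adv.T @ (p_adv - y)) / n
        grad_b = (np.sum(p - y) + multiplier * np.sum(p_adv - y)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    x_adv = adversarial_neighbors(x, y, w, b, step_size)
    total = bce(sigmoid(x @ w + b), y) + multiplier * bce(sigmoid(x_adv @ w + b), y)
    return w, b, total

# Usage on synthetic two-class data (label 1 = "violent", purely illustrative).
rng = np.random.default_rng(1)
y = (rng.random(200) < 0.5).astype(float)
x = rng.normal(size=(200, 8)) + (2.0 * y[:, None] - 1.0)
w, b, total_loss = train_adv_regularized(x, y)
clean_acc = np.mean((sigmoid(x @ w + b) > 0.5) == (y == 1.0))
x_adv = adversarial_neighbors(x, y, w, b, step_size=0.05)
adv_acc = np.mean((sigmoid(x_adv @ w + b) > 0.5) == (y == 1.0))
print(f"clean accuracy {clean_acc:.3f}, adversarial accuracy {adv_acc:.3f}")
```

Regenerating the neighbours at every step, rather than once before training, keeps the perturbation aimed at the model's current weak points; `multiplier` trades clean-data fit against robustness to such perturbations.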
Pages: 318-329
Page count: 12
References: 54