CrimeNet: Neural Structured Learning using Vision Transformer for violence detection

Times cited: 21

Authors:

Rendon-Segador, Fernando J. [1]

Alvarez-Garcia, Juan A. [1]

Salazar-Gonzalez, Jose L. [1]

Tommasi, Tatiana [2]

Affiliations:

[1] Univ Seville, Dept Lenguajes & Sistemas Informat, Seville, Spain

[2] Politecn Torino & Italian Inst Technol, Turin, Italy

Source:

NEURAL NETWORKS | 2023 / Vol. 161

Keywords:

Deep learning; Neural Structured Learning; Vision Transformer; Violence detection; Adversarial Learning; VIDEO

DOI:

10.1016/j.neunet.2023.01.048

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The state of the art in violence detection in videos has improved in recent years thanks to deep learning models, but average precision is still below 90% on the most complex datasets. This can lead to frequent false alarms in video surveillance environments and may cause security guards to disable the artificial intelligence system. In this study, we propose a new neural network based on Vision Transformer (ViT) and Neural Structured Learning (NSL) with adversarial training. This network, called CrimeNet, outperforms previous works by a large margin and reduces false positives practically to zero. Our tests on the four most challenging violence-related datasets (binary and multi-class) show the effectiveness of CrimeNet, improving the state of the art in ROC AUC by between 9.4 and 22.17 percentage points, depending on the dataset. In addition, we present a generalisation study of our model by training and testing it on different datasets. The obtained results show that CrimeNet improves over competing methods with a gain of between 12.39 and 25.22 percentage points, showing remarkable robustness.

© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
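The abstract names Neural Structured Learning with adversarial training without spelling out the training objective. In the form exposed by TensorFlow's NSL library (`nsl.keras.AdversarialRegularization`), adversarial regularization adds to the ordinary supervised loss a weighted loss computed on inputs perturbed along the loss gradient. The NumPy sketch below illustrates that objective on a toy logistic-regression model; it is not CrimeNet's implementation. The model, the sign-of-gradient (FGSM-style) perturbation, and the hypothetical names `adversarial_regularized_loss`, `step_size` (0.05) and `multiplier` (0.2) are assumptions chosen for illustration; the NSL library lets the gradient normalization be configured.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def bce(p, y, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def adversarial_regularized_loss(w, b, x, y, step_size=0.05, multiplier=0.2):
    """Return (total, clean, adversarial) losses for a toy logistic-regression model.

    Hypothetical stand-in: p = sigmoid(x @ w + b) replaces the real classifier.
    total = clean_loss + multiplier * loss on adversarially perturbed inputs.
    """
    p = sigmoid(x @ w + b)
    clean_loss = bce(p, y)
    # Per-example input gradient of the BCE loss: (p - y) * w.
    # The 1/N batch-mean factor is dropped because sign() ignores positive scaling.
    grad_x = (p - y)[:, None] * w[None, :]
    # FGSM-style step (assumption): shift each feature by step_size in the
    # direction that increases the loss.
    x_adv = x + step_size * np.sign(grad_x)
    adv_loss = bce(sigmoid(x_adv @ w + b), y)
    return clean_loss + multiplier * adv_loss, clean_loss, adv_loss


# Usage on a small random batch (illustrative data, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = (x[:, 0] > 0).astype(float)
w = np.array([1.0, -0.5, 0.25, 0.0])
b = 0.1
total, clean, adv = adversarial_regularized_loss(w, b, x, y)
print(f"clean={clean:.4f} adversarial={adv:.4f} total={total:.4f}")
```

With `step_size=0.0` the perturbed inputs equal the originals, so the adversarial term reduces to the clean loss; larger steps make the regularizer penalize models whose predictions change under small input perturbations.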

Pages: 318-329

Number of pages: 12

References (54 in total):

[1] Ainsworth T., 2002, Security Oz, P18

[2] Arnab, Anurag; Dehghani, Mostafa; Heigold, Georg; Sun, Chen; Lucic, Mario; Schmid, Cordelia. ViViT: A Video Vision Transformer. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021, pp. 6816-6826

[3] Nievas EB, 2011, LECT NOTES COMPUT SC, V6855, P332, DOI 10.1007/978-3-642-23678-5_39

[4] Bui, Thang D.; Ravi, Sujith; Ramavajjala, Vivek. Neural Graph Learning: Training Neural Networks Using Graphs. WSDM'18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 64-71

[5] Chang S., 2021, AAAI CONF ARTIF INTE

[6] Charikar M.S., 2002, PROC ACM S THEORY CO, P380, DOI 10.1145/509907.509965

[7] Chen, Yihong; Cao, Yue; Hu, Han; Wang, Liwei. Memory Enhanced Global-Local Aggregation for Video Object Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020, pp. 10334-10343

[8] Degardin, Bruno; Proenca, Hugo. Iterative weak/self-supervised classification framework for abnormal events detection. Pattern Recognition Letters, 2021, 145, pp. 50-57

[9] Degardin Bruno Manuel, 2020, Weakly and partially supervised learning frameworks for anomaly detection

[10] Deniz O, 2014, PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, P478
