Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos

Times cited: 0

Authors:

Rendon-Segador, Fernando J. [1]

Alvarez-Garcia, Juan A. [1]

Soria-Morillo, Luis M. [1]

Affiliation:

[1] Univ Seville, Dept Lenguajes & Sistemas Informat, Seville 41012, Spain

Source:

SENSORS | 2024 / Vol. 24 / Issue 16

Keywords:

deep learning; sliding window; transformer; violence detection; adaptive threshold;

DOI:

10.3390/s24165429

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302; 081704;

Abstract:

This paper presents a comprehensive approach to detecting violent events in videos. It combines CrimeNet, a Vision Transformer (ViT) model with neural structured learning and adversarial regularization, with an adaptive-threshold sliding window model based on the Transformer architecture. CrimeNet performs exceptionally well on every dataset evaluated (XD-Violence, UCF-Crime, NTU-CCTV Fights, UBI-Fights, Real Life Violence Situations, MediaEval, RWF-2000, Hockey Fights, Violent Flows, Surveillance Camera Fights, and Movies Fight), achieving AUC ROC and AUC PR values of up to 99% and 100%, respectively. However, CrimeNet generalizes poorly in cross-dataset experiments, where performance drops by 20-30%. For instance, training on UCF-Crime and testing on XD-Violence yields an AUC ROC of 70.20%. The sliding window model with adaptive thresholding addresses these problems by automatically adjusting the violence detection threshold, which substantially improves detection accuracy. Applying the sliding window model as post-processing to CrimeNet's outputs improves detection accuracy by 10% to 15% in cross-dataset experiments. Future lines of research include improving generalization, addressing data imbalance, exploring multimodal representations, testing in real-world applications, and extending the approach to complex human interactions.
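The abstract does not state how the Transformer-based sliding window model sets its threshold. The function below is a minimal, hypothetical sketch of the general idea of adaptive-threshold post-processing, and it is not the authors' model. It takes per-frame violence scores, such as those a frame-level classifier like CrimeNet might output. Instead of a fixed cut-off, it derives the threshold from each video's own score statistics: the mean plus k population standard deviations. It then judges each frame by the mean score of a centered window. The function name `sliding_window_adaptive_labels`, the window size, and the parameter `k` are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def sliding_window_adaptive_labels(
    scores: Sequence[float], window: int = 5, k: float = 0.5
) -> List[int]:
    """Label each frame 1 (violent) or 0 (non-violent).

    Hypothetical heuristic, NOT the Transformer-based sliding window model
    described in the abstract: the threshold is derived from the video's own
    score distribution (mean + k * population standard deviation) rather than
    fixed in advance, and each frame is judged on the mean score of a
    centered window around it.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if len(scores) == 0:
        return []

    # Adaptive threshold: it moves with any positive rescaling or constant
    # offset of the scores, so a classifier whose outputs drift on an unseen
    # dataset is not judged against a cut-off tuned on the training data.
    threshold = mean(scores) + k * pstdev(scores)

    half = window // 2
    labels: List[int] = []
    for i in range(len(scores)):
        # Centered window, truncated at the start and end of the video.
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        local_mean = mean(scores[lo:hi])
        labels.append(1 if local_mean > threshold else 0)
    return labels


# Usage: the same violent burst, scored once normally and once by a model that
# outputs systematically lower scores (e.g. on an unseen dataset).
base = [0.1, 0.1, 0.2, 0.9, 0.95, 0.9, 0.2, 0.1, 0.1]
scaled = [0.5 * s for s in base]
print(sliding_window_adaptive_labels(base, window=3))    # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(sliding_window_adaptive_labels(scaled, window=3))  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

With a fixed threshold of 0.5, the scaled scores would flag no frame at all, because the highest windowed mean is about 0.46. A threshold that adapts to the score distribution avoids this failure when scores drift across datasets. The model described in the abstract learns this behaviour with a Transformer, whereas this heuristic only illustrates the principle.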

Pages: 18