Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos

Cited by: 0

Authors:

Rendon-Segador, Fernando J. [1]

Alvarez-Garcia, Juan A. [1]

Soria-Morillo, Luis M. [1]

Affiliations:

[1] Univ Seville, Dept Lenguajes & Sistemas Informat, Seville 41012, Spain

Source:

SENSORS | 2024 / Vol. 24 / Issue 16

Keywords:

deep learning; sliding window; transformer; violence detection; adaptive threshold

DOI:

10.3390/s24165429

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302; 081704

Abstract:

This paper presents a comprehensive approach to detecting violent events in videos by combining CrimeNet, a Vision Transformer (ViT) model with structured neural learning and adversarial regularization, with an adaptive threshold sliding window model based on the Transformer architecture. CrimeNet demonstrates exceptional performance on all datasets (XD-Violence, UCF-Crime, NTU-CCTV Fights, UBI-Fights, Real Life Violence Situations, MediEval, RWF-2000, Hockey Fights, Violent Flows, Surveillance Camera Fights, and Movies Fight), achieving high AUC ROC and AUC PR values (up to 99% and 100%, respectively). However, CrimeNet generalized less well in cross-dataset experiments, where performance decreased by 20-30%; for instance, training on UCF-Crime and testing on XD-Violence yielded an AUC ROC of 70.20%. The sliding window model with adaptive thresholding effectively solves these problems by automatically adjusting the violence detection threshold, resulting in a substantial improvement in detection accuracy. By applying the sliding window model as post-processing to CrimeNet results, we were able to improve detection accuracy by 10% to 15% in cross-dataset experiments. Future lines of research include improving generalization, addressing data imbalance, exploring multimodal representations, testing in real-world applications, and extending the approach to complex human interactions.
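To make the idea of post-processing detector scores with an adaptive threshold over a sliding window more concrete, the following is a minimal sketch only. It is not the authors' Transformer-based model: the abstract does not specify how the threshold is computed, so the rule used here (the mean plus `k` standard deviations of the preceding window, bounded below by `floor`) is an assumption, and the function name and all parameters (`adaptive_threshold_flags`, `window`, `k`, `floor`) are hypothetical.

```python
from statistics import mean, pstdev


def adaptive_threshold_flags(scores, window=5, k=1.0, floor=0.5):
    """Flag scores that exceed a threshold adapted to the preceding window.

    Assumed rule (not taken from the paper): for each position, the
    threshold is max(floor, mean + k * std) of the previous `window`
    scores. With no history yet, the threshold is just `floor`.
    """
    flags = []
    for i, score in enumerate(scores):
        # Scores of up to `window` frames immediately before position i.
        history = scores[max(0, i - window):i]
        if history:
            threshold = max(floor, mean(history) + k * pstdev(history))
        else:
            threshold = floor
        flags.append(score > threshold)
    return flags
```

For example, `adaptive_threshold_flags([0.1, 0.1, 0.1, 0.9, 0.1], window=3, k=1.0, floor=0.3)` returns `[False, False, False, True, False]`: only the score that stands out from its recent context is flagged. A fixed global threshold would not adapt if a model's scores shifted in scale on an unseen dataset, which is one plausible reading of why an adaptive threshold helps in the cross-dataset setting.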

Pages: 18
