Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos

Cited by: 0
Authors
Rendon-Segador, Fernando J. [1]
Alvarez-Garcia, Juan A. [1]
Soria-Morillo, Luis M. [1]
Affiliations
[1] Univ Seville, Dept Lenguajes & Sistemas Informat, Seville 41012, Spain
Keywords
deep learning; sliding window; transformer; violence detection; adaptive threshold;
DOI
10.3390/s24165429
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
This paper presents a comprehensive approach to detecting violent events in videos. It combines CrimeNet, a Vision Transformer (ViT) model with structured neural learning and adversarial regularization, with an adaptive threshold sliding window model based on the Transformer architecture. CrimeNet performs strongly on all datasets (XD-Violence, UCF-Crime, NTU-CCTV Fights, UBI-Fights, Real Life Violence Situations, MediEval, RWF-2000, Hockey Fights, Violent Flows, Surveillance Camera Fights, and Movies Fight), achieving AUC ROC and AUC PR values of up to 99% and 100%, respectively. However, CrimeNet generalizes less well in cross-dataset experiments, where performance drops by 20-30%; for instance, training on UCF-Crime and testing on XD-Violence yields an AUC ROC of 70.20%. The sliding window model with adaptive thresholding addresses these problems by automatically adjusting the violence detection threshold. Applied as post-processing to CrimeNet results, it improves detection accuracy by 10% to 15% in cross-dataset experiments. Future lines of research include improving generalization, addressing data imbalance, exploring multimodal representations, testing in real-world applications, and extending the approach to complex human interactions.
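The abstract does not specify how the adaptive threshold is computed, and the paper's sliding window model is Transformer-based. As a rough illustration of the general idea only, the sketch below post-processes per-frame violence scores with a threshold derived from the mean and standard deviation of a trailing window. The function name `adaptive_flags` and the parameters `window`, `k`, `min_margin`, and `initial` are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def adaptive_flags(scores, window=5, k=2.0, min_margin=0.05, initial=0.5):
    """Flag frames whose score stands out from the recent past.

    Illustrative statistical stand-in, NOT the paper's Transformer-based model.
    The threshold for frame i is mean(history) + max(k * std(history), min_margin),
    where history holds up to `window` scores preceding frame i.
    """
    flags = []
    for i, score in enumerate(scores):
        history = scores[max(0, i - window):i]  # trailing window, excludes frame i
        if history:
            margin = max(k * pstdev(history), min_margin)
            threshold = mean(history) + margin
        else:
            threshold = initial  # no history yet: fall back to a fixed value
        flags.append(score > threshold)
    return flags


if __name__ == "__main__":
    # Hypothetical per-frame scores, e.g. from a classifier such as CrimeNet.
    frame_scores = [0.1, 0.1, 0.1, 0.1, 0.9, 0.1]
    print(adaptive_flags(frame_scores, window=4, k=2.0))
```

Because the threshold follows the local score distribution rather than a fixed cut-off, a shift in the overall score level (as might occur when testing on a dataset different from the training one) moves the threshold with it. This is only one simple way to adapt a threshold; the paper should be consulted for the actual method.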
Pages: 18