Automatic Detection of Violence in Video Scenes

Cited by: 1
Authors
Eitta, Ahmed Abo [1]
Barghash, Toka [1]
Nafea, Yousef [1]
Gomaa, Walid [1,2]
Affiliations
[1] Egypt Japan Univ Sci & Technol, Cyber Phys Syst Lab, Alexandria, Egypt
[2] Alexandria Univ, Fac Engn, Alexandria, Egypt
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Violence detection; automatic feature extraction; transfer learning; convolutional neural networks; support vector machines; random forests
DOI
10.1109/IJCNN52387.2021.9533669
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Violence detection from video streams using machine learning is a rapidly growing field of research, owing to its contribution to public safety: automatically detecting violent acts at an early stage and alerting the responsible authorities to intervene can help save lives. To tackle this problem, a Convolutional Neural Network (CNN) has been designed and implemented, along with two classical classifiers, a support vector machine and a random forest, to detect violence in video streams. The CNN is used both as a classifier and as a feature extractor whose outputs are fed into the two classical classifiers. A data structure called `packets' was developed as the input to the CNN; packets are used to train the model on short clips, where a packet consists of 15 sampled frames that make up one second of video, so the CNN input is a subsampled 3D volume of video frames. The problem is posed as binary classification. Four datasets are used: one is a combination of YouTube videos collected and annotated by the authors, mixed with a dataset found on Kaggle that was also filtered by the authors; the other three are benchmark datasets. The model was trained with supervised learning on a set of normal and violent videos and tested using the three classifiers, whose results are compared with each other and then with other state-of-the-art approaches. Finally, transfer learning was evaluated by cross-validating models that were trained on one dataset and tested on another.
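To make the packet construction and the CNN-as-feature-extractor pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation: it assumes 30 fps video, 64x64 RGB frames, a placeholder feature extractor standing in for the trained CNN, and scikit-learn's SVC and RandomForestClassifier for the two classical classifiers.

# Minimal sketch (not the authors' code): one-second "packets" of 15 sampled
# frames, a stand-in feature extractor, and the two classical classifiers.
# Frame size (64x64), frame rate (30 fps), and the placeholder features are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

FRAMES_PER_PACKET = 15  # one packet = 15 frames subsampled from one second of video

def make_packet(second_of_frames):
    """Subsample one second of video frames into a (15, H, W, C) 3D volume."""
    idx = np.linspace(0, len(second_of_frames) - 1, FRAMES_PER_PACKET).astype(int)
    return np.stack([second_of_frames[i] for i in idx])

def extract_features(packet):
    """Placeholder for the trained CNN used as a feature extractor
    (here: per-channel means, purely for illustration)."""
    return packet.mean(axis=(0, 1, 2))

# Hypothetical data: 20 one-second clips at 30 fps, 64x64 RGB frames,
# with binary labels (0 = normal, 1 = violent).
rng = np.random.default_rng(0)
clips = rng.random((20, 30, 64, 64, 3))
labels = rng.integers(0, 2, size=20)

X = np.array([extract_features(make_packet(clip)) for clip in clips])
svm = SVC().fit(X, labels)
rf = RandomForestClassifier(n_estimators=50).fit(X, labels)
print("SVM:", svm.predict(X[:3]), "RF:", rf.predict(X[:3]))

In the paper the features come from the trained CNN rather than the per-channel means above; the sketch only illustrates how a packet reduces one second of video to a fixed-size input and how the same features feed both classical classifiers.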
Pages: 8