Deep Learning for Activity Recognition Using Audio and Video

Cited by: 10
Authors
Reinolds, Francisco [1]
Neto, Cristiana [2,3]
Machado, Jose [2,3]
Affiliations
[1] Univ Minho, Dept Informat, P-4710057 Braga, Portugal
[2] Univ Minho, Dept Informat, Algoritmi Res Ctr, P-4710057 Braga, Portugal
[3] Univ Minho, Intelligent Syst Associate Lab, LASI, P-4800058 Guimaraes, Portugal
Keywords
action recognition; violence detection; real-time video stream; neural networks; audio classifiers; video classifiers
DOI
10.3390/electronics11050782
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Neural networks have established themselves as powerful tools for many kinds of detection, ranging from human activities to emotions. Of the available modalities, video analysis is the most popular and successful. Other modalities are used less often but are still promising. This article compares audio and video analysis for detecting violence in real-time streams. The study followed the CRISP-DM methodology and used several models available through PyTorch in order to test a diverse set of architectures and obtain robust results. The results illustrate why video analysis is so prevalent: the video classifiers handily outperformed their audio counterparts. Whereas the audio models attained an average accuracy of 76%, the video models averaged 89%, a substantial difference in performance. The study concludes that the applied methods, using both audio and video, are quite promising for detecting violence.
Pages: 13