Deep Learning for Activity Recognition Using Audio and Video

Cited by: 10

Authors:

Reinolds, Francisco [1]

Neto, Cristiana [2,3]

Machado, Jose [2,3]

Affiliations:

[1] Univ Minho, Dept Informat, P-4710057 Braga, Portugal

[2] Univ Minho, Dept Informat, Algoritmi Res Ctr, P-4710057 Braga, Portugal

[3] Univ Minho, Intelligent Syst Associate Lab, LASI, P-4800058 Guimaraes, Portugal

Source:

ELECTRONICS | 2022 / Vol. 11 / Issue 05

Keywords:

action recognition; violence detection; real-time video stream; neural networks; audio classifiers; video classifiers

DOI:

10.3390/electronics11050782

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Neural networks have established themselves as powerful tools for several types of detection, ranging from human activities to emotions. Of the available types of analysis, video is the most popular and successful. However, other kinds of analysis, despite being used less often, remain promising. This article compares audio and video analysis for detecting violence in real-time streams. The study, which followed the CRISP-DM methodology, used several models available through PyTorch in order to test a diverse set of models and achieve robust results. The results show why video analysis is so prevalent: video classification handily outperformed its audio counterpart. Whereas the audio models attained 76% accuracy on average, the video models averaged 89%, a significant difference in performance. The study concluded that the applied methods are quite promising for detecting violence using both audio and video.

Pages: 13

50 items in total

[1] Audio-Video Based Multimodal Emotion Recognition Using SVMs and Deep Learning
Sun, Bo
Xu, Qihua
He, Jun
Yu, Lejun
Li, Liandong
Wei, Qinglan
PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 621 - 631
[2] Audio Recognition Using Deep Learning for Edge Devices
Kulkarni, Aditya
Jabade, Vaishali
Patil, Aniket
ADVANCES IN COMPUTING AND DATA SCIENCES (ICACDS 2022), PT II, 2022, 1614 : 186 - 198
[3] Video-Based Human Activity Recognition Using Deep Learning Approaches
Surek, Guilherme Augusto Silva
Seman, Laio Oriel
Stefenon, Stefano Frizzo
Mariani, Viviana Cocco
Coelho, Leandro dos Santos
SENSORS, 2023, 23 (14)
[4] ABNORMAL ACTIVITY RECOGNITION USING DEEP LEARNING IN STREAMING VIDEO FOR INDOOR APPLICATION
Kumar, Dhananjay
Sailaja, Srinivasan Ramapriya
2021 ITU KALEIDOSCOPE CONFERENCE: CONNECTING PHYSICAL AND VIRTUAL WORLDS (ITU K), 2021, : 67 - 74
[5] Speech Emotion Recognition Using Deep Learning on audio recordings
Suganya, S.
Charles, E. Y. A.
2019 19TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER - 2019), 2019,
[6] Audio-visual speech recognition using deep learning
Noda, Kuniaki
Yamaguchi, Yuki
Nakadai, Kazuhiro
Okuno, Hiroshi G.
Ogata, Tetsuya
APPLIED INTELLIGENCE, 2015, 42 (04) : 722 - 737
[7] Audio-visual speech recognition using deep learning
Kuniaki Noda
Yuki Yamaguchi
Kazuhiro Nakadai
Hiroshi G. Okuno
Tetsuya Ogata
Applied Intelligence, 2015, 42 : 722 - 737
[8] A SURVEY ON VIDEO FACE RECOGNITION USING DEEP LEARNING
Mustapha, Muhammad Firdaus
Mohamad, Nur Maisarah
Hamid, Siti Haslini A. B.
Malik, Mohd Azry Abdul
Noor, Mohd Rahimie M. D.
JOURNAL OF QUALITY MEASUREMENT AND ANALYSIS, 2022, 18 (01) : 49 - 62
[9] Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning
Mocanu, Bogdan
Tapu, Ruxandra
Zaharia, Titus
IMAGE AND VISION COMPUTING, 2023, 133
[10] Video-Audio Emotion Recognition Based on Feature Fusion Deep Learning Method
Song, Yanan
Cai, Yuanyang
Tan, Lizhe
2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 611 - 616