Deep Learning for Activity Recognition Using Audio and Video

Cited by: 10
Authors
Reinolds, Francisco [1]
Neto, Cristiana [2, 3]
Machado, Jose [2, 3]
Affiliations
[1] Univ Minho, Dept Informat, P-4710057 Braga, Portugal
[2] Univ Minho, Dept Informat, Algoritmi Res Ctr, P-4710057 Braga, Portugal
[3] Univ Minho, Intelligent Syst Associate Lab, LASI, P-4800058 Guimaraes, Portugal
Keywords
action recognition; violence detection; real-time video stream; neural networks; audio classifiers; video classifiers
DOI
10.3390/electronics11050782
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Neural networks have established themselves as powerful tools for several types of detection, ranging from human activities to human emotions. Of the several types of analysis that exist, video is the most popular and successful. However, other kinds of analysis, despite being used less often, remain promising. In this article, audio and video analysis are compared for the task of detecting violence in real-time streams. This study, which followed the CRISP-DM methodology, used several models available through PyTorch in order to test a diverse set of models and achieve robust results. The results show why video analysis is so prevalent, with video classification handily outperforming its audio counterpart: audio models attained 76% accuracy on average, whereas video models averaged 89%, a significant difference in performance. This study concluded that the applied methods are quite promising for detecting violence, using both audio and video.
Pages: 13