AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1 ]
Petkov, Nicolai [2 ]
Saggese, Alessia [1 ]
Vento, Mario [1 ]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented as a gammatonegram image, and sections of this representation are fed to a 21-layer CNN whose output units correspond to the event classes. We trained the CNN, called AReN, by taking advantage of a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, achieving recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art approaches, based both on traditional machine learning and on deep learning; the comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We experimentally show that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN processes 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
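As a rough illustration of the SNR-controlled augmentation mentioned in the abstract, the sketch below mixes an event clip with background noise at a requested signal-to-noise ratio before any gammatonegram is computed. This is a minimal NumPy sketch under assumed conventions: the function name mix_at_snr, the synthetic signals and the SNR levels are illustrative and are not taken from the paper, which does not specify its augmentation code.

import numpy as np

def mix_at_snr(signal, noise, snr_db):
    # Mix a target event with background noise at a requested SNR (in dB).
    # Tile the noise if it is shorter than the event clip.
    if len(noise) < len(signal):
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(signal)]
    # Average power of event and noise.
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_signal / p_noise_scaled) == snr_db.
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_p_noise / (p_noise + 1e-12))
    return signal + scale * noise

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    event = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)   # placeholder for a scream/gunshot clip
    background = rng.standard_normal(2 * fs)                  # placeholder for ambient noise
    for snr_db in (20, 10, 5, 0):                             # SNR levels are illustrative, not from the paper
        augmented = mix_at_snr(event, background, snr_db)
        # Each augmented clip would then be converted to a gammatonegram and added to the training set.
        print(snr_db, round(float(np.mean(augmented ** 2)), 4))

In the pipeline described by the abstract, each such noisy variant of a training sound would be converted to a gammatonegram image and fed to the 21-layer CNN alongside the clean examples.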
Pages: 3610-3624
Number of pages: 15