AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35

Authors:

Greco, Antonio [1]

Petkov, Nicolai [2]

Saggese, Alessia [1]

Vento, Mario [1]

Affiliations:

[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy

[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands

Source:

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY | 2020 / Vol. 15

Keywords:

Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;

DOI:

10.1109/TIFS.2020.2994740

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance, namely screams, glass breaking and gunshots. The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN, called AReN, to which we feed sections of the gammatonegram representation; its output units correspond to the classes. We trained AReN with a problem-driven data augmentation, which extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and is consequently suitable for real audio surveillance applications.
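
The abstract mentions a data augmentation that adds training examples at different signal-to-noise ratios. A minimal sketch of one common way to obtain such examples is to mix a clean event with background noise scaled to a target SNR. This is an illustration only, not the authors' pipeline: the function names `mix_at_snr` and `augment_with_snrs`, the synthetic signals and the SNR values are all assumptions, and the gammatonegram extraction step is omitted.

```python
import numpy as np


def mix_at_snr(event, noise, snr_db):
    """Add `noise` to `event`, scaled so the event-to-noise power ratio is `snr_db` (dB).

    Hypothetical helper for illustration; both inputs must have the same shape.
    """
    event = np.asarray(event, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if event.shape != noise.shape:
        raise ValueError("event and noise must have the same shape")

    event_power = np.mean(event ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")

    # SNR_dB = 10 * log10(P_event / P_noise)  =>  P_noise = P_event / 10^(SNR_dB / 10)
    target_noise_power = event_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return event + gain * noise


def augment_with_snrs(event, noise, snr_values_db):
    """Return one noisy copy of `event` per requested SNR, keyed by SNR in dB."""
    return {snr: mix_at_snr(event, noise, snr) for snr in snr_values_db}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a tone as the "event", white noise as the background.
    event = np.sin(np.linspace(0.0, 100.0, 8000))
    noise = rng.normal(size=8000)
    # The SNR values below are illustrative, not taken from the paper.
    for snr, mixed in augment_with_snrs(event, noise, [5, 15, 30]).items():
        print(snr, mixed.shape)
```

In a pipeline like the one described, each mixed signal would then be converted to a gammatonegram image before being fed to the network.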

Pages: 3610-3624

Page count: 15
