AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented as a gammatonegram image. We propose a 21-layer CNN that takes sections of the gammatonegram representation as input and whose output units correspond to the classes. We trained the CNN, called AReN, with a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, achieving recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art approaches based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN can process 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
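The abstract mentions augmenting the training set with sounds at different signal-to-noise ratios. A minimal sketch of that general idea, mixing an event with background noise at a chosen SNR, might look like the following. This is not the authors' pipeline: the function names, the synthetic 440 Hz tone, the Gaussian noise, the 16 kHz rate and the list of SNR values are all assumptions made for illustration, and the gammatonegram extraction step is not shown.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(event, noise, snr_db):
    """Mix event with scaled noise so the result has the requested SNR (dB).

    Assumes both inputs have the same length and non-zero power.
    Returns the mixture and the gain that was applied to the noise.
    """
    # SNR_dB = 10 * log10(P_event / P_scaled_noise); solve for the noise gain.
    target_noise_power = power(event) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [e + gain * n for e, n in zip(event, noise)], gain


def measured_snr_db(event, noise, gain):
    """SNR in dB of the event against the noise after applying gain."""
    return 10 * math.log10(power(event) / (gain ** 2 * power(noise)))


# Hypothetical stand-ins for a recorded event and a background recording.
rng = random.Random(0)
event = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One augmented copy per SNR level (illustrative values, not from the paper).
augmented = {}
for snr_db in (5, 10, 15, 20, 25, 30):
    mixture, _ = mix_at_snr(event, noise, snr_db)
    augmented[snr_db] = mixture
```

Each entry of `augmented` would then be converted to a time-frequency image before training; lower SNR values produce harder examples.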
Pages: 3610-3624
Number of pages: 15