AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
CLC number
TP301 [Theory, methods]
Discipline code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, breaking glass and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; the output units of this CNN correspond to the classes. We trained the CNN, called AReN, using a problem-driven data augmentation, which extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, achieving recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN is able to process 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
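The problem-driven data augmentation mentioned in the abstract builds extra training samples from event sounds superimposed on background audio at several signal-to-noise ratios. The abstract does not specify how the mixing is done, so the following is only a minimal Python sketch under stated assumptions: SNR is taken as the ratio of mean signal power to mean background power in decibels, and the function names `mix_at_snr` and `augment` as well as the default SNR levels are hypothetical, not taken from the paper. The gammatonegram extraction and the CNN are not shown.

```python
import numpy as np


def mix_at_snr(event: np.ndarray, background: np.ndarray, snr_db: float) -> np.ndarray:
    """Superimpose `event` on a scaled copy of `background` so that the
    event-to-background power ratio equals `snr_db` decibels.

    Hypothetical helper: assumes a power-based SNR and equal-length,
    single-channel signals.
    """
    if event.shape != background.shape:
        raise ValueError("event and background must have the same shape")
    p_event = np.mean(event.astype(np.float64) ** 2)
    p_bg = np.mean(background.astype(np.float64) ** 2)
    if p_bg == 0:
        raise ValueError("background has zero power")
    # Choose gain g so that p_event / (g**2 * p_bg) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_event / (p_bg * 10 ** (snr_db / 10)))
    return event + gain * background


def augment(event, background, snr_levels_db=(5, 10, 15, 20, 25, 30)):
    """Return one mixture per SNR level (the levels here are assumed)."""
    return [mix_at_snr(event, background, snr) for snr in snr_levels_db]
```

Each mixture would then be converted to a gammatonegram before training; scaling only the background keeps the event's amplitude fixed, so the SNR alone controls how hard each augmented sample is.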
Pages: 3610-3624
Number of pages: 15
Related papers
  • [1] Multi-task deep learning approach for sound event recognition and tracking
    Chen, Tzung-Shi
    Chen, Ming-Ju
    Chen, Tzung-Cheng
    INTERNATIONAL JOURNAL OF AD HOC AND UBIQUITOUS COMPUTING, 2024, 46 (02) : 104 - 121
  • [2] A Robust Approach for Gender Recognition using Deep Learning
    Arora, Shefali
    Bhatia, M. P. S.
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018
  • [3] Automatic Unusual Activities Recognition Using Deep Learning in Academia
    Ramzan, Muhammad
    Abid, Adnan
    Awan, Shahid Mahmood
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (01) : 1829 - 1844
  • [4] Click-event sound detection in automotive industry using machine/deep learning
    Espinosa, Ricardo
    Ponce, Hiram
    Gutierrez, Sebastian
    APPLIED SOFT COMPUTING, 2021, 108
  • [5] Occupant behavior monitoring and emergency event detection in single-person households using deep learning-based sound recognition
    Kim, Jinwoo
    Min, Kyungjun
    Jung, Minhyuk
    Chi, Seokho
    BUILDING AND ENVIRONMENT, 2020, 181
  • [6] Recognition of Arabic Accents From English Spoken Speech Using Deep Learning Approach
    Habbash, Mansoor
    Mnasri, Sami
    Alghamdi, Mansoor
    Alrashidi, Malek
    Tarawneh, Ahmad S.
    Gumair, Abdullah
    Hassanat, Ahmad B.
    IEEE ACCESS, 2024, 12 : 37219 - 37230
  • [7] Detection of Brain Tumour Using Deep Learning
    Ahmed, Waqar
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XXXVIII, 2021, 13101 : 133 - 138
  • [8] Human Activity Recognition by Using Different Deep Learning Approaches for Wearable Sensors
    Erdas, Cagatay Berke
    Guney, Selda
    NEURAL PROCESSING LETTERS, 2021, 53 (03) : 1795 - 1809
  • [9] Generative Model Driven Representation Learning in a Hybrid Framework for Environmental Audio Scene and Sound Event Recognition
    Chandrakala, S.
    Jayalakshmi, S. L.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (01) : 3 - 14
  • [10] Human Brain Waves Study Using EEG and Deep Learning for Emotion Recognition
    Priyadarshani, Muskan
    Kumar, Pushpendra
    Babulal, Kanojia Sindhuben
    Rajput, Dharmendra Singh
    Patel, Harshita
    IEEE ACCESS, 2024, 12 : 101842 - 101850