AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation. The output of this CNN has one unit per class. We trained the CNN, called AReN, with a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methods based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, can significantly reduce the false positive rate, and generalizes to different scenarios. Furthermore, AReN can process 5 audio frames per second on a standard CPU and is consequently suitable for real audio surveillance applications.
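The abstract mentions augmenting the training set with sounds at different signal-to-noise ratios. The sketch below illustrates one common way to do this, by scaling a background noise recording so that the mixture reaches a requested SNR. It is not taken from the paper: the function names (`mix_at_snr`, `measured_snr_db`, `augment`) and the default SNR levels are assumptions chosen for illustration, and the paper's own augmentation procedure and its gammatonegram extraction are not reproduced here.

```python
import math


def _power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mixture has the requested SNR in dB."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = _power(signal)
    p_noise = _power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # From SNR_dB = 10 * log10(P_signal / P_noise), solve for the noise power.
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixture):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [m - s for s, m in zip(signal, mixture)]
    return 10 * math.log10(_power(signal) / _power(residual))


def augment(signal, noise, snr_levels_db=(5, 10, 15, 20, 25, 30)):
    """Return one noisy copy of the signal per SNR level.

    The default levels are hypothetical, not the values used in the paper.
    """
    return {snr: mix_at_snr(signal, noise, snr) for snr in snr_levels_db}
```

In a pipeline of the kind the abstract describes, each noisy copy would then be converted to a time-frequency image before being fed to the network; that step is omitted here.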
Pages: 3610-3624
Number of pages: 15
Related papers
50 in total
  • [21] Adaptation of deep learning auditory event recognition and detection in audio surveillance systems
    Alsubhi, Sara
    Alkabsani, Ahad
    Endargiri, Safiah
    Laabidi, Kaouther
    INTERNATIONAL JOURNAL OF MODELLING IDENTIFICATION AND CONTROL, 2021, 38 (3-4) : 241 - 247
  • [22] Click Event Sound Detection Using Machine Learning in Automotive Industry
    Espinosa, Ricardo
    Ponce, Hiram
    Gutierrez, Sebastian
    Hernandez, Eluney
    ADVANCES IN SOFT COMPUTING, MICAI 2020, PT I, 2020, 12468 : 88 - 103
  • [23] Application of deep learning approach for recognition of voiced Odia digits
    Mohanty, Prithviraj
    Sahoo, Jyoti Prakash
    Nayak, Ajit Kumar
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2022, 25 (05) : 513 - 522
  • [24] Environmental sound recognition on embedded devices using deep learning: a review
    Gairi, Pau
    Palleja, Tomas
    Tresanchez, Marcel
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (06)
  • [25] A mass correlation based deep learning approach using deep Convolutional neural network to classify the brain tumor
    Satyanarayana, Gandi
    Naidu, P. Appala
    Desanamukula, Venkata Subbaiah
    Kumar, Kadupukotla Satish
    Rao, B. Chinna
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 81
  • [26] TumorAwareNet: Deep representation learning with attention based sparse convolutional denoising autoencoder for brain tumor recognition
    Bodapati, Jyostna Devi
    Balaji, Bharadwaj Bagepalli
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 22099 - 22117
  • [27] Human Action Recognition Using Deep Learning Methods on Limited Sensory Data
    Tufek, Nilay
    Yalcin, Murat
    Altintas, Mucahit
    Kalaoglu, Fatma
    Li, Yi
    Bahadir, Senem Kursun
    IEEE SENSORS JOURNAL, 2020, 20 (06) : 3101 - 3112
  • [28] Classification of Satellite Images Using a Deep Learning-Inspired Hybrid Novel Approach
    Pandey, Bihari Nandan
    Pandey, Mahima Shanker
    TRAITEMENT DU SIGNAL, 2024, 41 (05) : 2529 - 2538
  • [29] Recognition of Kannada characters using deep learning approach
    Indira, K.
    Karki, Maya, V
    Mallika, H.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (03) : 2333 - 2346
  • [30] Deep Representation Learning for Multimodal Emotion Recognition Using Physiological Signals
    Zubair, Muhammad
    Woo, Sungpil
    Lim, Sunhwan
    Yoon, Changwoo
    IEEE ACCESS, 2024, 12 : 106605 - 106617