AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1 ]
Petkov, Nicolai [2 ]
Saggese, Alessia [1 ]
Vento, Mario [1 ]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented as a gammatonegram image, and sections of this representation are fed to a 21-layer CNN whose output units correspond to the event classes. We trained the CNN, called AReN, by taking advantage of a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, achieving recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art approaches, based both on traditional machine learning and on deep learning; the comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We experimentally show that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes across different scenarios. Furthermore, AReN processes 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
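As a rough illustration of the SNR-controlled augmentation mentioned in the abstract, the sketch below mixes an event clip with background noise at a requested signal-to-noise ratio before any gammatonegram is computed. This is a minimal NumPy sketch under assumed conventions: the function name mix_at_snr, the synthetic signals and the SNR levels are illustrative and are not taken from the paper, which does not specify its augmentation code.

import numpy as np

def mix_at_snr(signal, noise, snr_db):
    # Mix a target event with background noise at a requested SNR (in dB).
    # Tile the noise if it is shorter than the event clip.
    if len(noise) < len(signal):
        reps = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(signal)]
    # Average power of event and noise.
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_signal / p_noise_scaled) == snr_db.
    target_p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_p_noise / (p_noise + 1e-12))
    return signal + scale * noise

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    event = np.sin(2 * np.pi * 440.0 * np.arange(fs) / fs)   # placeholder for a scream/gunshot clip
    background = rng.standard_normal(2 * fs)                  # placeholder for ambient noise
    for snr_db in (20, 10, 5, 0):                             # SNR levels are illustrative, not from the paper
        augmented = mix_at_snr(event, background, snr_db)
        # Each augmented clip would then be converted to a gammatonegram and added to the training set.
        print(snr_db, round(float(np.mean(augmented ** 2)), 4))

In the pipeline described by the abstract, each such noisy variant of a training sound would be converted to a gammatonegram image and fed to the 21-layer CNN alongside the clean examples.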
Pages: 3610-3624
Number of pages: 15