AReN: A Deep Learning Approach for Sound Event Recognition Using a Brain Inspired Representation

Cited by: 35
Authors
Greco, Antonio [1]
Petkov, Nicolai [2]
Saggese, Alessia [1]
Vento, Mario [1]
Affiliations
[1] Univ Salerno, Dept Informat Engn Elect Engn & Appl Math, I-84084 Fisciano, Italy
[2] Univ Groningen, Fac Sci & Engn, NL-9712 CP Groningen, Netherlands
Keywords
Training; Time-frequency analysis; Machine learning; Spectrogram; Surveillance; Signal to noise ratio; Standards; audio surveillance; deep learning; CNN; gammatonegram; brain inspired representation; NEURAL-NETWORK; CLASSIFICATION; SURVEILLANCE; FEATURES; PATTERN;
DOI
10.1109/TIFS.2020.2994740
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Audio surveillance has attracted wide interest in recent years, owing to the large number of situations in which such systems can be used, either alone or combined with video-based algorithms. In this paper we propose a deep learning method to automatically recognize events of interest in the context of audio surveillance (namely screams, glass breaking and gunshots). The audio stream is represented by a gammatonegram image. We propose a 21-layer CNN to which we feed sections of the gammatonegram representation; the output units of this CNN correspond to the classes. We trained the CNN, called AReN, using a problem-driven data augmentation that extends the training dataset with gammatonegram images extracted from sounds acquired at different signal-to-noise ratios. We evaluated it on three freely available datasets, namely SESA, MIVIA Audio Events and MIVIA Road Events, and achieved recognition rates of 91.43%, 99.62% and 100%, respectively. We compared our method with other state-of-the-art methodologies based on both traditional machine learning and deep learning. The comparison confirms the effectiveness of the proposed approach, which outperforms the existing methods in terms of recognition rate. We show experimentally that the proposed network is resilient to noise, significantly reduces the false positive rate and generalizes to different scenarios. Furthermore, AReN can process 5 audio frames per second on a standard CPU and is therefore suitable for real audio surveillance applications.
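The abstract mentions augmenting the training set with sounds at different signal-to-noise ratios. As a rough illustration of that general idea only, the sketch below mixes a clean event with background noise at a chosen SNR. It is not the authors' pipeline: the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine-wave "event", the Gaussian noise and the SNR levels (0, 10, 20 dB) are all assumptions made for this example, and the gammatonegram extraction step is not shown.

```python
import numpy as np


def mix_at_snr(event, noise, snr_db):
    """Return event + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not taken from the paper.
    """
    event = np.asarray(event, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if event.shape != noise.shape:
        raise ValueError("event and noise must have the same shape")
    p_event = np.mean(event ** 2)  # average power of the clean event
    p_noise = np.mean(noise ** 2)  # average power of the raw noise
    if p_event == 0 or p_noise == 0:
        raise ValueError("event and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_event / (gain**2 * p_noise)), solved for gain.
    gain = np.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    return event + gain * noise


def measured_snr_db(event, mixture):
    """Recompute the SNR of a mixture, given the clean event it contains."""
    residual = mixture - event
    return 10 * np.log10(np.mean(event ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins for a recorded event and background noise (assumed data).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
event = np.sin(2 * np.pi * 440.0 * t)
noise = rng.normal(size=event.shape)

# One augmented copy per SNR level; the levels here are arbitrary.
augmented = {snr: mix_at_snr(event, noise, snr) for snr in (0, 10, 20)}
```

In a training setup along these lines, each augmented waveform would then be converted to a time-frequency image before being fed to the network.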
Pages: 3610-3624
Number of pages: 15