Acoustic event recognition using cochleagram image and convolutional neural networks

Cited by: 41
Authors
Sharan, Roneel V. [1 ]
Moir, Tom J. [2 ]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[2] Auckland Univ Technol, Sch Engn, Private Bag 92006, Auckland 1142, New Zealand
Keywords
Acoustic event recognition; Cochleagram; Convolutional neural network; Mel-spectrogram; Spectrogram; FEATURES; CLASSIFICATION
DOI
10.1016/j.apacoust.2018.12.006
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Convolutional neural networks (CNNs) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using a CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with a CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply a moving average to the spectrogram along the frequency axis to obtain what we refer to as the smoothed spectrogram. Thirdly, we use the mel-spectrogram, which utilizes the mel-filter, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image, the frequency components of which are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with a CNN. © 2018 Elsevier Ltd. All rights reserved.
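The representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming window, the frame length and hop, the three-bin smoothing width, the mel formula, the Glasberg and Moore ERB-rate formula, and all function names are assumptions chosen for illustration. A full cochleagram would pass the signal through a bank of auditory filters (commonly gammatone filters); only the ERB-spaced centre frequencies of such a bank are computed here.

```python
import cmath
import math


def power_spectrogram(signal, frame_len=64, hop=32):
    """Frames of power values for bins 0..frame_len//2 (assumed settings)."""
    # Hamming window: an assumed choice, the abstract does not name one.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, adequate for a demonstration; use an FFT in practice.
        spectrum = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
                    for k in range(n_bins)]
        frames.append([abs(c) ** 2 for c in spectrum])
    return frames


def smooth_along_frequency(spec, width=3):
    """Moving average over neighbouring frequency bins within each frame."""
    half = width // 2
    smoothed = []
    for frame in spec:
        row = []
        for k in range(len(frame)):
            lo, hi = max(0, k - half), min(len(frame), k + half + 1)
            row.append(sum(frame[lo:hi]) / (hi - lo))
        smoothed.append(row)
    return smoothed


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_erb_rate(f):
    # ERB-rate scale (Glasberg and Moore form), assumed for the cochleagram.
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def spaced_frequencies(n, f_min, f_max, to_scale, from_scale):
    """n frequencies equally spaced on a warped scale from f_min to f_max."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    return [from_scale(lo + i * (hi - lo) / (n - 1)) for i in range(n)]


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    edges = spaced_frequencies(n_filters + 2, 0.0, sample_rate / 2,
                               hz_to_mel, mel_to_hz)
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = edges[i - 1], edges[i], edges[i + 1]
        bank.append([max(0.0, min((f - left) / (centre - left),
                                  (right - f) / (right - centre)))
                     for f in bin_hz])
    return bank


def apply_filterbank(spec, bank):
    """Weight each frame's power bins by every filter and sum."""
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in spec]


# Demonstration on a synthetic 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = power_spectrogram(tone)
smoothed = smooth_along_frequency(spec)
mel_spec = apply_filterbank(spec, mel_filterbank(10, len(spec[0]), sample_rate))
erb_centres = spaced_frequencies(10, 50.0, sample_rate / 2,
                                 hz_to_erb_rate, erb_rate_to_hz)
```

The sketch uses only the standard library so that it runs anywhere; with real audio, an FFT-based routine from a numerical library would be the usual choice. Each representation is a list of frames that could be stacked into a two-dimensional image before being passed to a CNN.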
Pages: 62-66
Number of pages: 5
Related papers
50 in total
  • [1] Time-Frequency Image Resizing Using Interpolation for Acoustic Event Recognition with Convolutional Neural Networks
    Sharan, Roneel V.
    Moir, Tom J.
    2019 IEEE INTERNATIONAL CONFERENCE ON SIGNALS AND SYSTEMS (ICSIGSYS), 2019, : 8 - 11
  • [2] Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
    Takahashi, Naoya
    Gygli, Michael
    Pfister, Beat
    Van Gool, Luc
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2982 - 2986
  • [3] Pseudo-color cochleagram image feature and sequential feature selection for robust acoustic event recognition
    Sharan, Roneel V.
    Moir, Tom J.
    APPLIED ACOUSTICS, 2018, 140 : 198 - 204
  • [4] ROBUST SOUND EVENT RECOGNITION USING CONVOLUTIONAL NEURAL NETWORKS
    Zhang, Haomin
    McLoughlin, Ian
    Song, Yan
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 559 - 563
  • [5] Flotation froth image recognition with convolutional neural networks
    Fu, Y.
    Aldrich, C.
    MINERALS ENGINEERING, 2019, 132 : 183 - 190
  • [6] Improved Convolutional Neural Networks for Acoustic Event Classification
    Tang, Guichen
    Liang, Ruiyu
    Xie, Yue
    Bao, Yongqiang
    Wang, Shijia
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (12) : 15801 - 15816
  • [7] Improved Convolutional Neural Networks for Acoustic Event Classification
    Guichen Tang
    Ruiyu Liang
    Yue Xie
    Yongqiang Bao
    Shijia Wang
    Multimedia Tools and Applications, 2019, 78 : 15801 - 15816
  • [8] Robust Convolutional Neural Networks for Image Recognition
    Albeahdili, Hayder M.
    Alwzwazy, Haider A.
    Islam, Naz E.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (11) : 105 - 111
  • [9] Using convolutional neural networks for tick image recognition – a preliminary exploration
    Oghenekaro Omodior
    Mohammad R. Saeedpour-Parizi
    Md. Khaledur Rahman
    Ariful Azad
    Keith Clay
    Experimental and Applied Acarology, 2021, 84 : 607 - 622
  • [10] Acoustic Scene Recognition Based on Convolutional Neural Networks
    Sun, Fengjiao
    Wang, Mingjiang
    Xu, Qihang
    Xuan, Xiaogung
    Zhang, Xin
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 122 - 126