Acoustic event recognition using cochleagram image and convolutional neural networks

Cited by: 44
Authors
Sharan, Roneel V. [1 ]
Moir, Tom J. [2 ]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[2] Auckland Univ Technol, Sch Engn, Private Bag 92006, Auckland 1142, New Zealand
Keywords
Acoustic event recognition; Cochleagram; Convolutional neural network; Mel-spectrogram; Spectrogram; FEATURES; CLASSIFICATION;
DOI
10.1016/j.apacoust.2018.12.006
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and are increasingly being adopted in audio classification applications. However, in using a CNN for acoustic event recognition, the first hurdle is finding the best image representation of the audio signal. In this work, we evaluate the performance of four time-frequency representations for use with a CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply a moving average to the spectrogram along the frequency axis to obtain what we refer to as the smoothed spectrogram. Thirdly, we use the mel-spectrogram, which applies the mel filter bank also used in computing mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image, whose frequency components are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with a CNN. (C) 2018 Elsevier Ltd. All rights reserved.
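The first three time-frequency representations described in the abstract can be sketched in a few lines of numpy/scipy. This is an illustrative reconstruction, not the authors' implementation: the window length, moving-average width, and filter count are assumed values, the test signal is synthetic, and the gammatone-based cochleagram is omitted for brevity.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic 1 s clip at 16 kHz (a 440 Hz tone) standing in for an acoustic
# event recording; the paper's 50-class database is not reproduced here.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)

# 1. Conventional spectrogram image (assumed 512-sample frames, 50% overlap).
f, frames, S = spectrogram(x, fs=fs, nperseg=512, noverlap=256)

# 2. "Smoothed spectrogram": moving average along the frequency axis.
#    The 5-bin window is an illustrative choice, not the paper's setting.
k = 5
kernel = np.ones(k) / k
S_smooth = np.apply_along_axis(
    lambda col: np.convolve(col, kernel, mode="same"), 0, S)

# 3. Mel-spectrogram: project linear-frequency bins onto a triangular
#    mel filter bank, as in standard MFCC front ends.
def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_bins, fs):
    """Triangular filters with centers spaced uniformly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bin_pts = np.floor((n_bins - 1) * mel_to_hz(mel_pts) / (fs / 2.0)).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        lo, c, hi = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for j in range(lo, c):            # rising slope of triangle m
            fb[m - 1, j] = (j - lo) / max(c - lo, 1)
        for j in range(c, hi):            # falling slope of triangle m
            fb[m - 1, j] = (hi - j) / max(hi - c, 1)
    return fb

fb = mel_filter_bank(40, S.shape[0], fs)  # 40 filters is an assumed count
S_mel = fb @ S

print(S.shape, S_smooth.shape, S_mel.shape)
```

Each resulting matrix would then be rendered as a fixed-size image and fed to the CNN; the cochleagram differs from the mel-spectrogram mainly in replacing the mel filters with a cochlea-motivated (e.g. gammatone) filter bank.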
Pages: 62-66
Page count: 5
Related papers
24 records in total
[1] Abdel-Hamid O, Mohamed A-R, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
[2] Adnan SM, Irtaza A, Aziz S, Ullah MO, Javed A, Mahmood MT. Fall detection through acoustic Local Ternary Patterns. Applied Acoustics, 2018, 140: 296-300.
[3] Anonymous. Speech Communication: Human and Machine. 1987.
[4] Bishop CM. Pattern Recognition and Machine Learning. 2006. DOI: 10.1007/978-0-387-45528-0.
[5] Ciregan D. 2012 IEEE Conference on Computer Vision and Pattern Recognition, p. 3642.
[6] Cortes C. Machine Learning, 1995, 20: 273. DOI: 10.1023/A:1022627411411.
[7] Davis SB, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4): 357-366.
[8] Dennis J, Tran HD, Li H. Spectrogram image feature for sound event classification in mismatched conditions. IEEE Signal Processing Letters, 2011, 18(2): 130-133.
[9] Gao B, Woo WL, Khor C. Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation. Journal of the Acoustical Society of America, 2014, 135(3): 1171-1185.
[10] Greenwood DD. A cochlear frequency-position function for several species - 29 years later. Journal of the Acoustical Society of America, 1990, 87(6): 2592-2605.