Noise robust sound event classification with convolutional neural network

Cited by: 61

Authors:

Ozer, Ilyas [1]

Ozer, Zeynep [1]

Findik, Oguz [1]

Affiliations:

[1] Karabuk Univ, Comp Engn Dept, Karabuk, Turkey

Source:

NEUROCOMPUTING | 2018 / Vol. 272

Keywords:

Sound event classification; Convolutional neural networks; Spectrogram; RECOGNITION; RETRIEVAL; DEEP;

DOI:

10.1016/j.neucom.2017.07.021

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic sound recognition (ASR) has been a prominent field of research in recent years. The ability to automatically recognize sound events by computer in a complex audio environment is very useful for machine hearing, acoustic surveillance and multimedia retrieval applications. However, the ASR task becomes highly difficult as ambient noise levels increase, and many traditional methods perform very poorly under noise. Recent studies have shown that spectrogram image features (SIF) perform well under noise, while their success rates in clean conditions are relatively lower than those of state-of-the-art approaches. In this study, highly overlapped spectrograms are converted into linear quantized images and their dimensions are reduced by applying various image resizing methods; feature extraction and classification are then performed with convolutional neural networks (CNN), which have very high performance in image classification. In the mismatched case, the proposed method achieves a performance improvement of 4.5%, which is equivalent to a relative error reduction of 63.4%, with a classification success of 97.4%, while the multicondition training method achieves an average success rate of 98.63%. (C) 2017 Elsevier B.V. All rights reserved.
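
The preprocessing steps named in the abstract (highly overlapped spectrogram, linear quantization into an image, dimension reduction by resizing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, hop size, number of grey levels, nearest-neighbour resizing, the 32 x 32 output size, the synthetic test tone, and all function names are assumptions chosen only for illustration, and the CNN stage is not shown.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=32):
    """Magnitude spectrogram with heavy frame overlap (hop much smaller than frame_len).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One real FFT per frame; transpose so the result is (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T

def quantize_linear(spec, levels=256):
    """Min-max normalise, then map linearly to integer grey levels 0..levels-1.

    Assumes levels <= 256 so the result fits in uint8.
    """
    lo, hi = spec.min(), spec.max()
    norm = (spec - lo) / (hi - lo + 1e-12)
    return np.minimum((norm * levels).astype(np.int64), levels - 1).astype(np.uint8)

def resize_nearest(img, out_shape):
    """Nearest-neighbour resize; stands in for the unspecified resizing methods."""
    rows = np.arange(out_shape[0]) * img.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * img.shape[1] // out_shape[1]
    return img[np.ix_(rows, cols)]

# Synthetic one-second example: a 440 Hz tone plus weak noise at 8 kHz (assumed).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(t.size)

spec = spectrogram(signal)                               # (frequency bins, frames)
image = resize_nearest(quantize_linear(spec), (32, 32))  # small 8-bit image
```

An image such as `image` is the kind of fixed-size input a CNN classifier would consume; a smaller hop gives more overlap between frames at the cost of a wider spectrogram before resizing.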


Pages: 505-512

Page count: 8
