LEARNING ENVIRONMENTAL SOUNDS WITH END-TO-END CONVOLUTIONAL NEURAL NETWORK

Times cited: 0
Authors
Tokozume, Yuji [1]
Harada, Tatsuya [1]
Affiliation
[1] Univ Tokyo, Tokyo, Japan
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords
Environmental sound classification; convolutional neural network; end-to-end system; feature learning
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Environmental sound classification (ESC) is usually conducted based on handcrafted features such as the log-mel feature. Meanwhile, end-to-end classification systems perform feature extraction jointly with classification and have achieved success particularly in image classification. Similarly, if environmental sounds could be learned directly from the raw waveforms, we would be able to extract a new feature effective for classification that could not have been designed by humans, and this new feature could improve the classification performance. In this paper, we propose a novel end-to-end ESC system using a convolutional neural network (CNN). The classification accuracy of our system on ESC-50 is 5.1% higher than that achieved when using logmel-CNN with the static log-mel feature. Moreover, we achieve a 6.5% improvement in classification accuracy over the state-of-the-art logmel-CNN with the static and delta log-mel feature, simply by combining our system and logmel-CNN.
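The abstract reports that accuracy improves "simply by combining our system and logmel-CNN" but does not state here how the two systems are combined. The sketch below assumes one common late-fusion scheme: each model's class scores are converted to probabilities with a softmax, the two probability vectors are averaged, and the highest-scoring class is predicted. The function names (`softmax`, `fuse_predictions`, `predict`) and the toy scores are hypothetical illustrations, not the authors' code or reported results.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(raw_logits: Sequence[float],
                     logmel_logits: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: average the class probabilities of two models.

    raw_logits:    scores from an end-to-end (raw-waveform) classifier
    logmel_logits: scores from a log-mel feature classifier
    """
    if len(raw_logits) != len(logmel_logits):
        raise ValueError("both models must score the same set of classes")
    p_raw = softmax(raw_logits)
    p_logmel = softmax(logmel_logits)
    return [(a + b) / 2 for a, b in zip(p_raw, p_logmel)]


def predict(raw_logits: Sequence[float], logmel_logits: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_predictions(raw_logits, logmel_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy 3-class scores (made-up numbers, not results from the paper).
    raw = [2.0, 1.0, 0.1]      # raw-waveform model favours class 0
    logmel = [0.5, 2.5, 0.3]   # log-mel model favours class 1
    print(fuse_predictions(raw, logmel))
    print(predict(raw, logmel))
```

Averaging probabilities rather than raw scores keeps the two models on a common scale, so neither dominates merely because its logits happen to be larger; in this toy case the log-mel model's more confident vote decides the fused prediction.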
Pages: 2721-2725
Number of pages: 5