Deep Convolutional Neural Network with Structured Prediction for Weakly Supervised Audio Event Detection

Cited by: 2
Authors
Choi, Inkyu
Bae, Soo Hyun
Kim, Nam Soo [1]
Affiliation
[1] Seoul Natl Univ, Dept Elect & Comp Engn, 1 Gwanak Ro, Seoul 08826, South Korea
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / No. 11
Keywords
audio event detection; weakly supervised learning; convolutional neural network; structured prediction; conditional random field
DOI
10.3390/app9112302
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Audio event detection (AED) is the task of recognizing the types of audio events in an audio stream and estimating their temporal positions. AED is typically based on fully supervised approaches, which require strong labels specifying both the presence and the temporal position of each audio event. However, fully supervised datasets are not readily available because human annotation is costly. Recently, weakly supervised approaches to AED have been proposed that utilize large-scale datasets with weak labels, which indicate only which events occur in each recording. In this work, we introduce a deep convolutional neural network (CNN) model called DSNet, based on densely connected convolutional networks (DenseNets) and squeeze-and-excitation networks (SENets), for weakly supervised training of AED. DSNet alleviates the vanishing-gradient problem, strengthens feature propagation, and models interdependencies between channels. We also propose a structured prediction method for weakly supervised AED. We apply a recurrent neural network (RNN)-based framework and a prediction smoothness cost function to capture long-term contextual information while reducing error propagation. In post-processing, conditional random fields (CRFs) are applied to take into account the dependency between segments and to delineate the borders of audio events precisely. We evaluated our proposed models on the DCASE 2017 Task 4 dataset and obtained state-of-the-art results on both the audio tagging and event detection tasks.
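The abstract states that CRF post-processing uses the dependency between segments to sharpen event borders, but does not give the CRF's form. As an illustration only, the sketch below assumes the simplest such model: a binary linear-chain CRF per event class, with unary costs equal to the negative log of the network's per-segment probability and a constant penalty whenever the label switches between adjacent segments, decoded exactly with the Viterbi algorithm. The names `viterbi_smooth`, `to_events`, `switch_penalty`, and `hop_seconds` are hypothetical, and this is not the authors' implementation.

```python
import math


def viterbi_smooth(probs, switch_penalty=2.0, eps=1e-6):
    """Hypothetical binary linear-chain smoothing for one event class.

    probs: per-segment probabilities that the event is active.
    switch_penalty: assumed constant cost paid whenever the label changes
    between adjacent segments (a Potts-style pairwise term).
    Returns the minimum-cost 0/1 label sequence.
    """
    n = len(probs)
    if n == 0:
        return []

    def unary(t, y):
        # Negative log-likelihood of label y at segment t (clipped for safety).
        p = min(max(probs[t], eps), 1.0 - eps)
        return -math.log(p if y == 1 else 1.0 - p)

    cost = [unary(0, 0), unary(0, 1)]
    back = []  # back[t-1][y]: best label at segment t-1 given label y at t
    for t in range(1, n):
        new_cost, ptr = [], []
        for y in (0, 1):
            # Keeping the same label is free; switching pays the penalty.
            cands = [cost[prev] + (0.0 if prev == y else switch_penalty)
                     for prev in (0, 1)]
            best_prev = 0 if cands[0] <= cands[1] else 1
            new_cost.append(cands[best_prev] + unary(t, y))
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)

    # Trace back from the cheaper final label.
    y = 0 if cost[0] <= cost[1] else 1
    labels = [y]
    for ptr in reversed(back):
        y = ptr[y]
        labels.append(y)
    labels.reverse()
    return labels


def to_events(labels, hop_seconds):
    """Convert a 0/1 segment sequence into (onset, offset) pairs in seconds."""
    events, start = [], None
    for t, y in enumerate(labels):
        if y == 1 and start is None:
            start = t
        elif y == 0 and start is not None:
            events.append((start * hop_seconds, t * hop_seconds))
            start = None
    if start is not None:
        events.append((start * hop_seconds, len(labels) * hop_seconds))
    return events


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.9, 0.4, 0.95, 0.9, 0.1, 0.05]
    labels = viterbi_smooth(probs)
    print(labels)                  # [0, 0, 1, 1, 1, 1, 0, 0]
    print(to_events(labels, 0.5))  # [(1.0, 3.0)]
```

With these example probabilities, plain thresholding at 0.5 would split the event at the 0.4 dip into two detections; the switching penalty makes two extra label changes more expensive than accepting one uncertain segment, so a single event is returned. Setting `switch_penalty=0.0` reduces the decoder to per-segment thresholding.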
Pages: 14