On Learning Disentangled Representation for Acoustic Event Detection

Cited by: 3
Authors
Gao, Lijian [1]
Mao, Qirong [1]
Dong, Ming [2]
Jing, Yu [2]
Chinnam, Ratna [3]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
[3] Wayne State Univ, Dept Ind & Syst Engn, Detroit, MI 48202 USA
Source
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
Keywords
acoustic event detection; disentangled latent representation; supervised variational autoencoder
DOI
10.1145/3343031.3351086
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Polyphonic Acoustic Event Detection (AED) is a challenging task because the sounds of different events are mixed together, and features extracted from the mixture do not match well with features calculated from sounds in isolation, leading to suboptimal AED performance. In this paper, we propose a supervised beta-VAE model for AED, which adds a novel event-specific disentangling loss to the objective function of disentangled learning. By incorporating either latent factor blocks or latent attention in disentangling, supervised beta-VAE learns a set of discriminative features for each event. Extensive experiments on benchmark datasets show that our approach outperforms the current state of the art (the top-1 performers in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 AED challenge). Supervised beta-VAE performs well on challenging AED tasks with a large variety of events and imbalanced data.
Pages: 2006-2014
Page count: 9