MULTI-SCALE RECURRENT NEURAL NETWORK FOR SOUND EVENT DETECTION

Cited by: 0

Authors:

Lu, Rui [1]

Duan, Zhiyao [2]

Zhang, Changshui [1]

Affiliations:

[1] Tsinghua Univ, Dept Automat, State Key Lab Intelligent Technol & Syst, TNList, Beijing, Peoples R China

[2] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Funding:

U.S. National Science Foundation;

Keywords:

Multi-scale model; deep learning; recurrent neural network; sound event detection;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Sound event detection (SED) in real life is an interesting but challenging task due to the polyphonic and long-term dependent nature of sound events. Recently, multi-label recurrent neural networks (RNNs) have shown promise. However, even when equipped with long short-term memory (LSTM) or gated recurrent unit (GRU) cells, RNNs are still limited in their ability to model long-term dependencies. In this paper, we propose a multi-scale RNN to address this issue. By integrating information from different time resolutions, we can better capture both the fine-grained and long-term dependencies of sound events. We experiment on the development sets of Task 3 of DCASE2016 and DCASE2017. Compared to our previously proposed single-scale RNN, which won third place among the 13 teams in Task 3 of DCASE2017, the proposed multi-scale model achieves statistically significantly better performance on the development datasets of both DCASE2016 and DCASE2017.


Pages: 131-135

Page count: 5
