MULTI-SCALE RECURRENT NEURAL NETWORK FOR SOUND EVENT DETECTION

Cited by: 0
Authors
Lu, Rui [1]
Duan, Zhiyao [2]
Zhang, Changshui [1]
Affiliations
[1] Tsinghua Univ, Dept Automat, State Key Lab Intelligent Technol & Syst, TNList, Beijing, Peoples R China
[2] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
U.S. National Science Foundation;
Keywords
Multi-scale model; deep learning; recurrent neural network; sound event detection;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Sound event detection (SED) in real life is an interesting but challenging task due to the polyphonic and long-term dependent nature of sound events. Recently, multi-label recurrent neural networks (RNNs) have shown promise. However, even when equipped with long short-term memory (LSTM) or gated recurrent unit (GRU) cells, RNNs are still limited in their ability to model long-term dependencies. In this paper, we propose a multi-scale RNN to address this issue. By integrating information from different time resolutions, we can better capture both the fine-grained and long-term dependencies of sound events. We experiment on the development sets of Task 3 of DCASE2016 and DCASE2017. Compared to our previously proposed single-scale RNN, which won third place among the 13 teams in Task 3 of DCASE2017, the proposed multi-scale model achieves statistically significantly better performance on the development datasets of both DCASE2016 and DCASE2017.
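The abstract does not describe the network architecture, so the following is only a minimal sketch of the general idea of integrating information from different time resolutions, not the authors' model. It assumes frame-level features stored as plain Python lists, average pooling as the way to build coarser time scales, and simple concatenation as the integration step. The names `average_pool`, `upsample`, and `multi_scale_features` and the scales `(1, 2, 4)` are hypothetical. In a multi-scale RNN, each scale would presumably be processed by its own recurrent layer before integration.

```python
from typing import List, Sequence

Frame = List[float]


def average_pool(frames: List[Frame], scale: int) -> List[Frame]:
    """Average every `scale` consecutive frames (the last window may be shorter)."""
    pooled = []
    for start in range(0, len(frames), scale):
        window = frames[start:start + scale]
        dim = len(window[0])
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled


def upsample(frames: List[Frame], scale: int, length: int) -> List[Frame]:
    """Repeat each coarse frame `scale` times, then trim to `length` frames."""
    out = []
    for frame in frames:
        for _ in range(scale):
            out.append(list(frame))  # copy so output frames are independent
    return out[:length]


def multi_scale_features(frames: List[Frame],
                         scales: Sequence[int] = (1, 2, 4)) -> List[Frame]:
    """Concatenate, for every frame, its features seen at each time scale.

    Assumption (not from the paper): coarse scales are built by average
    pooling and aligned back to the frame rate by repetition.
    """
    length = len(frames)
    per_scale = [upsample(average_pool(frames, s), s, length) for s in scales]
    combined = []
    for t in range(length):
        row = []
        for scale_frames in per_scale:
            row.extend(scale_frames[t])
        combined.append(row)
    return combined


if __name__ == "__main__":
    toy = [[1.0], [3.0], [5.0], [7.0]]  # four frames, one feature each
    for row in multi_scale_features(toy, scales=(1, 2, 4)):
        print(row)
```

Each output frame keeps its fine-grained value (scale 1) next to progressively smoother context (scales 2 and 4), which is one simple way for a frame-level classifier to see both short-term and longer-term structure.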
Pages: 131-135
Number of pages: 5
Related papers
21 in total
  • [1] Adavanne Sharath, 2017, 42 INT C AC SPEECH S
  • [2] [Anonymous], 2015, 3 INT C LEARNING REP
  • [3] [Anonymous], IEEE C COMP VIS PATT
  • [4] [Anonymous], DCASE2017
  • [5] [Anonymous], 2010, SIGKDD Explor.
  • [6] Baraldi L., 2017, IEEE C COMP VIS PATT
  • [7] Acoustic Scene Classification
    Barchiesi, Daniele
    Giannoulis, Dimitrios
    Stowell, Dan
    Plumbley, Mark D.
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (03) : 16 - 34
  • [8] Cakir E, 2015, IEEE IJCNN
  • [9] Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
    Cakir, Emre
    Parascandolo, Giambattista
    Heittola, Toni
    Huttunen, Heikki
    Virtanen, Tuomas
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1291 - 1303
  • [10] Chen Yukun, 2017, DCASE2017