A Deep Residual Network for Large-Scale Acoustic Scene Analysis

Times cited: 32
Authors
Ford, Logan [1]
Tang, Hao [1]
Grondin, Francois [1]
Glass, James [1]
Affiliation
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
INTERSPEECH 2019 | 2019
Keywords
acoustic scene analysis; audio classification; audio event detection; neural networks
DOI
10.21437/Interspeech.2019-2731
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Many recent advances in audio event detection, particularly on the AudioSet data set, have focused on improving performance using the released embeddings produced by a pretrained model. In this work, we instead study the task of training a multi-label event classifier directly from the AudioSet audio recordings. Working from the audio recordings, we not only reproduce results from prior work but also confirm the improvements from other proposed additions, such as an attention module. Moreover, by training the embedding network jointly with these additions, we achieve an mAP of 0.392 and an AUC of 0.971, surpassing the state of the art without transfer learning from a large data set. We also analyze the output activations of the network and find that the models are able to localize audio events when a finer time resolution is needed.
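The abstract reports results as mAP and AUC for a multi-label classifier. The following is a minimal sketch of how such metrics are commonly computed. It assumes the usual AudioSet convention: average precision and ROC AUC are computed per class and then averaged over classes. The function names (`average_precision`, `roc_auc`, `macro_map_auc`) are hypothetical, and this is not the authors' evaluation code. Ties are handled simply, with stable ordering for AP and half credit for AUC.

```python
from typing import List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one class: mean precision at the rank of each positive.

    Tied scores keep their input order (Python's sort is stable), a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one class as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def macro_map_auc(labels: List[List[int]], scores: List[List[float]]) -> Tuple[float, float]:
    """labels/scores have shape [num_clips][num_classes]; returns (mAP, mean AUC).

    Classes with no positives or no negatives are skipped, since AUC is undefined for them.
    """
    num_classes = len(labels[0])
    aps, aucs = [], []
    for c in range(num_classes):
        y = [row[c] for row in labels]
        s = [row[c] for row in scores]
        if 0 < sum(y) < len(y):
            aps.append(average_precision(y, s))
            aucs.append(roc_auc(y, s))
    return sum(aps) / len(aps), sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy example: 4 clips, 2 classes (multi-hot labels, per-class scores).
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
    m, a = macro_map_auc(labels, scores)
    print(round(m, 4), round(a, 4))  # 0.9167 0.875
```

In the toy example, class 0 is ranked perfectly (AP = 1, AUC = 1). For class 1, one negative outranks a positive, giving AP = 5/6 and AUC = 0.75. Averaging over the two classes then yields mAP of about 0.9167 and mean AUC of 0.875.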
Pages: 2568-2572
Page count: 5