A Deep Residual Network for Large-Scale Acoustic Scene Analysis

Cited by: 32

Authors:

Ford, Logan [1]

Tang, Hao [1]

Grondin, Francois [1]

Glass, James [1]

Affiliation:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

INTERSPEECH 2019 | 2019

Keywords:

acoustic scene analysis; audio classification; audio event detection; neural networks

DOI:

10.21437/Interspeech.2019-2731

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Many of the recent advances in audio event detection, particularly on the AudioSet data set, have focused on improving performance using the released embeddings produced by a pretrained model. In this work, we instead study the task of training a multi-label event classifier directly from the audio recordings of AudioSet. Using the audio recordings, not only are we able to reproduce results from prior work, we have also confirmed improvements of other proposed additions, such as an attention module. Moreover, by training the embedding network jointly with the additions, we achieve an mAP of 0.392 and an AUC of 0.971, surpassing the state of the art without transfer learning from a large data set. We also analyze the output activations of the network and find that the models are able to localize audio events when a finer time resolution is needed.


Pages: 2568-2572

Page count: 5
