High Precision Sound Event Detection based on Transfer Learning using Transposed Convolutions and Feature Pyramid Network

Cited by: 2

Authors:

Luo, Shunyan [1]

Feng, Yarong [1]

Liu, Zongyi [1]

Ling, Yuan [1]

Dong, Shujing [1]

Ferry, Bruce [1]

Affiliations:

[1] Amazon, Seattle, WA 98109 USA

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, ICCE | 2023

Keywords:

sound event detection; CNN; transposed convolution; feature pyramid network; audio signal processing;

DOI:

10.1109/ICCE56470.2023.10043383

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

We introduce two models for high precision sound event detection leveraging transfer learning. The sound events we detect include "speech", "music", and "chime". Both models consist of a CNN backbone pre-trained on AudioSet for audio classification. To obtain high precision detection results, the first model employs transposed convolutional layers as the detection head, while the second model uses a Feature Pyramid Network (FPN) as the detection head. Experimental results show 98.8% accuracy and a 98.6% F1 score on a private test set for the FPN-based model. Both models outperform a two-stage model using LSTM, various model ensembles, and a pre-trained neural network model for audio classification.
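The abstract does not say how the transposed convolutional head is configured. As a rough illustration only of why such a head can give finer-grained detections, the sketch below shows how a 1-D transposed convolution stretches a coarse sequence of backbone outputs back to a longer time axis. The kernel, stride, and input values are hypothetical and are not taken from the paper. The length formula is the standard one for transposed convolutions.

```python
def transposed_conv1d_length(length_in, kernel_size, stride=1, padding=0,
                             output_padding=0, dilation=1):
    """Standard output-length formula for a 1-D transposed convolution."""
    return ((length_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def transposed_conv1d(x, kernel, stride=1):
    """Naive single-channel 1-D transposed convolution without padding."""
    out = [0.0] * transposed_conv1d_length(len(x), len(kernel), stride)
    for i, value in enumerate(x):
        # Each coarse input frame is "stamped" onto the finer output axis.
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# Hypothetical coarse per-frame scores from a backbone (illustrative values).
coarse = [0.1, 0.9, 0.2]
# Hypothetical kernel and stride: doubles the time resolution.
fine = transposed_conv1d(coarse, kernel=[1.0, 1.0], stride=2)
print(len(coarse), "->", len(fine))
```

With stride 2 and a length-2 kernel, each coarse frame maps to two output frames, so the time axis doubles. In a trained model the kernel weights would be learned, not fixed.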

Pages: 6

References (11 in total):

[1] Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
Adavanne, Sharath
Politis, Archontis
Nikunen, Joonas
Virtanen, Tuomas
[J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (01) : 34 - 48
[2] [Anonymous], 2018, A guide to convolution arithmetic for deep learning
[3] Dang A, 2017, INT CONF ORANGE TECH, P75, DOI 10.1109/ICOT.2017.8336092
[4] A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge
de Benito-Gorron, Diego
Ramos, Daniel
Toledano, Doroteo T.
[J]. IEEE ACCESS, 2021, 9 : 89029 - 89042
[5] Gemmeke JF, 2017, INT CONF ACOUST SPEE, P776, DOI 10.1109/ICASSP.2017.7952261
[6] PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Kong, Qiuqiang
Cao, Yin
Iqbal, Turab
Wang, Yuxuan
Wang, Wenwu
Plumbley, Mark D.
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2880 - 2894
[7] Feature Pyramid Networks for Object Detection
Lin, Tsung-Yi
Dollar, Piotr
Girshick, Ross
He, Kaiming
Hariharan, Bharath
Belongie, Serge
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 936 - 944
[8] Metrics for Polyphonic Sound Event Detection
Mesaros, Annamaria
Heittola, Toni
Virtanen, Tuomas
[J]. APPLIED SCIENCES-BASEL, 2016, 6 (06):
[9] Mesaros A, 2010, EUR SIGNAL PR CONF, P1267
[10] Ramirez J., 2007, ROBUST SPEECH RECOGN, V6, P1, DOI 10.5772/4740