High Precision Sound Event Detection based on Transfer Learning using Transposed Convolutions and Feature Pyramid Network

Cited by: 2
Authors
Luo, Shunyan [1]
Feng, Yarong [1]
Liu, Zongyi [1]
Ling, Yuan [1]
Dong, Shujing [1]
Ferry, Bruce [1]
Affiliation
[1] Amazon, Seattle, WA 98109 USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, ICCE | 2023
Keywords
sound event detection; CNN; transposed convolution; feature pyramid network; audio signal processing
DOI
10.1109/ICCE56470.2023.10043383
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
We introduce two models for high-precision sound event detection that leverage transfer learning. The sound events we detect are "speech", "music", and "chime". Both models use a CNN backbone pre-trained on AudioSet for audio classification. To achieve high-precision detection, the first model employs transposed convolutional layers as the detection head, while the second model uses a Feature Pyramid Network (FPN) as the detection head. Experimental results show that the FPN-based model achieves 98.8% accuracy and a 98.6% F1 score on a private test set. Both models outperform a two-stage model using LSTM, various model ensembles, and a pre-trained neural network model for audio classification.
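A transposed-convolution detection head like the one described above upsamples a temporally downsampled CNN feature map back toward frame resolution. The abstract gives no layer configuration, so the following is only a minimal single-channel sketch under assumed values: the function names `transposed_conv1d` and `transposed_conv1d_length` and the example kernels are hypothetical illustrations, not the authors' code. The length formula follows the convention documented for PyTorch's `ConvTranspose1d`.

```python
# Hypothetical single-channel 1-D transposed convolution in pure Python.
# It illustrates how a detection head can upsample a coarse time sequence
# (e.g. a downsampled CNN feature map) toward frame-level resolution.
# This is an assumed sketch, not the paper's implementation.
from typing import List


def transposed_conv1d_length(n_in: int, kernel: int, stride: int,
                             padding: int = 0, output_padding: int = 0,
                             dilation: int = 1) -> int:
    """Output length of a 1-D transposed convolution (PyTorch convention)."""
    return ((n_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def transposed_conv1d(x: List[float], w: List[float], stride: int,
                      padding: int = 0) -> List[float]:
    """Scatter-add each input sample, scaled by the kernel, into the output."""
    k = len(w)
    full_len = (len(x) - 1) * stride + k  # length before cropping `padding`
    full = [0.0] * full_len
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            # Input step i contributes to output positions i*stride .. i*stride+k-1.
            full[i * stride + j] += xi * wj
    # Cropping `padding` samples from each end matches the length formula
    # above (with output_padding = 0 and dilation = 1).
    return full[padding:full_len - padding]


if __name__ == "__main__":
    coarse = [1.0, 2.0, 3.0]  # assumed coarse per-step activations
    # Kernel length equal to the stride: each coarse step is repeated.
    print(transposed_conv1d(coarse, [1.0, 1.0], stride=2))
    # -> [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
    # Triangular kernel with padding: linear interpolation between steps.
    print(transposed_conv1d(coarse, [0.5, 1.0, 0.5], stride=2, padding=1))
    # -> [1.0, 1.5, 2.0, 2.5, 3.0]
    print(transposed_conv1d_length(3, 3, 2, padding=1))
    # -> 5
```

In a trained detection head the kernel weights are learned rather than fixed as here; the two hand-set kernels only show that a transposed convolution can express both repetition and interpolation of coarse time steps.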
Pages: 6