Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

Cited by: 69
Authors
Politis, Archontis [1]
Mesaros, Annamaria [1]
Adavanne, Sharath [1]
Heittola, Toni [1]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ, Fac Informat Technol & Commun Sci, FI-33720 Tampere, Finland
Funding
European Research Council;
Keywords
Task analysis; Measurement; Azimuth; Hidden Markov models; Acoustics; Speech processing; Two dimensional displays; Acoustic scene analysis; microphone arrays; sound event localization and detection; sound source localization;
DOI
10.1109/TASLP.2020.3047233
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Sound event localization and detection is a novel area of research that emerged from the combined interest of analyzing the acoustic scene in terms of the spatial and temporal activity of sounds of interest. This paper presents an overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge. A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset. The overview presents in detail how the systems were evaluated and ranked and the characteristics of the best-performing systems. Common strategies in terms of input features, model architectures, training approaches, exploitation of prior knowledge, and data augmentation are discussed. Since ranking in the challenge was based on individually evaluating localization and event classification performance, part of the overview focuses on presenting metrics for the joint measurement of the two, together with a reevaluation of submissions using these new metrics. The new analysis reveals submissions that performed better on the joint task of detecting the correct type of event close to its original location than some of the submissions that were ranked higher in the challenge. Consequently, ranking of submissions which performed strongly when evaluated separately on detection or localization, but not jointly on both, was affected negatively.
Pages: 684-698
Number of pages: 15