Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

Cited by: 13

Authors:

Fonseca, Eduardo [1]

Hershey, Shawn [2]

Plakal, Manoj [2]

Ellis, Daniel P. W. [2]

Jansen, Aren [2]

Moore, R. Channing [2]

Affiliations:

[1] Univ Pompeu Fabra, Mus Technol Grp, Barcelona 08002, Spain

[2] Google Res, New York, NY 10011 USA

Source:

IEEE Signal Processing Letters | 2020 / Vol. 27

Keywords:

Sound event recognition; label noise; missing labels; teacher-student; loss masking

DOI:

10.1109/LSP.2020.3006378

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first identify the most critical missing label candidates, and then ignore their contribution during the learning process. We find that a simple optimisation of the training label set improves recognition performance without additional computation. We discover that most of the improvement comes from ignoring a critical tiny portion of the missing labels. We also show that the damage done by missing labels is larger as the training set gets smaller, yet it can still be observed even when training with massive amounts of audio. We believe these insights can generalize to other large-scale datasets.
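The two steps named in the abstract (flag likely missing labels with a teacher, then ignore them in the loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The per-class selection rule, the `discard_fraction` parameter, and the function names `build_loss_mask` and `masked_bce` are assumptions introduced here for illustration only.

```python
import numpy as np


def build_loss_mask(teacher_scores, labels, discard_fraction=0.05):
    """Return a 0/1 mask that is 0 where a negative label is a likely missing positive.

    Assumption (not stated in the abstract): for each class, the negatives
    with the highest teacher scores are treated as missing-label candidates.
    """
    mask = np.ones_like(labels, dtype=float)
    for c in range(labels.shape[1]):
        negatives = np.flatnonzero(labels[:, c] == 0)
        k = int(np.floor(discard_fraction * negatives.size))
        if k == 0:
            continue
        # Indices of the k negatives that the teacher scores highest.
        order = np.argsort(teacher_scores[negatives, c])
        mask[negatives[order[-k:]], c] = 0.0
    return mask


def masked_bce(student_probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over the unmasked (clip, class) entries only."""
    p = np.clip(student_probs, eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1 - labels) * np.log(1.0 - p))
    return float((bce * mask).sum() / max(mask.sum(), 1.0))
```

Masked entries contribute nothing to the loss, so the student is neither rewarded nor penalized for its prediction on a clip whose negative label is suspect. Under this assumed rule, the fraction of discarded negatives is the only extra hyperparameter.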

Citation

Pages: 1235-1239

Page count: 5

29 references in total

