Crowdsourcing is a popular tool for collecting large amounts of annotated data, but the specific format of the strong labels required for sound event detection is not easily obtainable through crowdsourcing. In this work, we propose a novel annotation workflow that leverages the efficiency of crowdsourcing weak labels and uses a large number of annotators to produce reliable and objective strong labels. The weak labels are collected in a highly redundant setup to allow reconstruction of the temporal information. To obtain reliable labels, the annotators' competence is estimated using MACE (Multi-Annotator Competence Estimation) and incorporated into the strong label estimation by weighting individual opinions. We show that the proposed method produces consistently reliable strong annotations not only for synthetic audio mixtures but also for audio recordings of real everyday environments. Although the coincidence with the complete and correct reference annotations reached at most 80% on synthetic data, this result is explained by an extended study of how polyphony and SNR levels affect the annotators' sound event identification rate. On real data, even though the estimated annotator competence is significantly lower and the coincidence with the reference labels is under 69%, the proposed majority-opinion approach produces reliable aggregated strong labels compared with the more difficult task of crowdsourcing strong labels directly.
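As a rough illustration of the aggregation step described above, the sketch below combines segment-level weak labels into strong labels by competence-weighted voting and then merges consecutive positive segments to recover temporal boundaries. The segment length, hop size, 0.5 decision threshold, and all function and variable names are illustrative assumptions rather than the paper's exact parameters; the per-annotator competence scores (e.g., from MACE) are assumed to be precomputed.

```python
# Minimal sketch, assuming weak labels are collected on fixed-length,
# overlapping segments and that annotator competence scores are already
# available (e.g., estimated with MACE). Parameter values are illustrative.
from collections import defaultdict

def aggregate_strong_labels(weak_labels, competence,
                            seg_len=10.0, hop=5.0, threshold=0.5):
    """weak_labels: dict mapping (annotator, segment_index) -> set of sound classes.
       competence: dict mapping annotator -> weight in [0, 1].
       Returns: dict mapping class -> list of (onset, offset) intervals in seconds."""
    # Competence-weighted vote for each (segment, class) pair.
    votes = defaultdict(float)
    totals = defaultdict(float)
    for (annotator, seg), classes in weak_labels.items():
        w = competence[annotator]
        totals[seg] += w
        for c in classes:
            votes[(seg, c)] += w

    # A segment is marked positive for a class if the weighted majority agrees.
    active = defaultdict(set)  # class -> set of positive segment indices
    for (seg, c), v in votes.items():
        if totals[seg] > 0 and v / totals[seg] >= threshold:
            active[c].add(seg)

    # Merge overlapping/consecutive positive segments into strong labels,
    # recovering temporal information from the redundant weak annotations.
    strong = {}
    for c, segs in active.items():
        intervals = []
        for seg in sorted(segs):
            onset, offset = seg * hop, seg * hop + seg_len
            if intervals and onset <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], offset))
            else:
                intervals.append((onset, offset))
        strong[c] = intervals
    return strong
```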