A MULTI-TASK LEARNING METHOD FOR WEAKLY SUPERVISED SOUND EVENT DETECTION

Times Cited: 2
Authors
Liu, Sichen [1 ,2 ]
Yang, Feiran [1 ,2 ]
Kang, Fang [1 ,2 ]
Yang, Jun [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Sound event detection (SED); source separation (SS); multi-task learning (MTL); weakly supervised; NEURAL-NETWORKS;
DOI
10.1109/ICASSP43922.2022.9746947
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In weakly supervised sound event detection (SED), only coarse-grained labels are available, so the supervision information is quite limited. To fully exploit prior knowledge about the time-frequency masks of each sound event, we propose a novel multi-task learning (MTL) method that takes SED as the main task and source separation as the auxiliary task. For active events, we minimize the overlap of their masks as a segment loss to learn distinguishing features. For inactive events, the proposed method measures the activity of their masks as a silent loss to reduce the insertion error. The auxiliary source separation task calculates an extra penalty based on the shared masks, which further incorporates prior knowledge in the form of regularization constraints. We demonstrate that the proposed method effectively reduces the insertion error and achieves better performance on the SED task than single-task methods.
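The two mask-based losses described in the abstract could be sketched roughly as follows. This is a hypothetical NumPy illustration, not the paper's implementation: the function name, mask shapes, and pairwise-product form of the overlap penalty are all assumptions; the exact loss definitions are given in the paper itself.

```python
import numpy as np

def mtl_mask_losses(masks, active):
    """Sketch of the segment and silent losses (hypothetical).

    masks:  (K, T, F) predicted time-frequency masks in [0, 1],
            one per event class.
    active: boolean array (K,) of clip-level (weak) labels.
    """
    act = masks[active]      # masks of active event classes
    inact = masks[~active]   # masks of inactive event classes

    # Segment loss (sketch): penalize overlap between each pair of
    # active-event masks so the network learns distinguishing features.
    seg = 0.0
    for i in range(len(act)):
        for j in range(i + 1, len(act)):
            seg += np.mean(act[i] * act[j])

    # Silent loss (sketch): masks of inactive events should be near
    # zero everywhere, which suppresses insertion errors.
    sil = float(np.mean(inact)) if len(inact) else 0.0
    return seg, sil
```

In a full MTL setup, these terms would be weighted and added to the clip-level classification loss and the source-separation reconstruction loss, with the masks shared between the two tasks.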
Pages: 8802-8806
Number of pages: 5