CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION

Times cited: 1
Authors
Guan, Yadong [1]
Han, Jiqing [1]
Song, Hongwei [1]
Song, Wenjie [1]
Zheng, Guibin [1]
Zheng, Tieran [1]
He, Yongjun [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Polyphonic Sound Event Detection; Feature Disentanglement; Contrastive Loss;
DOI
10.1109/ICASSP48485.2024.10447743
Abstract
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them. A key reason is that these methods represent overlapping events with shared, entangled frame-wise features, which reduces feature discriminability. To address this problem, we propose a disentangled feature learning framework that learns category-specific representations. Specifically, we use a separate projector to learn the frame-wise features of each category. To ensure that these features do not contain information about other categories, we maximize the common information between frame-wise features of the same category and propose a frame-wise contrastive loss. In addition, because the labeled data available to the proposed method are limited, we propose a semi-supervised frame-wise contrastive loss that leverages large amounts of unlabeled data to achieve feature disentanglement. Experimental results demonstrate the effectiveness of our method.
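The abstract does not specify the projector architecture or the exact form of the frame-wise contrastive loss. The sketch below is only an illustration under stated assumptions: each category's projector is a plain linear map, and the loss follows the supervised contrastive (SupCon) pattern, in which every active (frame, category) pair is an anchor, other active pairs of the same category are positives, and all remaining active pairs form the denominator. The function names `category_projections` and `frame_contrastive_loss` and the `temperature=0.1` default are hypothetical, not taken from the paper, and the semi-supervised variant is not sketched.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def category_projections(h, weights):
    """Hypothetical per-category projectors (assumed linear).

    h: (T, D) shared frame-wise features; weights: (C, D, E), one matrix per category.
    Returns (T, C, E) unit-norm category-specific frame features.
    """
    z = np.einsum("td,cde->tce", h, weights)
    return l2_normalize(z)


def frame_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style frame-wise loss (an assumption, not the paper's exact formula).

    z: (T, C, E) category-specific features; labels: (T, C) binary frame labels.
    Anchors are active (frame, category) pairs; positives share the category.
    """
    t_idx, c_idx = np.nonzero(labels)
    feats = z[t_idx, c_idx]                      # (N, E) features of active pairs
    n = feats.shape[0]
    if n < 2:
        return 0.0
    sim = feats @ feats.T / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)      # exclude each anchor from its own denominator
    row_max = sim.max(axis=1, keepdims=True)     # finite because n >= 2
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    pos_mask = (c_idx[:, None] == c_idx[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    # Average log-probability of the same-category positives for each anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_log_prob[valid] / pos_count[valid]))


# Usage: 8 frames, 3 categories, with events 0 and 1 overlapping on frames 2-3.
rng = np.random.default_rng(0)
T, D, C, E = 8, 16, 3, 4
h = rng.standard_normal((T, D))
w = rng.standard_normal((C, D, E))
labels = np.zeros((T, C), dtype=int)
labels[:4, 0] = 1
labels[2:6, 1] = 1
labels[5:, 2] = 1
z = category_projections(h, w)
loss = frame_contrastive_loss(z, labels)
print(z.shape, loss)
```

Because the overlapping frames 2-3 contribute one feature per active category rather than a single shared vector, minimizing this loss pulls each category's features together regardless of which other events co-occur, which is the disentanglement goal the abstract describes.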
Pages: 1021-1025
Number of pages: 5