Few-shot Audio Classification using Contrastive Training

Times cited: 0

Authors:

Cigdem, Enes Furkan [1,2]

Keles, Hacer Yalim [1]

Affiliations:

[1] Hacettepe Univ, Ankara, Turkiye

[2] SESTEK, Istanbul, Turkiye

Source:

32nd IEEE Signal Processing and Communications Applications Conference (SIU 2024), 2024

Keywords:

Few-shot learning; Contrastive training; Non-episodic training; Audio classification; Neural speech embedding

DOI:

10.1109/SIU61531.2024.10600788

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This study addresses few-shot audio classification in scenarios with limited labeled data and presents experiments on the GSC (Google Speech Commands) and ESC-50 audio datasets. It examines three experimental setups, corresponding to training with 5, 10, and 15 samples. In all of these experiments, the accuracy obtained on 5-class tasks is compared when 1 or 5 audio recordings per class are provided (5-way 1-shot and 5-way 5-shot). Models were trained with three different loss functions, and for each training regime the effect of simple feature transformations on classification performance was also assessed. The findings indicate that these feature transformations improve classification accuracy on both datasets. Notably, the hybrid approach, which jointly optimizes a contrastive loss and a few-shot cross-entropy loss, achieved the highest classification performance in the fine-tuned scenario. In this setting, 5-way 5-shot tests yielded accuracies ranging from 86% to 91% on ESC-50 and from 91% to 95% on GSC, depending on the number of samples used in training.
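The abstract does not name the contrastive loss, the weighting between the two loss terms, the three loss variants, or the feature transformations. Because of that, the following is only a minimal NumPy sketch built on explicit assumptions, not the authors' implementation. It shows one way a hybrid objective (contrastive term plus weighted cross-entropy) could be combined and how a 5-way K-shot episode could be scored. Several details are assumptions: a SimCLR-style NT-Xent loss is swapped in for the unspecified contrastive term, `weight` and `temperature` are hypothetical hyperparameters, and test episodes are classified by cosine similarity to class centroids.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length (eps avoids division by zero).
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    # Numerically stable log-softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true class for each row.
    labels = np.asarray(labels)
    return float(-log_softmax(logits)[np.arange(len(labels)), labels].mean())

def nt_xent(z1, z2, temperature=0.1):
    # Assumed SimCLR-style contrastive loss: row i of z1 and row i of z2 are two
    # views of the same clip (positive pair); every other row acts as a negative.
    z = l2_normalize(np.concatenate([z1, z2], axis=0))
    n = z1.shape[0]
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own positive
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return cross_entropy(sim, targets)

def hybrid_loss(z1, z2, logits, labels, weight=1.0, temperature=0.1):
    # Hypothetical hybrid objective: contrastive term + weight * cross-entropy.
    return nt_xent(z1, z2, temperature) + weight * cross_entropy(logits, labels)

def episode_accuracy(support, support_labels, query, query_labels):
    # Assumed evaluation: nearest-centroid classification by cosine similarity.
    classes = np.unique(support_labels)
    centroids = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    sims = l2_normalize(query) @ l2_normalize(centroids).T
    preds = classes[sims.argmax(axis=1)]
    return float((preds == query_labels).mean())

if __name__ == "__main__":
    # Synthetic 5-way 5-shot episode with random 64-d "embeddings".
    rng = np.random.default_rng(0)
    n_way, k_shot, n_query, dim = 5, 5, 15, 64
    class_means = 3.0 * rng.normal(size=(n_way, dim))
    support_labels = np.repeat(np.arange(n_way), k_shot)
    query_labels = np.repeat(np.arange(n_way), n_query)
    support = class_means[support_labels] + rng.normal(size=(n_way * k_shot, dim))
    query = class_means[query_labels] + rng.normal(size=(n_way * n_query, dim))
    print("episode accuracy:", episode_accuracy(support, support_labels, query, query_labels))
```

In this sketch the embeddings are random vectors, so the printed accuracy says nothing about the paper's results. In practice `z1`, `z2`, and `logits` would come from an audio encoder and a classification head, and `support`/`query` would be that encoder's embeddings of the few labeled and unlabeled clips in each episode.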

Pages: 4

Number of references: 15

