Few-shot Audio Classification using Contrastive Training

Cited by: 0
Authors
Cigdem, Enes Furkan [1 ,2 ]
Keles, Hacer Yalim [1 ]
Affiliations
[1] Hacettepe Univ, Ankara, Turkiye
[2] SESTEK, Istanbul, Turkiye
Source
32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024 | 2024
Keywords
Few-shot Learning; Contrastive Training; Nonepisodic training; Audio Classification; Neural Speech Embedding;
DOI
10.1109/SIU61531.2024.10600788
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This study addresses few-shot audio classification in scenarios with limited labeled data, presenting experiments on the GSC and ESC-50 audio datasets. Three experimental setups are examined, structured around training scenarios with 5, 10, and 15 samples. In all experiments, accuracy is compared for 5-way classification using 1 and 5 support recordings per class (1-shot and 5-shot). Training was conducted with three different loss optimizations, and the effect of simple feature transformations on classification performance was also assessed for each. The findings indicate that these feature transformations improve classification accuracy on both datasets. Notably, the hybrid approach, which combines a contrastive loss with a few-sample cross-entropy loss trained simultaneously, achieved the highest classification performance in the fine-tuned scenario. In 5-way 5-shot tests, success rates ranged between 86% and 91% on the ESC-50 dataset and between 91% and 95% on the GSC dataset, depending on the number of samples used in training.
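The hybrid objective described in the abstract — a contrastive loss combined with a cross-entropy loss on the few labeled samples — can be sketched as a weighted sum of the two terms. The sketch below is illustrative only, not the authors' implementation: the function names, the SupCon-style formulation of the contrastive term, and the `contrastive_weight` factor are all assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Standard softmax cross-entropy, averaged over the batch."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def supervised_contrastive(embeddings, labels, temperature=0.1):
    """SupCon-style loss: pull same-class embeddings together, push
    different-class embeddings apart. Assumes every anchor has at least
    one same-class partner in the batch."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # negative mean log-probability over each anchor's positives
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return per_anchor.mean()

def hybrid_loss(logits, embeddings, labels, contrastive_weight=1.0):
    """Hybrid objective: cross-entropy on the few labeled samples plus a
    weighted contrastive term computed on the embeddings."""
    return (cross_entropy(logits, labels)
            + contrastive_weight * supervised_contrastive(embeddings, labels))
```

In a few-shot setting, the contrastive term regularizes the embedding space using pairwise class relationships, while the cross-entropy term fits the classifier to the handful of labeled support samples.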
Pages: 4