From Pixels to Waveforms: Evaluating Pre-trained Image Models for Few-Shot Audio Classification

Cited by: 0
Authors
Heggan, Calum [1 ]
Hospedales, Tim [2 ]
Budgett, Sam [3 ]
Yaghoobi, Mehrdad [1 ]
Affiliations
[1] University of Edinburgh, School of Engineering, Edinburgh, Midlothian, Scotland
[2] University of Edinburgh, School of Informatics, Edinburgh, Midlothian, Scotland
[3] Thales UK, Reading, Berkshire, England
Source
2024 International Joint Conference on Neural Networks (IJCNN 2024) | 2024
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Few-Shot Learning; Audio; Transfer-Learning; Imagery;
DOI
10.1109/IJCNN60899.2024.10650537
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Within the machine learning literature, few-shot learning has emerged as a crucial tool in many use cases, enabling quick adaptation to new tasks with limited supervision. While the majority of this research has concentrated on the image domain, recent efforts have extended its scope to others. A prevalent strategy for few-shot learning is to harness pre-trained models, whether supervised or otherwise, as feature extractors. These models, paired with a lightweight trainable head, demonstrate remarkable efficacy, often achieving near- or state-of-the-art results. However, this method encounters challenges in domains like audio, where the number of available and regularly published pre-trained models is comparatively low. Given that audio signals can be represented as spectrograms, which are akin to traditional imagery, a fundamental question arises: Can off-the-shelf pre-trained image models prove beneficial for few-shot audio classification? Additionally, can insights from the image domain guide model and approach selection in the audio domain? Our investigation yields diverse insights, showcasing the effectiveness of both supervised and self-supervised image-pre-trained models for few-shot audio classification. Alongside this, we identify strong relationships between the common cross-domain few-shot image learning settings and few-shot audio performance.
Pages: 8
References
40 in total
[1]  
Al-Tahan H, 2021, PR MACH LEARN RES, V130
[2]  
[Anonymous], 2019, arXiv, DOI 10.1109/ICCV.2019.00362
[3]  
Asano YM, 2019, Proc. ICLR
[4]  
Caron M, 2020, ADV NEUR IN, V33
[5]   Emerging Properties in Self-Supervised Vision Transformers [J].
Caron, Mathilde ;
Touvron, Hugo ;
Misra, Ishan ;
Jegou, Herve ;
Mairal, Julien ;
Bojanowski, Piotr ;
Joulin, Armand .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :9630-9640
[6]   Deep Clustering for Unsupervised Learning of Visual Features [J].
Caron, Mathilde ;
Bojanowski, Piotr ;
Joulin, Armand ;
Douze, Matthijs .
COMPUTER VISION - ECCV 2018, PT XIV, 2018, 11218 :139-156
[7]  
Chen T., 2020, Advances in Neural Information Processing Systems, V33, P22243, DOI 10.48550/arXiv.2006.10029
[8]  
Chen T., 2020, A simple framework for contrastive learning of visual representations
[9]  
Chen X., 2020, arXiv preprint arXiv:2003.04297
[10]   Exploring Simple Siamese Representation Learning [J].
Chen, Xinlei ;
He, Kaiming .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :15745-15753