Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting

Cited by: 3
Authors
Li, Yanxiong [1 ]
Li, Jialong [1 ]
Si, Yongjie [1 ]
Tan, Jiaxin [1 ]
He, Qianhua [1 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audio classification; few-shot class-incremental learning; embedding extractor; expandable classifier
DOI
10.1109/TASLP.2024.3385287
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403
Abstract
Few-shot Class-incremental Audio Classification (FCAC) is the task of continuously identifying incremental classes from only a few training samples after the model has been trained on base classes with abundant samples. The key to solving the FCAC problem is to ensure that the model has good stability (it does not forget the base classes) and strong plasticity (it does not overfit the incremental classes). In this paper, we propose an FCAC method that adaptively mitigates the model's forgetting of base classes and overfitting of incremental classes. Our model consists of an embedding extractor and an expandable classifier. The former is the backbone of a residual network and is frozen after being trained on sufficient samples of the base classes, whereas the latter is expandable and is updated with the few training samples of the incremental classes in each incremental session. The expandable classifier consists of two branches and a fusion module. The two branches are designed to mitigate the model's forgetting of base classes and its overfitting of incremental classes, respectively, and the fusion module adaptively fuses the predictions output by the two branches. In addition, we define two losses for model training in the base and incremental sessions, respectively. Three experimental datasets (NSynth-100, FSC-89, and LS-100) are created by randomly choosing samples from the NSynth, FSD-MIX-CLIP, and LibriSpeech audio corpora, respectively. Experimental results demonstrate that our proposed method outperforms all previous methods in accuracy and has an advantage over most previous methods in computational load.
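The frozen-extractor plus two-branch expandable classifier described in the abstract can be sketched as follows. This is an illustrative minimal sketch, not the paper's implementation: all dimensions, the random frozen projection standing in for the residual-network backbone, and the fixed fusion weight `alpha` (which the paper's fusion module would learn adaptively) are assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration (not taken from the paper):
EMB_DIM, FEAT_DIM = 8, 16     # embedding size, raw feature size
N_BASE, N_INC = 5, 2          # base classes, new classes in one session

# Frozen embedding extractor: a stand-in for the residual-network
# backbone that is trained on base classes and then frozen.
W_frozen = rng.standard_normal((FEAT_DIM, EMB_DIM))

def embed(x):
    return np.tanh(x @ W_frozen)  # (n, EMB_DIM)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Branch 1: head kept at the base-session weights (stability).
# Branch 2: head expanded with columns for the new classes (plasticity).
base_head = rng.standard_normal((EMB_DIM, N_BASE))
inc_head = np.concatenate(
    [base_head, rng.standard_normal((EMB_DIM, N_INC))], axis=1
)

def fused_predict(x, alpha):
    """Fuse the two branch predictions with weight alpha in [0, 1].

    The paper fuses adaptively via a learned fusion module; here alpha
    is simply passed in as a constant.
    """
    z = embed(x)
    p_base = softmax(z @ base_head)            # (n, N_BASE)
    # Pad base-branch scores with zeros for the new classes so both
    # branches share one label space before fusing.
    p_base = np.pad(p_base, ((0, 0), (0, N_INC)))
    p_inc = softmax(z @ inc_head)              # (n, N_BASE + N_INC)
    return alpha * p_base + (1.0 - alpha) * p_inc

x = rng.standard_normal((3, FEAT_DIM))
probs = fused_predict(x, alpha=0.6)
assert probs.shape == (3, N_BASE + N_INC)
assert np.allclose(probs.sum(axis=1), 1.0)
```

Because each branch outputs a valid distribution and the fusion is a convex combination, the fused scores still sum to one per example; the stability/plasticity trade-off is then controlled entirely by how the fusion weight is chosen.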
Pages: 2297-2311
Number of Pages: 15