Few-shot class-incremental audio classification via discriminative prototype learning

Cited by: 8
Authors
Xie, Wei [1 ]
Li, Yanxiong [1 ]
He, Qianhua [1 ]
Cao, Wenchang [1 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
Keywords
Audio classification; Few-shot learning; Class-incremental learning; Selective-attention; Prototype adjustment; NEURAL-NETWORK; RECOGNITION; DATASET;
DOI
10.1016/j.eswa.2023.120044
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In real-world scenarios, new audio classes with insufficient samples usually emerge continually, which motivates the study of few-shot class-incremental audio classification (FCAC) in this paper. FCAC aims to enable the model to continually recognize new audio classes while remembering the base ones. In solving the FCAC problem, the discriminability of the prototypes is vital to the model's classification performance. Thus, we propose a method that learns discriminative prototypes from two aspects. First, since the generalization ability of the embedding module (EM) significantly affects the discriminability of the prototypes, the proposed method trains the EM with a pseudo-episodic incremental training scheme that simulates the test scenario. Second, to achieve balanced classification performance on both base and new audio classes, the proposed method employs a selective-attention module that adjusts the prototypes to enhance their discriminability. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on the FCAC problem. Notably, it achieves comprehensive performance scores (CPS) of 87.82% and 59.25% on the Neural Synthesis musical notes of 100 classes (NSynth-100) and Free sound clips of 89 classes (FSC-89) datasets, respectively, outperforming the comparison methods. Our code is available at https://github.com/chester-w-xie/DPL_FCAC.
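The prototype-based classification that the abstract builds on can be sketched as follows. This is a minimal illustration of standard prototypical classification (class prototype = mean of support embeddings, nearest-prototype prediction), not the paper's full discriminative prototype learning method; all function names and the toy data here are hypothetical.

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, num_classes):
    # Prototype of each class: mean of that class's support embeddings.
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(num_classes)])

def nearest_prototype_predict(query_emb, prototypes):
    # Euclidean distance from each query to each prototype; nearest wins.
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :],
                           axis=-1)
    return dists.argmin(axis=1)

# Toy example: 2 classes with 2-D embeddings, two support samples each.
support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
labels = np.array([0, 0, 1, 1])
protos = compute_prototypes(support, labels, num_classes=2)
preds = nearest_prototype_predict(np.array([[0.1, 0.1], [0.9, 0.9]]), protos)
# preds -> [0, 1]
```

In the incremental setting the paper addresses, new-class prototypes computed this way from a few shots tend to crowd the base-class prototypes, which is why the paper introduces a selective-attention module to adjust them.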
Pages: 13