Learning Noise Adapters for Incremental Speech Enhancement

Cited by: 0

Authors:

Yang, Ziye [1]

Song, Xiang [2]

Chen, Jie [1]

Richard, Cedric [3]

Cohen, Israel [4]

Affiliations:

[1] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Shenzhen 518063, Peoples R China

[2] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China

[3] Univ Cote Dazur, CNRS, OCA, F-06103 Nice, France

[4] Technion Israel Inst Technol, IL-3200003 Haifa, Israel

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Keywords:

Noise; Training; Adaptation models; Speech enhancement; Decoding; Data models; Transformers; Speech recognition; Signal to noise ratio; Noise measurement; Catastrophic forgetting problem; incremental learning; noise adapter; speech enhancement;

DOI:

10.1109/LSP.2024.3482171

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Incremental speech enhancement (ISE), with the ability to incrementally adapt to new noise domains, represents a critical yet comparatively under-investigated topic. While the regularization-based method has been proposed to solve the ISE task, it usually suffers from the dilemma wherein the gain of one domain directly entails the loss of another. To solve this issue, we propose an effective paradigm, termed Learning Noise Adapters (LNA), which significantly mitigates the catastrophic domain forgetting phenomenon in the ISE task. In our methodology, we employ a frozen pre-trained model to train and retain a domain-specific adapter for each newly encountered domain, enabling the capture of variations in feature distributions within these domains. Subsequently, our approach involves the development of an unsupervised, training-free noise selector for the inference stage, which is responsible for identifying the domains of test speech samples. A comprehensive experimental validation has substantiated the effectiveness of our approach.
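The inference flow described in the abstract (one frozen pre-trained model, one stored adapter per noise domain, and a training-free selector that picks the domain of a test sample) can be illustrated with a minimal sketch. Everything named here is hypothetical: `AdapterBank`, the toy `backbone` and adapters, and in particular the nearest-centroid rule used as the selector, which is an assumption since the abstract does not say how the selector works.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class AdapterBank:
    """Hypothetical container: one frozen backbone, one adapter per domain."""

    def __init__(self, backbone: Callable[[Vector], List[float]]):
        self.backbone = backbone  # shared and never updated
        self.adapters: Dict[str, Callable[[Vector], List[float]]] = {}
        self.centroids: Dict[str, List[float]] = {}

    def add_domain(self, name, adapter, noisy_features: List[Vector]) -> None:
        # Adding a domain only stores new state; earlier adapters are untouched,
        # so earlier domains cannot be overwritten.
        self.adapters[name] = adapter
        self.centroids[name] = mean_vector(noisy_features)

    def select_domain(self, feature: Vector) -> str:
        # ASSUMPTION: nearest-centroid selection (no labels, no training).
        return min(self.centroids,
                   key=lambda n: math.dist(feature, self.centroids[n]))

    def enhance(self, feature: Vector):
        name = self.select_domain(feature)
        return name, self.adapters[name](self.backbone(feature))


# Toy stand-ins for the pre-trained model and two domain adapters.
bank = AdapterBank(backbone=lambda f: [2.0 * v for v in f])
bank.add_domain("babble", lambda h: [v + 1.0 for v in h],
                [[0.0, 0.0], [0.2, 0.0]])
bank.add_domain("street", lambda h: [v - 1.0 for v in h],
                [[1.0, 1.0], [1.0, 0.8]])
domain, output = bank.enhance([1.0, 0.5])
```

In this sketch, registering a new domain never modifies the backbone or the existing adapters, which is the property the abstract credits with mitigating catastrophic forgetting.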

Pages: 2915-2919

Page count: 5
