Zero-shot test time adaptation via knowledge distillation for personalized speech denoising and dereverberation

Cited by: 2

Authors:

Kim, Sunwoo [1]

Athi, Mrudula [1]

Shi, Guangji [1]

Kim, Minje [1,2]

Kristjansson, Trausti [1]

Affiliations:

[1] Amazon Lab126, Sunnyvale, CA 94089 USA

[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2024 / Vol. 155 / No. 02

Funding:

U.S. National Science Foundation;

Keywords:

DOMAIN ADAPTATION; ENHANCEMENT; NOISE;

DOI:

10.1121/10.0024621

Chinese Library Classification (CLC):

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

A personalization framework to adapt compact models to test time environments and improve their speech enhancement (SE) performance in noisy and reverberant conditions is proposed. The use-cases are when the end-user device encounters only one or a few speakers and noise types that tend to reoccur in the specific acoustic environment. Hence, a small personalized model that is sufficient to handle this focused subset of the original universal SE problem is postulated. The study addresses a major data shortage issue: although the goal is to learn from a specific user's speech signals and the test time environment, the target clean speech is unavailable for model training due to privacy-related concerns and technical difficulty of recording noise and reverberation-free voice signals. The proposed zero-shot personalization method uses no clean speech target. Instead, it employs the knowledge distillation framework, where the more advanced denoising results from an overly large teacher work as pseudo targets to train a small student model. Evaluation on various test time conditions suggests that the proposed personalization approach can significantly enhance the compact student model's test time performance. Personalized models outperform larger non-personalized baseline models, demonstrating that personalization achieves model compression with no loss in dereverberation and denoising performance.
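The distillation idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's method: the "teacher" is a hypothetical 5-tap moving average standing in for a large pretrained enhancement model, the "student" is a hypothetical 3-tap filter standing in for a compact model, and the mean squared error loss is chosen for simplicity since the abstract does not name a loss. The one property it shares with the described approach is that the student is fitted only to the teacher's outputs on noisy test-time input, with no clean speech target.

```python
import math
import random


def teacher_denoise(noisy):
    # Hypothetical stand-in for the large teacher: a 5-tap moving average.
    n = len(noisy)
    out = []
    for i in range(n):
        window = noisy[max(0, i - 2):min(n, i + 3)]
        out.append(sum(window) / len(window))
    return out


def taps(noisy, i):
    # The three input samples the student sees at position i (zero-padded).
    left = noisy[i - 1] if i > 0 else 0.0
    right = noisy[i + 1] if i < len(noisy) - 1 else 0.0
    return (left, noisy[i], right)


def student_denoise(noisy, w):
    # Hypothetical compact student: a 3-tap filter with learnable weights w.
    return [sum(wk * xk for wk, xk in zip(w, taps(noisy, i)))
            for i in range(len(noisy))]


def personalize(noisy, steps=200, lr=0.1):
    # Fit the student to the teacher's pseudo targets; no clean signal is used.
    pseudo = teacher_denoise(noisy)
    w = [0.0, 1.0, 0.0]  # start as an identity filter
    losses = []
    n = len(noisy)
    for _ in range(steps):
        est = student_denoise(noisy, w)
        grad = [0.0, 0.0, 0.0]
        loss = 0.0
        for i in range(n):
            err = est[i] - pseudo[i]
            loss += err * err / n
            for k, xk in enumerate(taps(noisy, i)):
                grad[k] += 2.0 * err * xk / n
        losses.append(loss)  # loss measured before this step's update
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w, losses


# Synthetic "test-time recording": a sinusoid plus Gaussian noise (assumed data).
rng = random.Random(0)
noisy = [math.sin(0.05 * t) + rng.gauss(0.0, 0.3) for t in range(400)]
w, losses = personalize(noisy)
print("distillation loss before/after:", losses[0], losses[-1])
```

The distillation loss here measures only how closely the student matches the teacher on this one recording. It says nothing about quality against true clean speech, which is unavailable in the zero-shot setting the abstract describes.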


Pages: 1353-1367

Page count: 15
