UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Times cited: 0

Authors:

Han, Cong [1,2]

Wilson, Kevin [2]

Wisdom, Scott [2]

Hershey, John R. [2]

Affiliations:

[1] Columbia Univ, New York, NY 10027 USA

[2] Google, Mountain View, CA 94043 USA

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Keywords:

multi-channel; speech separation

DOI:

10.1109/ICASSP48485.2024.10447422

Abstract:

A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
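The abstract relies on two ideas: the MixIT training objective and the SI-SNR evaluation metric. The NumPy sketch below illustrates the generic, single-channel form of both. It is not the code or the multi-channel variant used in this paper. Assumptions: signals are 1-D arrays, the separated sources are already given (no network is shown), the per-mixture loss is plain negative SNR (the original MixIT formulation uses a soft-thresholded SNR), and the function names `si_snr`, `neg_snr`, and `mixit_loss` are hypothetical. In MixIT, two reference mixtures are summed into a "mixture of mixtures", the model separates it into M sources, and each source is assigned to exactly one of the two references so that the remixed estimates best reconstruct them.

```python
import itertools

import numpy as np


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    # Remove the mean so the metric ignores DC offsets.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))


def neg_snr(reference, estimate, eps=1e-8):
    """Negative SNR in dB (lower is better); an assumed stand-in for the MixIT loss."""
    error = reference - estimate
    return -10 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))


def mixit_loss(mix1, mix2, est_sources):
    """MixIT loss for two mixtures.

    est_sources: array of shape [M, T] separated from mix1 + mix2.
    Returns the minimum loss and the best source-to-mixture assignment.
    """
    num_sources = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    # Each estimated source goes to exactly one mixture: 2**M assignments.
    for assign in itertools.product((0, 1), repeat=num_sources):
        mask = np.array(assign)
        remix1 = est_sources[mask == 0].sum(axis=0)  # zeros if no source assigned
        remix2 = est_sources[mask == 1].sum(axis=0)
        loss = neg_snr(mix1, remix1) + neg_snr(mix2, remix2)
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

Enumerating all 2^M assignments is cheap for the small numbers of output sources typical in speech separation. With perfect source estimates, the best assignment maps each source back to the mixture it came from, and SI-SNR is unchanged by rescaling the estimate.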


Pages: 721-725

Page count: 5

Related records (50 total):

[1] Iteratively Refined Multi-Channel Speech Separation
Zhang, Xu
Bao, Changchun
Yang, Xue
Zhou, Jing
APPLIED SCIENCES-BASEL, 2024, 14 (14)
[2] MULTI-BAND PIT AND MODEL INTEGRATION FOR IMPROVED MULTI-CHANNEL SPEECH SEPARATION
Chen, Lianwu
Yu, Meng
Su, Dan
Yu, Dong
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pp. 705-709
[3] Multi-channel separation of dynamic speech and sound events
Fujimura, Takuya
Scheibler, Robin
INTERSPEECH 2023, 2023, pp. 3749-3753
[4] MULTI-CHANNEL TARGET SPEECH EXTRACTION WITH CHANNEL DECORRELATION AND TARGET SPEAKER ADAPTATION
Han, Jiangyu
Zhou, Xinyuan
Long, Yanhua
Li, Yijie
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, pp. 6094-6098
[5] Multi-Channel Conversational Speaker Separation via Neural Diarization
Taherian, Hassan
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32, pp. 2467-2476
[6] An Unsupervised Compressed Sensing Algorithm for Multi-Channel Neural Recording and Spike Sorting
Xiong, Tao
Zhang, Jie
Martinez-Rubio, Clarissa
Thakur, Chetan S.
Eskandar, Emad N.
Chin, Sang Peter
Etienne-Cummings, Ralph
Tran, Trac D.
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2018, 26 (06), pp. 1121-1130
[7] ON END-TO-END MULTI-CHANNEL TIME DOMAIN SPEECH SEPARATION IN REVERBERANT ENVIRONMENTS
Zhang, Jisi
Zorila, Catalin
Doddipatla, Rama
Barker, Jon
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, pp. 6389-6393
[8] A separation and interaction framework for causal multi-channel speech enhancement
Liu, Wenzhe
Li, Andong
Zheng, Chengshi
Li, Xiaodong
DIGITAL SIGNAL PROCESSING, 2022, 126
[9] DESNET: A MULTI-CHANNEL NETWORK FOR SIMULTANEOUS SPEECH DEREVERBERATION, ENHANCEMENT AND SEPARATION
Fu, Yihui
Wu, Jian
Hu, Yanxin
Xing, Mengtao
Xie, Lei
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, pp. 857-864
[10] Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters
Tesch, Kristina
Gerkmann, Timo
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32, pp. 542-553
