UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Cited by: 0

Authors:

Han, Cong [1,2]

Wilson, Kevin [2]

Wisdom, Scott [2]

Hershey, John R. [2]

Affiliations:

[1] Columbia University, New York, NY 10027 USA

[2] Google, Mountain View, CA 94043 USA

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Keywords:

multi-channel; speech separation

DOI:

10.1109/ICASSP48485.2024.10447422

Abstract:

A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data and are tested on real AMI recordings containing overlapping speech. To evaluate our models objectively, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.

Pages: 721-725

Page count: 5
