UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Cited by: 0

Authors:

Han, Cong [1, 2]

Wilson, Kevin [2]

Wisdom, Scott [2]

Hershey, John R. [2]

Affiliations:

[1] Columbia Univ, New York, NY 10027 USA

[2] Google, Mountain View, CA 94043 USA

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Keywords:

multi-channel; speech separation

DOI:

10.1109/ICASSP48485.2024.10447422

Abstract:

A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement to SI-SNR and to human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
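The abstract names two ingredients, MixIT and SI-SNR, without showing their mechanics. The following is a minimal single-channel sketch, not the authors' implementation: it assumes signals are plain lists of floats, uses negative SI-SNR as the per-mixture loss (an assumption made for illustration; the paper's actual loss and its multi-channel extension are not specified in this record), and searches every source-to-mixture assignment by brute force. The function names `si_snr` and `mixit_loss` are hypothetical.

```python
import itertools
import math


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to remove any gain mismatch.
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def mixit_loss(mixtures, estimated_sources):
    """Brute-force MixIT: assign each estimated source to exactly one mixture.

    Returns the lowest loss (negative summed SI-SNR, an illustrative choice)
    and the assignment that achieves it.
    """
    num_mix = len(mixtures)
    num_samples = len(mixtures[0])
    best_loss, best_assignment = math.inf, None
    for assignment in itertools.product(range(num_mix), repeat=len(estimated_sources)):
        # Remix: sum the sources assigned to each reference mixture.
        remixes = [[0.0] * num_samples for _ in range(num_mix)]
        for src, mix_idx in zip(estimated_sources, assignment):
            remixes[mix_idx] = [a + b for a, b in zip(remixes[mix_idx], src)]
        loss = -sum(si_snr(m, r) for m, r in zip(mixtures, remixes))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment
```

In training, the separation model would receive the sum of the reference mixtures and its outputs would be passed as `estimated_sources`; because only mixtures are needed as references, no isolated ground-truth sources are required, which is what makes the objective unsupervised.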

Pages: 721-725

Page count: 5
