UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Authors
Han, Cong [1, 2]
Wilson, Kevin [2]
Wisdom, Scott [2]
Hershey, John R. [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google, Mountain View, CA 94043 USA
Keywords
multi-channel; speech separation
DOI
10.1109/ICASSP48485.2024.10447422
Abstract
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement to SI-SNR and to human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
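For readers unfamiliar with the two terms the abstract leans on, the following is a minimal single-channel sketch of an SI-SNR metric and a MixIT-style loss. It is an illustration under stated assumptions, not the authors' implementation: the function names `si_snr` and `mixit_loss` are hypothetical, the loss is taken to be negative SI-SNR for simplicity (the paper may use a different signal-level loss), the assignment search is brute force, and the multi-channel input handling described in the abstract is not modeled.

```python
import itertools

import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    ratio = np.dot(target, target) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio + eps)


def mixit_loss(est_sources, mix1, mix2):
    """MixIT-style loss for estimated sources of shape (M, T).

    Each estimated source is assigned to exactly one of the two reference
    mixtures; the assignment with the lowest loss is returned.
    Assumption: the per-mixture loss is negative SI-SNR.
    """
    num_sources = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product((0, 1), repeat=num_sources):
        mask = np.array(assign, dtype=bool)
        remix1 = est_sources[~mask].sum(axis=0)  # sources assigned to mix1
        remix2 = est_sources[mask].sum(axis=0)   # sources assigned to mix2
        loss = -(si_snr(remix1, mix1) + si_snr(remix2, mix2))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

In training, the separation model would receive `mix1 + mix2` as input and produce `est_sources`; no isolated reference sources are needed, which is what makes the objective usable on real recordings such as those from the AMI Corpus.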
Pages: 721-725
Number of pages: 5