UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Times cited: 0
Authors
Han, Cong [1, 2]
Wilson, Kevin [2]
Wisdom, Scott [2]
Hershey, John R. [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google, Mountain View, CA 94043 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
multi-channel; speech separation
DOI
10.1109/ICASSP48485.2024.10447422
Abstract
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To evaluate our models objectively, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvements in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
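For readers unfamiliar with the two quantities the abstract relies on, here is a minimal NumPy sketch of the original single-channel MixIT objective and of the SI-SNR metric, assuming 1-D signals of equal length. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the per-mixture loss is a plain negative SNR (a simplification of the thresholded SNR used in the MixIT literature), and the multi-channel extension described in this paper is not reproduced. MixIT sums two reference mixtures into a "mixture of mixtures", lets the model separate it into M estimated sources, and then searches for the binary assignment of sources to mixtures that best reconstructs each original mixture.

```python
import itertools

import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR (dB) between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def neg_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Negative SNR (dB); a simplified stand-in for MixIT's thresholded SNR loss."""
    err = reference - estimate
    return -10 * np.log10((np.dot(reference, reference) + eps) / (np.dot(err, err) + eps))


def mixit_loss(mixtures: np.ndarray, est_sources: np.ndarray):
    """MixIT loss for mixtures of shape (N, T) and estimated sources of shape (M, T).

    Exhaustively searches all N**M binary mixing matrices A (each column one-hot),
    remixes the estimates as A @ est_sources, and keeps the lowest summed loss.
    """
    n_mix = mixtures.shape[0]
    n_src = est_sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product(range(n_mix), repeat=n_src):
        # Column j of A has a single 1 in row assign[j]: source j goes to mixture assign[j].
        A = np.zeros((n_mix, n_src))
        A[np.array(assign), np.arange(n_src)] = 1.0
        remixed = A @ est_sources
        loss = sum(neg_snr(remixed[i], mixtures[i]) for i in range(n_mix))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

The exhaustive search costs N**M loss evaluations, which stays cheap for the small settings MixIT is usually described with (for example 2 mixtures and 4 estimated sources gives 16 candidate assignments). Because only remixed sums are compared against the reference mixtures, no isolated ground-truth sources are needed, which is what makes the objective usable on unlabeled real recordings.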
Pages: 721-725
Number of pages: 5