UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Times cited: 0
Authors
Han, Cong [1, 2]
Wilson, Kevin [2 ]
Wisdom, Scott [2 ]
Hershey, John R. [2 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google, Mountain View, CA 94043 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
multi-channel; speech separation
DOI
10.1109/ICASSP48485.2024.10447422
Abstract
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement to SI-SNR and to human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
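The abstract names two technical ingredients, the MixIT objective and the SI-SNR metric, without spelling them out. The following is a minimal single-channel sketch of both, not the authors' implementation. It assumes NumPy, uses negative SI-SNR as the per-mixture loss (an assumption made here for illustration; the record does not state which loss the paper uses), and searches all source-to-mixture assignments by brute force. The function names `si_snr` and `mixit_loss` are hypothetical.

```python
import itertools

import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the residual counts as noise.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    return 10.0 * np.log10((np.sum(projection**2) + eps) / (np.sum(noise**2) + eps))


def mixit_loss(mixtures, sources):
    """Brute-force mixture invariant loss (illustrative sketch).

    mixtures: array of shape (num_mix, T), the reference mixtures.
    sources:  array of shape (num_src, T), the separated model outputs.
    Each source is assigned to exactly one mixture; the assignment whose
    remixed sources best match the mixtures (lowest loss) is returned.
    """
    num_mix, num_src = mixtures.shape[0], sources.shape[0]
    best_loss, best_assign = np.inf, None
    for assign in itertools.product(range(num_mix), repeat=num_src):
        # Binary mixing matrix with a single 1 in every column.
        mixing = np.zeros((num_mix, num_src))
        mixing[list(assign), np.arange(num_src)] = 1.0
        remixed = mixing @ sources
        # Assumed loss: negative SI-SNR summed over the mixtures.
        loss = -sum(si_snr(remixed[i], mixtures[i]) for i in range(num_mix))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

In use, the model would receive the sum of the two mixtures as input and `sources` would be its outputs. The search visits `num_mix ** num_src` assignments, which is manageable for two mixtures and a handful of output sources. A multi-channel variant would need a convention for which microphone channel serves as the reference; that choice is not covered by this sketch.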
Pages: 721-725
Page count: 5