UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Cited by: 0
Authors
Han, Cong [1, 2]
Wilson, Kevin [2 ]
Wisdom, Scott [2 ]
Hershey, John R. [2 ]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google, Mountain View, CA 94043 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
multi-channel; speech separation
DOI
10.1109/ICASSP48485.2024.10447422
Abstract
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
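For readers unfamiliar with the two terms the abstract relies on, the following is a minimal single-channel sketch of SI-SNR and of a MixIT-style assignment search. It is not the authors' implementation. The function names `si_snr` and `mixit_loss`, the use of negative SI-SNR as the per-mixture loss, and the brute-force search over assignments are all assumptions made for illustration; the multi-channel extension described in the abstract is not shown.

```python
import itertools
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (illustrative helper)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    # Note: an all-zero estimate scores 0 dB here because of eps.
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def mixit_loss(mixtures, sources):
    """MixIT-style loss (hypothetical sketch).

    mixtures: array of shape (2, T), the two reference mixtures.
    sources:  array of shape (M, T), separated outputs for their sum.
    Returns the best loss and the source-to-mixture assignment achieving it.
    """
    num_sources = sources.shape[0]
    best_loss, best_assign = np.inf, None
    # Each estimated source is assigned to exactly one of the two mixtures.
    for assign in itertools.product((0, 1), repeat=num_sources):
        mask = np.array(assign)
        remix = np.stack([sources[mask == k].sum(axis=0) for k in (0, 1)])
        # Assumption: negative SI-SNR summed over both mixtures as the loss.
        loss = -sum(si_snr(remix[k], mixtures[k]) for k in (0, 1))
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign
```

The search enumerates all 2^M assignments, which is only practical for a small number of output sources M. No ground-truth sources are needed: the loss compares remixed outputs against the input mixtures only, which is what makes this style of training usable on unlabeled recordings.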
Pages: 721-725
Page count: 5