UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION

Cited by: 0
Authors
Han, Cong [1, 2]
Wilson, Kevin [2]
Wisdom, Scott [2]
Hershey, John R. [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google, Mountain View, CA 94043 USA
Keywords
multi-channel; speech separation
DOI
10.1109/ICASSP48485.2024.10447422
Abstract
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. The models are trained on both supervised and unsupervised training data, and are tested on real AMI recordings containing overlapping speech. To objectively evaluate our models, we also use a synthetic multi-channel AMI test set. Holding network architectures constant, we find that semi-supervised fine-tuning of a model pretrained on a large and diverse single-channel dataset yields the largest improvement in SI-SNR and in human listening ratings across synthetic and real datasets, outperforming supervised models trained on well-matched synthetic data. Our results demonstrate that unsupervised learning through MixIT enables model adaptation on both single- and multi-channel real-world speech recordings.
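
The abstract names two techniques, SI-SNR and MixIT, without defining them. The following is a minimal single-channel sketch of both, not the authors' implementation. It assumes the commonly used definition of scale-invariant SNR (mean removal, projection onto the reference) and the basic MixIT idea of assigning each separated source to one of two input mixtures so that the remixed signals best match those mixtures. The function names `si_snr` and `mixit_loss`, the `eps` constant, and the choice of negative SI-SNR as the per-mixture loss are assumptions made for illustration; the multi-channel extension described in the abstract is not modeled here.

```python
import itertools
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything in the estimate that is not explained by the reference.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def mixit_loss(mixture_1, mixture_2, separated_sources):
    """Return (loss, assignment) for a MixIT-style objective.

    Each separated source is assigned to exactly one of the two mixtures.
    The loss is the smallest summed negative SI-SNR over all assignments.
    (Assumption: negative SI-SNR as the per-mixture loss.)
    """
    num_samples = len(mixture_1)
    best_loss, best_assignment = math.inf, None
    # Exhaustive search over the 2**M binary assignments of M sources.
    for assignment in itertools.product((0, 1), repeat=len(separated_sources)):
        remixes = [[0.0] * num_samples, [0.0] * num_samples]
        for source, mixture_index in zip(separated_sources, assignment):
            remix = remixes[mixture_index]
            for t, value in enumerate(source):
                remix[t] += value
        loss = -(si_snr(remixes[0], mixture_1) + si_snr(remixes[1], mixture_2))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment
```

In training, the separated sources would come from a model applied to the sum of the two mixtures, and the minimum loss would be backpropagated. The exhaustive search grows as 2^M in the number of output sources M, which is practical only for small M.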
Pages: 721-725
Page count: 5