Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation

Cited by: 21
Authors
Zhang, Zhuohuang [1 ,2 ]
Xu, Yong [3 ]
Yu, Meng [3 ]
Zhang, Shi-Xiong [3 ]
Chen, Lianwu [4 ]
Williamson, Donald S. [1 ]
Yu, Dong [3 ]
Affiliations
[1] Indiana Univ, Dept Comp Sci, Bloomington, IN 47408 USA
[2] Indiana Univ, Dept Speech Language & Hearing Sci, Bloomington, IN 47408 USA
[3] Tencent AI Lab, Bellevue, WA 98004 USA
[4] Tencent AI Lab, Shenzhen 518054, Peoples R China
Keywords
Nonlinear distortion; Covariance matrices; Artificial neural networks; Array signal processing; Noise measurement; Feature extraction; Task analysis; Speech separation; deep learning; MVDR; ADL-MVDR; RECURRENT NEURAL-NETWORK; NOISE-REDUCTION; ENHANCEMENT; SINGLE; PERFORMANCE; MODEL; DEREVERBERATION; RECOGNITION; BEAMFORMER;
DOI
10.1109/TASLP.2021.3129335
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Many purely neural-network-based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions; however, conventional neural mask-based MVDR systems still result in relatively high levels of residual noise. Moreover, the matrix inverse involved in the MVDR solution is sometimes numerically unstable during joint training with neural networks. In this study, we propose a multi-channel multi-frame (MCMF) all deep learning (ADL)-MVDR approach for target speech separation, which extends our preliminary multi-channel ADL-MVDR approach. The proposed MCMF ADL-MVDR system addresses linear and nonlinear distortions. Spatio-temporal cross correlations are also fully utilized in the proposed approach. The proposed systems are evaluated using a Mandarin audio-visual corpus and are compared with several state-of-the-art approaches. Experimental results demonstrate the superiority of our proposed systems under different scenarios and across several objective evaluation metrics, including ASR performance.
Pages: 3526 - 3540
Page count: 15
Related papers
50 in total
  • [31] Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
    Wang, Xiaofei
    Wang, Dongmei
    Kanda, Naoyuki
    Eskimez, Sefik Emre
    Yoshioka, Takuya
    INTERSPEECH 2022, 2022, : 3814 - 3818
  • [32] DESNET: A MULTI-CHANNEL NETWORK FOR SIMULTANEOUS SPEECH DEREVERBERATION, ENHANCEMENT AND SEPARATION
    Fu, Yihui
    Wu, Jian
    Hu, Yanxin
    Xing, Mengtao
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 857 - 864
  • [33] An End-to-end Architecture of Online Multi-channel Speech Separation
    Wu, Jian
    Chen, Zhuo
    Li, Jinyu
    Yoshioka, Takuya
    Tan, Zhili
    Lin, Edward
    Luo, Yi
    Xie, Lei
    INTERSPEECH 2020, 2020, : 81 - 85
  • [34] AUDIO-VISUAL MULTI-CHANNEL SPEECH SEPARATION, DEREVERBERATION AND RECOGNITION
    Li, Guinan
    Yu, Jianwei
    Deng, Jiajun
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6042 - 6046
  • [35] Multi-frame thinking
    Pesut, DJ
    NURSING OUTLOOK, 1999, 47 (05) : 200 - 200
  • [36] Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter
    Bagheri, Saeed
    Giacobello, Daniele
    INTERSPEECH 2019, 2019, : 101 - 105
  • [37] Multi-channel interference separation for the AWGN channel
    Jong, GJ
    Liao, PJ
    Jung, CY
    Su, TJ
    ISPACS 2005: PROCEEDINGS OF THE 2005 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, 2005, : 581 - 584
  • [38] Multi-channel Memristive Pulse Coupled Neural Network Based Multi-frame Images Super-resolution Reconstruction Algorithm
    Dong Zhekang
    Du Chenjie
    Lin Huipin
    Lai Chun Sing
    Hu Xiaofang
    Duan Shukai
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (04) : 835 - 843
  • [39] UNSUPERVISED MULTI-CHANNEL SEPARATION AND ADAPTATION
    Han, Cong
    Wilson, Kevin
    Wisdom, Scott
    Hershey, John R.
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 721 - 725
  • [40] Multi-Channel Signal Separation by Decorrelation
    Weinstein, Ehud
    Feder, Meir
    Oppenheim, Alan V.
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1993, 1 (04) : 405 - 413