Binaural speech separation algorithm based on long and short time memory networks

Cited by: 0
Authors
Zhou L. [1]
Lu S. [1]
Zhong Q. [1]
Chen Y. [1, 2]
Tang Y. [3]
Zhou Y. [3]
Affiliations
[1] School of Information Science and Engineering, Southeast University, Nanjing
[2] Department of Psychiatry, Columbia University, NYSPI, New York
[3] College of Internet of Things Engineering, Hohai University, Changzhou
Funding
National Natural Science Foundation of China
Keywords
Binaural speech separation; Feature vectors; Ideal ratio mask; Long and short time memory networks
DOI
10.32604/CMC.2020.010182
Abstract
Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers are often stationary or move slowly during normal communication. In this case, the spatial features of consecutive speech frames become highly correlated, and this correlation provides additional spatial information that helps speaker separation. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of the spatial features. In detail, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, where a modified ideal ratio mask (IRM) is defined as the training target during LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. Specifically, in tests under untrained acoustic conditions with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing DNN-based method in terms of PESQ and STOI. This indicates that our method is more robust in untrained conditions. © 2020 Tech Science Press. All rights reserved.
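As a rough illustration of the kind of training target the abstract refers to, the sketch below computes a conventional ideal ratio mask per TF unit. It is not the modified IRM of the paper, whose definition is not given in this record; the function name `ideal_ratio_mask`, the exponent `beta`, and the toy power values are all assumptions made for illustration.

```python
def ideal_ratio_mask(target_power, interference_power, beta=0.5, eps=1e-12):
    """Conventional IRM per TF unit: (S / (S + N)) ** beta.

    target_power, interference_power: nested lists of non-negative power
    values, indexed as [frame][frequency channel].
    NOTE: this is the textbook IRM, assumed here for illustration; the
    paper's *modified* IRM is not specified in this record.
    """
    mask = []
    for s_row, n_row in zip(target_power, interference_power):
        # eps guards against division by zero in silent TF units
        mask.append([(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)])
    return mask


# Toy example (hypothetical values): 2 frames x 2 frequency channels
target = [[4.0, 0.0], [1.0, 9.0]]
interference = [[0.0, 1.0], [1.0, 3.0]]
mask = ideal_ratio_mask(target, interference)
```

In a supervised setup of this kind, a network would be trained to predict such a mask from the per-unit feature vectors, and the predicted mask would then be applied to the mixture's TF representation to recover the target speaker.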
Pages: 1373-1386
Page count: 13
Related papers
8 items in total
  • [1] Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks
    Zhou, Lin
    Lu, Siyuan
    Zhong, Qiuyue
    Chen, Ying
    Tang, Yibin
    Zhou, Yan
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03) : 1373 - 1386
  • [2] Binaural Speech Separation Algorithm Based on Deep Clustering
    Zhou, Lin
    Feng, Kun
    Wang, Tianyi
    Xu, Yue
    Shi, Jingang
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2021, 30 (02) : 527 - 537
  • [3] Binaural reverberant Speech separation based on deep neural networks
    Zhang, Xueliang
    Wang, DeLiang
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2018 - 2022
  • [4] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
    Guo, Xinyu
    Ou, Shifeng
    Gao, Meng
    Gao, Ying
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020, : 445 - 450
  • [5] REAL-TIME BINAURAL SPEECH SEPARATION WITH PRESERVED SPATIAL CUES
    Han, Cong
    Luo, Yi
    Mesgarani, Nima
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6404 - 6408
  • [6] Deep Learning Based Binaural Speech Separation in Reverberant Environments
    Zhang, Xueliang
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05) : 1075 - 1084
  • [7] Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
    Dadvar, Paria
    Geravanchizadeh, Masoud
    SPEECH COMMUNICATION, 2019, 108 : 41 - 52
  • [8] Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
    Li, Ruwei
    Li, Tao
    Sun, Xiaoyue
    Sun, Xingwu
    Zhao, Fengnian
    APPLIED ACOUSTICS, 2020, 168