Binaural speech separation algorithm based on long and short time memory networks

Cited by: 0

Authors:

Zhou L. [1]

Lu S. [1]

Zhong Q. [1]

Chen Y. [1,2]

Tang Y. [3]

Zhou Y. [3]

Affiliations:

[1] School of Information Science and Engineering, Southeast University, Nanjing

[2] Department of Psychiatry, Columbia University, NYSPI, New York

[3] College of Internet of Things Engineering, Hohai University, Changzhou

Source:

Zhou, Lin (Linzhou@seu.edu.cn) | 2020 | Tech Science Press | Vol. 63, No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Binaural speech separation; Feature vectors; Ideal ratio mask; Long and short time memory networks;

DOI:

10.32604/CMC.2020.010182

Abstract:

Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers are very often stationary or moving slowly during normal communication. In this case, the spatial features of consecutive speech frames are highly correlated, which helps speaker separation by providing additional spatial information. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM), which effectively learns the temporal dynamics of spatial features. Specifically, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. We then treat speaker separation as a supervised learning problem, in which a modified ideal ratio mask (IRM) is defined as the training target for LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. In particular, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing deep neural network (DNN) based method in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). This indicates that the method is more robust in untrained conditions. © 2020 Tech Science Press. All rights reserved.
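The abstract does not say which spatial features are extracted per TF unit or how the IRM was modified, so the following is only a minimal sketch under stated assumptions. It uses the interaural level difference (ILD) and interaural phase difference (IPD) as example spatial cues, flattens each frame's cues into one vector to form the time-ordered sequence an LSTM would consume, and uses the standard IRM, (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, as a stand-in for the paper's modified mask. All function names are hypothetical, and the LSTM that would map feature sequences to mask estimates is omitted.

```python
import cmath
import math

EPS = 1e-12  # guards against division by zero and log(0)


def interaural_features(left, right):
    """Per-TF-unit spatial cues from complex binaural STFTs.

    left, right: 2-D lists indexed [frame][freq] of complex STFT values.
    Returns [frame][freq] tuples of (ILD in dB, IPD in radians).
    ASSUMPTION: ILD/IPD are an illustrative choice, not taken from the paper.
    """
    feats = []
    for l_row, r_row in zip(left, right):
        row = []
        for l, r in zip(l_row, r_row):
            ild = 20.0 * math.log10((abs(l) + EPS) / (abs(r) + EPS))
            ipd = cmath.phase(l * r.conjugate())  # cross-spectrum phase in (-pi, pi]
            row.append((ild, ipd))
        feats.append(row)
    return feats


def frame_feature_sequence(feats):
    """Flatten each frame's per-frequency cues into one vector, giving a
    time-ordered sequence (one vector per frame) for a recurrent model."""
    return [[value for cue in frame for value in cue] for frame in feats]


def ideal_ratio_mask(target_power, interf_power, beta=0.5):
    """Standard IRM = (|S|^2 / (|S|^2 + |N|^2)) ** beta for each TF unit.

    ASSUMPTION: stands in for the paper's modified IRM, whose exact form
    is not given in the abstract.
    """
    return [
        [(s / (s + n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target_power, interf_power)
    ]


def apply_mask(mixture, mask):
    """Scale each TF unit of the mixture by its (estimated) mask value."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture)]


if __name__ == "__main__":
    left = [[1 + 0j, 2 + 0j]]  # one frame, two frequency bins
    right = [[1j, 1 + 0j]]
    feats = interaural_features(left, right)
    print(frame_feature_sequence(feats))
    mask = ideal_ratio_mask([[1.0, 3.0]], [[1.0, 1.0]])
    print(apply_mask(left, mask))
```

During training, masks computed this way from the premixed target and interference would serve as regression targets; at test time, the network's estimated mask would replace the ideal one in `apply_mask`.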

Pages: 1373-1386

Page count: 13

8 records in total:

[1] Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks
Zhou, Lin
Lu, Siyuan
Zhong, Qiuyue
Chen, Ying
Tang, Yibin
Zhou, Yan
CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 63 (03): 1373-1386
[2] Binaural Speech Separation Algorithm Based on Deep Clustering
Zhou, Lin
Feng, Kun
Wang, Tianyi
Xu, Yue
Shi, Jingang
INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2021, 30 (02): 527-537
[3] Binaural reverberant speech separation based on deep neural networks
Zhang, Xueliang
Wang, DeLiang
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 2018-2022
[4] Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks
Guo, Xinyu
Ou, Shifeng
Gao, Meng
Gao, Ying
2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020: 445-450
[5] Real-Time Binaural Speech Separation with Preserved Spatial Cues
Han, Cong
Luo, Yi
Mesgarani, Nima
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6404-6408
[6] Deep Learning Based Binaural Speech Separation in Reverberant Environments
Zhang, Xueliang
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (05): 1075-1084
[7] Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Dadvar, Paria
Geravanchizadeh, Masoud
SPEECH COMMUNICATION, 2019, 108: 41-52
[8] Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
Li, Ruwei
Li, Tao
Sun, Xiaoyue
Sun, Xingwu
Zhao, Fengnian
APPLIED ACOUSTICS, 2020, 168