Binaural Speech Separation Based on the Time-Frequency Binary Mask

Citations: 0
Authors
Mahmoodzadeh, A. [1 ]
Abutalebi, H. R. [2 ]
Soltanian-Zadeh, H. [3 ,4 ]
Sheikhzadeh, H. [5 ]
Affiliations
[1] Islamic Azad Univ, EE Dept, Fars Sci & Res Branch, Shiraz, Iran
[2] Yazd Univ, ECE Dept, Speech Proc Res, Shiraz, Iran
[3] Univ Tehran, Control & Intelligent Proc Ctr Excellence, Tehran, Iran
[4] Henry Ford Hlth, Image Anal Lab, Detroit, MI USA
[5] Amirkabir Univ Technol, Tehran, Iran
Source
2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST) | 2012
Keywords
interaural intensity differences; interaural time differences; speech separation; time-frequency binary mask; BLIND SEPARATION; RECOGNITION; SIGNALS;
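DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes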
0808 ; 0809 ;
Abstract
Matching the perceptual ability of the human auditory system to capture a target voice while filtering out interferers remains a great challenge. This paper proposes a binaural system for speech segregation based on two spatial localization cues: Interaural Time Differences (ITD) and Interaural Intensity Differences (IID). A target speech signal is separated from interfering sounds by estimating time-frequency masks with the multi-level extension of the Otsu thresholding algorithm, which is used in image segmentation. The ITD and IID are the important features for mask estimation at low and high frequencies, respectively. A systematic evaluation in terms of the Perceptual Evaluation of Speech Quality (PESQ) index shows that the resulting system yields a significant improvement in speech separation performance.
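The abstract does not give implementation details, so the following is only a minimal sketch of how multi-level Otsu thresholding could turn a per-unit localization cue into a time-frequency binary mask. The function name `multi_otsu_thresholds`, the synthetic ITD-like cue, the three-source layout, and the assumption that the target sits at zero ITD are all illustrative choices, not the authors' method.

```python
import itertools
import numpy as np

def multi_otsu_thresholds(values, n_classes=3, n_bins=64):
    """Exhaustive multi-level Otsu (hypothetical helper).

    Picks n_classes - 1 histogram cut points that maximize the
    between-class variance of the binned values.
    """
    hist, edges = np.histogram(values, bins=n_bins)
    prob = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_score, best_cuts = -1.0, None
    # A cut index c means bins [0, c) belong to the lower class.
    for cuts in itertools.combinations(range(1, n_bins), n_classes - 1):
        bounds = (0, *cuts, n_bins)
        score = 0.0
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            weight = prob[lo:hi].sum()
            if weight > 0:
                mean = (prob[lo:hi] * centers[lo:hi]).sum() / weight
                # Sum of w_k * mu_k^2 differs from the between-class
                # variance only by a constant (the squared global mean).
                score += weight * mean ** 2
        if score > best_score:
            best_score, best_cuts = score, cuts
    return np.array([edges[c] for c in best_cuts])

# Synthetic ITD-like cue (in ms) on a 32 x 100 time-frequency grid.
# Assumption: three sources at -0.4, 0.0 and +0.4 ms, one per unit.
rng = np.random.default_rng(0)
source_of_unit = rng.integers(0, 3, size=(32, 100))
cue = np.array([-0.4, 0.0, 0.4])[source_of_unit]
cue = cue + 0.05 * rng.standard_normal(cue.shape)

thresholds = multi_otsu_thresholds(cue.ravel(), n_classes=3)
labels = np.digitize(cue, thresholds)         # class 0, 1 or 2 per unit
target_class = np.digitize(0.0, thresholds)   # assumed target at zero ITD
binary_mask = (labels == target_class).astype(np.uint8)
```

In a real system the mask would multiply a time-frequency representation of the mixture before resynthesis, and, following the abstract, an ITD-based cue would be used in low-frequency channels and an IID-based cue in high-frequency channels. The exhaustive search is kept for clarity; it becomes slow as the number of classes or bins grows.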
Pages: 848-853
Number of pages: 6