Binaural Speech Separation Based on the Time-Frequency Binary Mask

Cited by: 0

Authors:

Mahmoodzadeh, A. [1]

Abutalebi, H. R. [2]

Soltanian-Zadeh, H. [3,4]

Sheikhzadeh, H. [5]

Affiliations:

[1] Islamic Azad Univ, EE Dept, Fars Sci & Res Branch, Shiraz, Iran

[2] Yazd Univ, ECE Dept, Speech Proc Res, Shiraz, Iran

[3] Univ Tehran, Control & Intelligent Proc Ctr Excellence, Tehran, Iran

[4] Henry Ford Hlth, Image Anal Lab, Detroit, MI USA

[5] Amirkabir Univ Technol, Tehran, Iran

Source:

2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST) | 2012

Keywords:

interaural intensity differences; interaural time differences; speech separation; time-frequency binary mask; BLIND SEPARATION; RECOGNITION; SIGNALS;

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Replicating the human auditory system's perceptual ability to capture a target voice while filtering out interferers remains a great challenge. This paper proposes a binaural system for speech segregation based on two spatial localization cues: interaural time differences (ITD) and interaural intensity differences (IID). A target speech signal is separated from interfering sounds by estimating time-frequency masks with the multi-level extension of the Otsu thresholding algorithm, a method used in image segmentation. ITD and IID are the important features for mask estimation at low and high frequencies, respectively. A systematic evaluation in terms of the Perceptual Evaluation of Speech Quality (PESQ) index shows that the resulting system yields a significant improvement in speech separation performance.
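The mask-estimation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes three classes, a 32-bin histogram, an exhaustive threshold search, synthetic ITD values in milliseconds, and a target located in the middle (near-zero ITD) class. The function names `multi_otsu_thresholds` and `binary_mask` are hypothetical.

```python
from itertools import combinations


def multi_otsu_thresholds(values, n_classes=3, n_bins=32):
    """Return n_classes - 1 thresholds that maximize between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / n_bins
    # Histogram of the cue values.
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Prefix sums of counts and count * center give O(1) class statistics.
    cum_w, cum_m = [0.0], [0.0]
    for c, x in zip(counts, centers):
        cum_w.append(cum_w[-1] + c)
        cum_m.append(cum_m[-1] + c * x)

    best_score, best_cuts = -1.0, None
    # A cut k means that bins [0, k) lie below the threshold.
    for cuts in combinations(range(1, n_bins), n_classes - 1):
        edges = (0,) + cuts + (n_bins,)
        score = 0.0
        for a, b in zip(edges, edges[1:]):
            w = cum_w[b] - cum_w[a]
            if w > 0:
                m = cum_m[b] - cum_m[a]
                # w * mean^2; maximizing the sum over classes is equivalent
                # to maximizing between-class variance (total mean is fixed).
                score += m * m / w
        if score > best_score:
            best_score, best_cuts = score, cuts
    return [lo + k * width for k in best_cuts]


def binary_mask(cue_map, low, high):
    """Mark time-frequency units whose cue lies in [low, high) with 1."""
    return [[1 if low <= c < high else 0 for c in row] for row in cue_map]


# Synthetic ITD values (ms) clustered around three assumed source directions.
itd_values = [c + d for c in (-0.4, 0.0, 0.4)
              for d in (-0.05, -0.02, 0.0, 0.02, 0.05)]
thresholds = multi_otsu_thresholds(itd_values)

# Hypothetical 2 x 3 map of per-unit ITD estimates (rows: channels, columns: frames).
cue_map = [[-0.4, 0.0, 0.4],
           [0.02, -0.38, -0.02]]
# Assumption: the target is the middle class, between the two thresholds.
mask = binary_mask(cue_map, thresholds[0], thresholds[1])
print(thresholds, mask)
```

The same thresholding could be applied to IID values in high-frequency channels. The exhaustive search is chosen here for clarity, and its cost grows quickly with the number of bins and classes.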

Pages: 848-853

Number of pages: 6
