A CONVEX OPTIMIZATION APPROACH FOR TIME-FREQUENCY MASK ESTIMATION

Cited by: 0
Authors
Bao, Feng [1 ]
Abdulla, Waleed H. [1 ]
Affiliations
[1] Univ Auckland, Elect & Comp Engn Dept, 20 Symond St, Auckland 1010, New Zealand
Keywords
Computational auditory scene analysis (CASA); Ideal binary mask (IBM); Convex optimization; Speech enhancement; SPEECH; NOISE; ENHANCEMENT;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
In this paper, we propose a new time-frequency mask estimation method for computational auditory scene analysis (CASA) based on convex optimization of the binary mask. In the proposed method, the pitch estimation and segment segregation of conventional CASA are completely replaced by convex optimization of the speech power. The objective function of the speech power used for convex optimization is built by considering the cross-correlation between the power spectra of noisy speech and noise in each channel of a Gammatone filterbank. The speech power is then estimated by the gradient descent method. The time-frequency units dominated by speech and by noise are labeled by comparing the powers of the noisy speech, the estimated speech, and the noise. Erroneous local masks are removed using the Teager energy of the estimated speech and time-frequency unit smoothing. Results for average segmental signal-to-noise ratio improvement, HIT-False Alarm rate, and a subjective test show that the proposed method outperforms the reference methods.
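The pipeline outlined in the abstract (estimate speech power by gradient descent, label time-frequency units, then use Teager energy for cleanup) can be illustrated with a minimal sketch. The abstract does not state the paper's objective function, so the quadratic objective below is a stand-in assumption, as are the function names, the parameters `lam`, `step`, and `iters`, the simplified labeling rule, and the toy power values. Only the discrete Teager energy formula is the standard definition.

```python
import numpy as np


def estimate_speech_power(noisy_pow, noise_pow, lam=0.1, step=0.1, iters=200):
    """Projected gradient descent on a stand-in convex objective (an assumption,
    not the paper's objective):
        f(s) = 0.5 * ||noisy_pow - noise_pow - s||^2 + 0.5 * lam * ||s||^2,  s >= 0
    """
    residual = np.asarray(noisy_pow, dtype=float) - np.asarray(noise_pow, dtype=float)
    s = np.zeros_like(residual)
    for _ in range(iters):
        grad = (1.0 + lam) * s - residual      # gradient of f at s
        s = np.maximum(s - step * grad, 0.0)   # descent step, then project onto s >= 0
    return s


def binary_mask(speech_pow, noise_pow):
    """Simplified labeling rule (assumed): a unit is speech-dominant (1)
    when the estimated speech power exceeds the noise power, else 0."""
    return (np.asarray(speech_pow) > np.asarray(noise_pow)).astype(int)


def teager_energy(x):
    """Discrete Teager energy for interior samples:
    psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Toy example: 2 channels x 3 frames of made-up power values.
noisy = np.array([[4.0, 1.0, 9.0], [0.5, 6.0, 2.0]])
noise = np.array([[1.0, 1.5, 2.0], [1.0, 1.0, 1.5]])
speech = estimate_speech_power(noisy, noise)
mask = binary_mask(speech, noise)
```

Because the stand-in objective is separable and strictly convex, the iterates converge to the closed-form minimizer `max(0, (noisy - noise) / (1 + lam))`, which makes the sketch easy to check. A real implementation would substitute the paper's cross-correlation-based objective and its smoothing step.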
Pages: 31-35
Page count: 5
Related papers
50 in total
  • [1] A new time-frequency binary mask estimation method based on convex optimization of speech power
    Bao, Feng
    Abdulla, Waleed H.
    SPEECH COMMUNICATION, 2018, 97 : 51 - 65
  • [2] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Nasir Saleem
    Muhammad Irfan Khattak
    Gunawan Witjaksono
    Gulzar Ahmad
    Multimedia Tools and Applications, 2019, 78 : 31867 - 31891
  • [3] Variance based time-frequency mask estimation for unsupervised speech enhancement
    Saleem, Nasir
    Khattak, Muhammad Irfan
    Witjaksono, Gunawan
    Ahmad, Gulzar
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31867 - 31891
  • [4] Speech mask estimation using the time-frequency correlation of speech presence
    Zhan, Ge
    Huang, Zhao-Qiong
    Ying, Dong-Wen
    Pan, Jie-Lin
    Yan, Yong-Hong
    Ruan Jian Xue Bao/Journal of Software, 2016, 27 : 64 - 68
  • [5] A data-driven approach for estimating the time-frequency binary mask
    Kim, Gibak
    Loizou, Philipos C.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 884 - 887
  • [6] Spectrographic Speech Mask Estimation Using the Time-Frequency Correlation of Speech Presence
    Zhan, Ge
    Huang, Zhaoqiong
    Ying, Dongwen
    Pan, Jielin
    Yan, Yonghong
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2287 - 2291
  • [7] ON TIME-FREQUENCY MASK ESTIMATION FOR MVDR BEAMFORMING WITH APPLICATION IN ROBUST SPEECH RECOGNITION
    Xiao, Xiong
    Zhao, Shengkui
    Jones, Douglas L.
    Chng, Eng Siong
    Li, Haizhou
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 3246 - 3250
  • [8] Carrier Frequency Estimation of Time-Frequency Overlapped MASK Signals for Underlay Cognitive Radio Network
    Liu, Mingqian
    Zhang, Junlin
    Lin, Yun
    Wu, Zhen
    Shang, Bodong
    Gong, Fengkui
    IEEE ACCESS, 2019, 7 : 58277 - 58285
  • [9] DIRECTION OF ARRIVAL ESTIMATION IN HIGHLY REVERBERANT ENVIRONMENTS USING SOFT TIME-FREQUENCY MASK
    Tourbabin, Vladimir
    Donley, Jacob
    Rafaely, Boaz
    Mehra, Ravish
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 383 - 387
  • [10] AUGMENTED TIME-FREQUENCY MASK ESTIMATION IN CLUSTER-BASED SOURCE SEPARATION ALGORITHMS
    Luo, Yi
    Mesgarani, Nima
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 710 - 714