A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems

Cited by: 20
Authors
Lee, Jinkyu [1]
Kang, Hong-Goo [1]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul 03722, South Korea
Funding
National Research Foundation of Singapore
Keywords
Single-channel speech enhancement; complex-valued time-frequency mask; exact time-domain reconstruction; spectrogram consistency; SIGNAL ESTIMATION; PHASE; NOISE
DOI
10.1109/TASLP.2019.2910638
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a joint learning algorithm for complex-valued time-frequency (T-F) masks in single-channel speech enhancement systems. Most speech enhancement algorithms operating in a single-channel microphone environment aim to enhance the magnitude component in the T-F domain, while the noisy input phase component is used directly without any processing. Consequently, the mismatch between the processed magnitude and the unprocessed phase degrades the sound quality. To address this issue, a learning method that targets a T-F mask defined in the complex domain has recently been proposed. However, due to the wide dynamic range and irregular spectrogram pattern of the complex-valued T-F mask, the learning process is difficult even with a large-scale deep learning network. Moreover, a learning process that targets the T-F mask itself does not directly minimize the distortion in the spectral or time domains. To address these concerns, we focus on three issues: 1) effective estimation of complex numbers with a wide dynamic range; 2) a learning method that is directly related to speech enhancement performance; and 3) a way to resolve the mismatch between the estimated magnitude and phase spectra. In this study, we propose objective functions that address each of these issues and train the network by minimizing them within a joint learning framework. The evaluation results demonstrate that the proposed learning algorithm achieves significant performance improvements on various objective measures and in a subjective preference listening test.
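For illustration, the Python sketch below (not taken from the paper) shows how a complex-valued T-F mask modifies both magnitude and phase of a noisy STFT, and how an ISTFT/STFT round trip yields the consistent spectrogram implied by exact time-domain reconstruction. The function name enhance_with_complex_mask, the mask inputs mask_real and mask_imag, and the STFT settings are assumptions; the paper's joint objective functions and network architecture are not reproduced here.

```python
# Minimal illustrative sketch (not the authors' implementation): apply an
# estimated complex-valued T-F mask to a noisy STFT and project the result
# back onto a consistent spectrogram via an ISTFT -> STFT round trip.
import numpy as np
from scipy.signal import stft, istft

def enhance_with_complex_mask(noisy, mask_real, mask_imag,
                              fs=16000, nperseg=512, noverlap=256):
    """Hypothetical helper: mask_real/mask_imag stand in for the real and
    imaginary mask outputs of a network and must match the STFT shape."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg, noverlap=noverlap)

    # Complex multiplication modifies both magnitude and phase of the noisy STFT.
    S_hat = (mask_real + 1j * mask_imag) * Y

    # Exact time-domain reconstruction by overlap-add inverse STFT.
    _, s_hat = istft(S_hat, fs=fs, nperseg=nperseg, noverlap=noverlap)

    # Re-analyzing the reconstructed waveform gives the consistent spectrogram
    # actually realized in the time domain; a spectrogram-consistency term
    # would penalize the gap between S_hat and this projection.
    _, _, S_consistent = stft(s_hat, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return s_hat, S_hat, S_consistent
```

Because an arbitrary complex-valued matrix is generally not the STFT of any waveform, the gap between S_hat and S_consistent reflects the kind of magnitude/phase mismatch the abstract refers to.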
Pages: 1098-1109
Page count: 12
Related papers
7 items in total
  • [1] Sun, Linhui; Bu, Yunyi; Li, Pingan; Wu, Zihao. Single-channel speech enhancement based on joint constrained dictionary learning. EURASIP Journal on Audio, Speech, and Music Processing, 2021, 2021(1).
  • [2] Yang, Haemin; Choe, Soyeon; Kim, Keulbit; Kang, Hong-Goo. Deep Learning-based Speech Presence Probability Estimation for Noise PSD Estimation in Single-channel Speech Enhancement. 2018 International Conference on Signals and Systems (ICSIGSYS), 2018: 267-270.
  • [3] Roy, Sujan Kumar; Nicolson, Aaron; Paliwal, Kuldip K. Deep Learning with Augmented Kalman Filter for Single-Channel Speech Enhancement. 2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020.
  • [4] Lee, Jinkyu; Skoglund, Jan; Shabestary, Turaj; Kang, Hong-Goo. Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement. IEEE Signal Processing Letters, 2018, 25(8): 1276-1280.
  • [5] Zhang, Long; Bao, Guangzhao; Zhang, Jing; Ye, Zhongfu. Supervised single-channel speech enhancement using ratio mask with joint dictionary learning. Speech Communication, 2016, 82: 38-52.
  • [6] Tu, Yan-Hui; Tashev, Ivan; Zarar, Shuayb; Lee, Chin-Hui. A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-Channel Speech Enhancement and Recognition. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018: 2531-2535.
  • [7] Lee, Geon Woo; Kim, Hong Kook. Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection. Applied Sciences-Basel, 2020, 10(9).