ALLEVIATING THE LOSS-METRIC MISMATCH IN SUPERVISED SINGLE-CHANNEL SPEECH ENHANCEMENT

Cited by: 1
Authors
Yang, Yang [1, 2, 3]
Zhang, Hui [1, 2, 3]
Zhang, Xueliang [1, 2, 3]
Zhang, Huaiwen [1, 2, 3]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Hohhot, Inner Mongolia, Peoples R China
[2] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot, Inner Mongolia, Peoples R China
[3] Inner Mongolia Key Lab Mongolian Informat Proc Te, Hohhot, Inner Mongolia, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Supervised Single-Channel Speech Enhancement; Loss-Metric Mismatch; Function Smoothing; NOISE;
DOI
10.1109/ICASSP43922.2022.9746915
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we study the loss-metric mismatch problem in supervised single-channel speech enhancement systems. Most existing speech enhancement systems achieve unsatisfactory performance because their empirically selected loss functions have semantic gaps with the non-differentiable evaluation metrics, i.e., the loss-metric mismatch problem. In this work, we propose a simple yet efficient method to generate suitable loss functions for real front-end speech enhancement scenarios and thereby alleviate this problem. Specifically, we adopt the function smoothing technique and approximate the non-differentiable evaluation metrics by a linear combination of a set of basis functions. Experimental results demonstrate that the loss function generated by our method helps the speech enhancement system achieve better performance on most evaluation metrics than traditional, empirically selected loss functions.
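The function-smoothing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the staircase `step_metric`, the sigmoid basis, the least-squares fit, and every name below are assumptions chosen only to show how a non-differentiable score might be approximated by a linear combination of smooth basis functions.

```python
import numpy as np

def step_metric(x):
    """Hypothetical stand-in for a non-differentiable metric: a staircase in x."""
    return np.floor(4.0 * x) / 4.0

def basis(x, centers, width=0.05):
    """Smooth basis (an assumption): a constant term plus sigmoids at `centers`."""
    x = np.asarray(x, dtype=float)[:, None]
    sig = 1.0 / (1.0 + np.exp(-(x - centers[None, :]) / width))
    return np.hstack([np.ones_like(x), sig])

def fit_surrogate(metric, centers, n_samples=400):
    """Fit the linear-combination weights by ordinary least squares on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_samples)
    weights, *_ = np.linalg.lstsq(basis(x, centers), metric(x), rcond=None)
    return weights

def surrogate(x, weights, centers):
    """Smooth, differentiable approximation of the metric."""
    return basis(x, centers) @ weights

centers = np.linspace(0.0, 1.0, 21)
weights = fit_surrogate(step_metric, centers)

# Root-mean-square gap between the smooth surrogate and the staircase.
grid = np.linspace(0.0, 1.0, 400)
rmse = float(np.sqrt(np.mean((surrogate(grid, weights, centers) - step_metric(grid)) ** 2)))
```

Because the surrogate is a weighted sum of smooth functions, it has a well-defined gradient everywhere, which is what would allow it to serve as a training loss in place of the original metric.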
Pages: 6952 - 6956
Page count: 5
Related papers
24 in total
  • [1] SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION
    BOLL, SF
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02): : 113 - 120
  • [2] Learning With Learned Loss Function: Speech Enhancement With Quality-Net to Improve Perceptual Evaluation of Speech Quality
    Fu, Szu-Wei
    Liao, Chien-Feng
    Tsao, Yu
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 26 - 30
  • [3] Huang C, 2019, PR MACH LEARN RES, V97
  • [4] Kaelbling L. P., 1996, REINFORCEMENT LEARNI
  • [5] Kolbæk M, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5059, DOI 10.1109/ICASSP.2018.8462040
  • [6] Kreimer J., 1992, ANN OPER RES, V39, P97, DOI 10.1007/BF02060937
  • [7] Investigation of Cost Function for Supervised Monaural Speech Separation
    Liu, Yun
    Zhang, Hui
    Zhang, Xueliang
    Cao, Yuhang
    [J]. INTERSPEECH 2019, 2019, : 3178 - 3182
  • [8] Loizou P.C., 2013, SPEECH ENHANCEMENT T
  • [9] Reasons why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions
    Loizou, Philipos C.
    Kim, Gibak
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (01): : 47 - 56
  • [10] Lyons J. W., 1993, DARPA TIMIT acoustic-phonetic continuous speech corpus