Power Exponent Based Weighting Criterion for DNN-Based Mask Approximation in Speech Enhancement

Cited by: 3

Authors:

Cui, Zihao [1]

Bao, Changchun [1]

Affiliations:

[1] Beijing Univ Technol, Beijing 100124, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2021 / Vol. 28

Funding:

National Natural Science Foundation of China;

Keywords:

Speech enhancement; Noise measurement; Linear programming; Training; Databases; Indexes; Time-frequency analysis; Ideal amplitude masking (IAM); weighted mean square error; mask approximation; DNN; speech enhancement; NOISE;

DOI:

10.1109/LSP.2021.3063888

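Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes: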

0808 ; 0809 ;

Abstract:

In this letter, a novel weighted mean square error (WMSE) is proposed to improve the DNN-based mask approximation method for speech enhancement, in which the weighting is determined by a power exponent applied to the noisy spectrum amplitude (NSA) as the base. The power exponents 0 and 2 correspond, respectively, to ideal amplitude masking (IAM) without any clipping and to indirect mapping (IM) on the short-time spectral amplitude (STSA); tests show that the exponent strongly affects the enhanced spectrum and the performance of the enhanced signal. The experimental results also show that, for phase-unaware masking, the best-performing weighting is the NSA with power exponent 1, which yields better restoration of the harmonic structure. Compared with MSE-based mask approximation methods, the objective function with the WMSE on the NSA (WMSE-NSA) improves the perceptual evaluation of speech quality (PESQ) score by 0.1 and the short-time objective intelligibility (STOI) score by 1.7% on average.
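
The relationship between the power exponent and the two limiting cases can be illustrated with a minimal sketch. It assumes the loss has the form mean(|Y|^p * (M_est - M_iam)^2), with |Y| the noisy amplitude and M_iam = |S| / |Y| the unclipped ideal amplitude mask; this form is inferred from the abstract and is not taken from the letter itself. The function name `wmse_nsa`, the `eps` floor, and the random toy data are hypothetical.

```python
import math
import random


def wmse_nsa(mask_est, clean_amp, noisy_amp, p=1.0, eps=1e-8):
    """Weighted MSE over flattened time-frequency bins.

    Assumed form (not confirmed by the source): the squared mask error in
    each bin is weighted by the noisy amplitude raised to the power p.
    """
    total = 0.0
    for m, s, y in zip(mask_est, clean_amp, noisy_amp):
        y = max(y, eps)          # guard against division by zero
        iam = s / y              # ideal amplitude mask, no clipping
        total += (y ** p) * (m - iam) ** 2
    return total / len(mask_est)


# Toy data standing in for spectral amplitudes (hypothetical values).
rng = random.Random(0)
clean = [rng.uniform(0.1, 1.0) for _ in range(32)]
noisy = [s + rng.uniform(0.0, 0.5) for s in clean]
mask_est = [rng.uniform(0.0, 1.0) for _ in range(32)]

# p = 0: the weight is 1, so the loss is the plain MSE on the mask.
loss_p0 = wmse_nsa(mask_est, clean, noisy, p=0.0)
plain_mse = sum((m - s / y) ** 2
                for m, s, y in zip(mask_est, clean, noisy)) / len(clean)

# p = 2: |Y|^2 (M_est - |S|/|Y|)^2 = (M_est |Y| - |S|)^2, i.e. the MSE
# between the masked noisy amplitude and the clean amplitude.
loss_p2 = wmse_nsa(mask_est, clean, noisy, p=2.0)
im_mse = sum((m * y - s) ** 2
             for m, s, y in zip(mask_est, clean, noisy)) / len(clean)

print(math.isclose(loss_p0, plain_mse), math.isclose(loss_p2, im_mse))
```

Under this assumed form, p = 1 sits between the two cases: bins with larger noisy amplitude contribute more than in plain mask MSE, but less than in the amplitude-domain error.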


Pages: 618-622

Page count: 5

References: 31 in total

[1] [Anonymous], 2013, P M AC

[2] Barker J, 2015, 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P504, DOI 10.1109/ASRU.2015.7404837

[3] SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION [J]. BOLL, SF. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02): 113-120

[4] SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR LOG-SPECTRAL AMPLITUDE ESTIMATOR [J]. EPHRAIM, Y; MALAH, D. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02): 443-445

[5] SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR SHORT-TIME SPECTRAL AMPLITUDE ESTIMATOR [J]. EPHRAIM, Y; MALAH, D. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (06): 1109-1121

[6] Erdogan H, 2015, INT CONF ACOUST SPEE, P708, DOI 10.1109/ICASSP.2015.7178061

[7] Erdogan Hakan, 2017, New Era for Robust Speech Recognition: Exploiting Deep Learning, P165

[8] Garofolo J. S., 1993, NASA STIRECON TECH R, V93

[9] On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement [J]. Kolbaek, Morten; Tan, Zheng-Hua; Jensen, Soren Holdt; Jensen, Jesper. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28: 825-838

[10] Koyama Y., 2020, ARXIV200511611
