DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

Cited by: 0

Authors:

Mizoguchi, Satoshi [1]

Saito, Yuki [1]

Takamichi, Shinnosuke [1]

Saruwatari, Hiroshi [1]

Affiliations:

[1] Univ Tokyo, Tokyo 1138656, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2021 / Vol. E104D / No. 11

Keywords:

speech enhancement; musical noise; kurtosis; moment matching; deep learning;

DOI:

10.1587/transinf.2021EDP7041

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
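
To illustrate the standardized-moment matching idea described in the abstract, here is a minimal sketch; it is not the authors' implementation. It assumes the penalty is the squared difference between the n-th standardized moment of a reference signal and that of the enhanced signal. The function names `standardized_moment` and `moment_matching_penalty`, the squared-difference form, and the choice of which signals to compare are all hypothetical. Order 4 corresponds to kurtosis, and order 6 to the sixth-moment matching mentioned in the abstract.

```python
def standardized_moment(samples, order):
    """Return the n-th standardized moment: E[(x - mean)^n] / std^n.

    With order=4 this is the (non-excess) kurtosis.
    """
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0.0:
        raise ValueError("standardized moment is undefined for constant data")
    central = sum((x - mean) ** order for x in samples) / n
    return central / variance ** (order / 2)


def moment_matching_penalty(reference, processed, order=4):
    """Hypothetical penalty: squared gap between standardized moments.

    ASSUMPTION: the squared-difference form is chosen for illustration only;
    the loss used in the paper may be defined differently.
    """
    gap = standardized_moment(processed, order) - standardized_moment(reference, order)
    return gap ** 2


# Toy data (made up for illustration): a flat signal and a spiky one.
flat = [-1.0, 1.0, -1.0, 1.0]
spiky = [0.0, 0.0, 0.0, 4.0]

kurtosis_penalty = moment_matching_penalty(flat, spiky, order=4)
sixth_penalty = moment_matching_penalty(flat, spiky, order=6)
```

In an actual training setup, a term like this would presumably be computed with a differentiable framework and added to the main enhancement loss with a weighting factor; those details are not given in the abstract and are left out here.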

Pages: 1971-1980

Page count: 10
