Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement

Cited by: 1
Authors
Kawase, Tomoko [1 ]
Niwa, Kenta [1 ]
Fujimoto, Masakiyo [2 ,3 ]
Kobayashi, Kazunori [1 ]
Araki, Shoko [2 ]
Nakatani, Tomohiro [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Musashino, Tokyo 1808585, Japan
[2] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[3] Natl Inst Informat & Commun Technol, Kyoto, Japan
Keywords
microphone array; beamforming (BF); power spectral density (PSD) estimation; Gaussian mixture model; Wiener filtering;
DOI
10.1587/transfun.E100.A.1127
CLC number
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
We propose a microphone-array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation with statistical speech-model-based PSD estimation. The goal of this research is to pick up target speech clearly even in noisy environments such as crowded places, factories, and cars running at high speed. Beamforming with post-Wiener filtering is commonly used in conventional studies on microphone-array noise reduction. Computing the Wiener filter requires the speech and noise PSDs, which are estimated from spatial cues obtained from the microphone observations. If the sound sources are sparse in the time-spatial domain, the speech/noise PSDs can be estimated accurately; however, the estimation errors grow when this assumption does not hold. In this study, we integrate statistical speech models with PSD estimation in beamspace to correct speech/noise PSD estimation errors. A rough noise PSD estimate is obtained frame by frame by analyzing spatial cues from the array observations. By combining this noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, can be described without pre-training. By exploiting Bayes' theorem, a Wiener filter is then generated statistically from the observation model. Experiments conducted to evaluate the proposed method show that the signal-to-noise ratio and naturalness of the output speech are significantly better than those obtained with conventional methods.
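The abstract's pipeline can be illustrated with a small, self-contained sketch. The Python code below is not the authors' implementation: the spectral-subtraction initialization, the scalar Gaussian-mixture prior over clean log-PSDs, its parameters (means, variances, weights), the log-domain error variance obs_var, and the synthetic PSDs are all illustrative assumptions. It only shows the two ingredients the abstract names: a per-bin Wiener gain computed from speech/noise PSD estimates, and a Bayes-theorem-style MMSE correction of a rough speech PSD using a clean-speech Gaussian mixture model.

import numpy as np

def wiener_gain(speech_psd, noise_psd, gain_floor=1e-2):
    # Per-bin Wiener gain H = S / (S + N), floored to limit musical noise.
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(gain, gain_floor)

def gmm_mmse_log_psd(rough_log_psd, obs_var, means, variances, weights):
    # MMSE estimate of the clean-speech log-PSD under a scalar Gaussian-mixture
    # prior, given a rough estimate with assumed log-domain error variance obs_var.
    diff = rough_log_psd[None, :] - means[:, None]             # (M, F)
    total_var = variances[:, None] + obs_var                   # (M, 1)
    log_resp = (np.log(weights[:, None])
                - 0.5 * np.log(2.0 * np.pi * total_var)
                - 0.5 * diff ** 2 / total_var)                 # log responsibilities
    log_resp -= log_resp.max(axis=0, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=0, keepdims=True)
    # Per-component posterior mean (Gaussian conditioning), then mixture average.
    post_mean = (variances[:, None] * rough_log_psd[None, :]
                 + obs_var * means[:, None]) / total_var
    return (resp * post_mean).sum(axis=0)                      # (F,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bins = 257                                               # frequency bins in one frame
    clean_psd = rng.gamma(shape=2.0, scale=0.5, size=n_bins)   # synthetic speech PSD
    noise_psd = 0.5 * np.ones(n_bins)                          # assumed-known noise PSD
    observed_psd = clean_psd + noise_psd                       # additive-PSD observation

    # Rough speech PSD via spectral subtraction, refined with the GMM prior.
    rough_log = np.log(np.maximum(observed_psd - noise_psd, 1e-6))
    refined_log = gmm_mmse_log_psd(rough_log, obs_var=0.5,
                                   means=np.array([-4.0, -1.0, 1.0]),
                                   variances=np.array([1.0, 1.0, 1.0]),
                                   weights=np.array([0.3, 0.4, 0.3]))

    gain = wiener_gain(np.exp(refined_log), noise_psd)
    enhanced_psd = gain * observed_psd
    print("mean Wiener gain over the frame:", float(gain.mean()))

In the paper, the rough noise PSD is estimated frame by frame from spatial cues in beamspace and the observation model follows from combining it with the clean-speech model; here the noise PSD is simply assumed known, purely to make the frame-level arithmetic concrete.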
Pages: 1127-1136
Page count: 10
Related papers
50 records in total
  • [1] NOISE IDENTIFICATION FOR MODEL-BASED SPEECH ENHANCEMENT
    Jiang Wenbin
    Ying Rendong
    Liu Peilin
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 478 - 483
  • [2] Wavelet based Noise Reduction Techniques for Real Time Speech Enhancement
    Ravi, Bhat Raghavendra
    Deepu, S. P.
    Kini, Ramesh M.
    David, Sumam S.
    2018 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2018, : 846 - 851
  • [3] Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions
    Gul, Sania
    Khan, Muhammad Salman
    Shah, Syed Waqar
    APPLIED ACOUSTICS, 2021, 179
  • [4] REAL-TIME INTEGRATION OF STATISTICAL MODEL-BASED SPEECH ENHANCEMENT WITH UNSUPERVISED NOISE PSD ESTIMATION USING MICROPHONE ARRAY
    Kawase, T.
    Niwa, K.
    Fujimoto, M.
    Kamado, N.
    Kobayashi, K.
    Araki, S.
    Nakatani, T.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 604 - 608
  • [5] Adaptive model-based speech enhancement
    Logan, B
    Robinson, T
    SPEECH COMMUNICATION, 2001, 34 (04) : 351 - 368
  • [6] INDIRECT MODEL-BASED SPEECH ENHANCEMENT
    Le Roux, Jonathan
    Hershey, John R.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4045 - 4048
  • [7] A Statistical Model-Based Speech Enhancement Using Acoustic Noise Classification for Robust Speech Communication
    Choi, Jae-Hun
    Chang, Joon-Hyuk
    IEICE TRANSACTIONS ON COMMUNICATIONS, 2012, E95B (07) : 2513 - 2516
  • [8] Automatic detection of nasalization in speech via cue-based analysis
    Kong, Blisse
    Choi, Jeung-Yoon
    Shattuck-Hufnagel, Stefanie
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03)
  • [9] Model-based eigenspectrum estimation for speech enhancement
    Bhunjun, Vinesh
    Brookes, Mike
    Naylor, Patrick
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 1331 - +
  • [10] Model-Based Speech Enhancement in the Modulation Domain
    Wang, Yu
    Brookes, Mike
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 580 - 594