Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement

Cited by: 1
Authors
Kawase, Tomoko [1 ]
Niwa, Kenta [1 ]
Fujimoto, Masakiyo [2 ,3 ]
Kobayashi, Kazunori [1 ]
Araki, Shoko [2 ]
Nakatani, Tomohiro [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Musashino, Tokyo 1808585, Japan
[2] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[3] Natl Inst Informat & Commun Technol, Kyoto, Japan
Keywords
microphone array; beamforming (BF); power spectral density (PSD) estimation; Gaussian mixture model; Wiener filtering;
DOI
10.1587/transfun.E100.A.1127
CLC number
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
We propose a microphone-array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation with statistical speech-model-based PSD estimation. The goal of this research is to pick up target speech clearly even in noisy environments such as crowded places, factories, and cars running at high speed. Beamforming with post-Wiener filtering is commonly used in conventional studies on microphone-array noise reduction. Computing the Wiener filter requires the speech and noise PSDs, which are estimated from spatial cues obtained from the microphone observations. If the sound sources are sparse in the time-spatial domain, the speech/noise PSDs can be estimated accurately; however, the estimation errors grow when this assumption does not hold. In this study, we integrate statistical speech models with PSD estimation in beamspace to correct speech/noise PSD estimation errors. A rough noise PSD estimate is obtained frame by frame by analyzing spatial cues from the array observations. By combining this noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, can be described without pre-training. By exploiting Bayes' theorem, a Wiener filter is then generated statistically from the observation model. Experiments conducted to evaluate the proposed method show that the signal-to-noise ratio and naturalness of the output speech are significantly better than those obtained with conventional methods.
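The abstract's pipeline can be illustrated with a small, self-contained sketch. The Python code below is not the authors' implementation: the spectral-subtraction initialization, the scalar Gaussian-mixture prior over clean log-PSDs, its parameters (means, variances, weights), the log-domain error variance obs_var, and the synthetic PSDs are all illustrative assumptions. It only shows the two ingredients the abstract names: a per-bin Wiener gain computed from speech/noise PSD estimates, and a Bayes-theorem-style MMSE correction of a rough speech PSD using a clean-speech Gaussian mixture model.

import numpy as np

def wiener_gain(speech_psd, noise_psd, gain_floor=1e-2):
    # Per-bin Wiener gain H = S / (S + N), floored to limit musical noise.
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(gain, gain_floor)

def gmm_mmse_log_psd(rough_log_psd, obs_var, means, variances, weights):
    # MMSE estimate of the clean-speech log-PSD under a scalar Gaussian-mixture
    # prior, given a rough estimate with assumed log-domain error variance obs_var.
    diff = rough_log_psd[None, :] - means[:, None]             # (M, F)
    total_var = variances[:, None] + obs_var                   # (M, 1)
    log_resp = (np.log(weights[:, None])
                - 0.5 * np.log(2.0 * np.pi * total_var)
                - 0.5 * diff ** 2 / total_var)                 # log responsibilities
    log_resp -= log_resp.max(axis=0, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=0, keepdims=True)
    # Per-component posterior mean (Gaussian conditioning), then mixture average.
    post_mean = (variances[:, None] * rough_log_psd[None, :]
                 + obs_var * means[:, None]) / total_var
    return (resp * post_mean).sum(axis=0)                      # (F,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bins = 257                                               # frequency bins in one frame
    clean_psd = rng.gamma(shape=2.0, scale=0.5, size=n_bins)   # synthetic speech PSD
    noise_psd = 0.5 * np.ones(n_bins)                          # assumed-known noise PSD
    observed_psd = clean_psd + noise_psd                       # additive-PSD observation

    # Rough speech PSD via spectral subtraction, refined with the GMM prior.
    rough_log = np.log(np.maximum(observed_psd - noise_psd, 1e-6))
    refined_log = gmm_mmse_log_psd(rough_log, obs_var=0.5,
                                   means=np.array([-4.0, -1.0, 1.0]),
                                   variances=np.array([1.0, 1.0, 1.0]),
                                   weights=np.array([0.3, 0.4, 0.3]))

    gain = wiener_gain(np.exp(refined_log), noise_psd)
    enhanced_psd = gain * observed_psd
    print("mean Wiener gain over the frame:", float(gain.mean()))

In the paper, the rough noise PSD is estimated frame by frame from spatial cues in beamspace and the observation model follows from combining it with the clean-speech model; here the noise PSD is simply assumed known, purely to make the frame-level arithmetic concrete.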
Pages: 1127-1136
Page count: 10
Related papers
50 records in total
  • [1] NOISE IDENTIFICATION FOR MODEL-BASED SPEECH ENHANCEMENT
    Jiang Wenbin
    Ying Rendong
    Liu Peilin
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 478 - 483
  • [2] Wavelet based Noise Reduction Techniques for Real Time Speech Enhancement
    Ravi, Bhat Raghavendra
    Deepu, S. P.
    Kini, Ramesh M.
    David, Sumam S.
    2018 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2018, : 846 - 851
  • [3] Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions
    Gul, Sania
    Khan, Muhammad Salman
    Shah, Syed Waqar
    APPLIED ACOUSTICS, 2021, 179
  • [4] REAL-TIME INTEGRATION OF STATISTICAL MODEL-BASED SPEECH ENHANCEMENT WITH UNSUPERVISED NOISE PSD ESTIMATION USING MICROPHONE ARRAY
    Kawase, T.
    Niwa, K.
    Fujimoto, M.
    Kamado, N.
    Kobayashi, K.
    Araki, S.
    Nakatani, T.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 604 - 608
  • [5] Adaptive model-based speech enhancement
    Logan, B
    Robinson, T
    SPEECH COMMUNICATION, 2001, 34 (04) : 351 - 368
  • [6] INDIRECT MODEL-BASED SPEECH ENHANCEMENT
    Le Roux, Jonathan
    Hershey, John R.
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4045 - 4048
  • [7] A Statistical Model-Based Speech Enhancement Using Acoustic Noise Classification for Robust Speech Communication
    Choi, Jae-Hun
    Chang, Joon-Hyuk
    IEICE TRANSACTIONS ON COMMUNICATIONS, 2012, E95B (07) : 2513 - 2516
  • [8] Automatic detection of nasalization in speech via cue-based analysis
    Kong, Blisse
    Choi, Jeung-Yoon
    Shattuck-Hufnagel, Stefanie
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03)
  • [9] Model-based eigenspectrum estimation for speech enhancement
    Bhunjun, Vinesh
    Brookes, Mike
    Naylor, Patrick
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 1331 - +
  • [10] Model-Based Speech Enhancement in the Modulation Domain
    Wang, Yu
    Brookes, Mike
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 580 - 594