An Overview of Monaural Speech Denoising and Dereverberation Research

Cited by: 0
Authors
Lan T. [1 ]
Peng C. [1 ]
Li S. [1 ]
Ye W. [1 ]
Li M. [1 ]
Hui G. [1 ]
Lü Y. [1 ]
Qian Y. [1 ]
Liu Q. [1 ]
Affiliation
[1] School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu
Source
Science Press, Vol. 57, 2020; corresponding author: Liu, Qiao (qliu@uestc.edu.cn)
Funding
National Natural Science Foundation of China
Keywords
Deep neural network; Machine learning; Speech denoising; Speech dereverberation; Speech enhancement;
DOI
10.7544/issn1000-1239.2020.20190306
Abstract
Speech enhancement refers to the use of audio signal processing techniques and various algorithms to improve the intelligibility and quality of distorted speech signals. It has great research value and a wide range of applications, including speech recognition, VoIP, teleconferencing, and hearing aids. Most early work utilized unsupervised digital signal analysis methods to decompose the speech signal and obtain the characteristics of the clean speech and the noise. With the development of machine learning, supervised methods that learn the relationship between noisy and clean speech signals were proposed; in particular, the introduction of deep learning has greatly improved performance. To help beginners and related researchers understand the current state of this topic, this paper presents a comprehensive survey of the development of monaural speech enhancement and systematically summarizes the field in terms of models and methods, datasets, features, learning objectives, and evaluation metrics. First, we divide speech enhancement into noise reduction and dereverberation, and review the existing traditional and machine-learning-based methods in each direction. We briefly introduce the main ideas of typical solutions and compare the performance of different methods. We then enumerate and illustrate the datasets, features, learning objectives, and evaluation metrics commonly used in experiments. Finally, we summarize four major challenges and corresponding open issues in this area. © 2020, Science Press. All rights reserved.
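To illustrate the kind of unsupervised signal-analysis method the abstract refers to, the sketch below implements classical spectral subtraction, one of the earliest monaural denoising techniques: the noise magnitude spectrum is estimated from leading frames assumed to be speech-free, subtracted from each frame's magnitude, and the signal is resynthesized with the noisy phase. This is a minimal illustration, not code from the surveyed paper; the function name, frame sizes, and spectral-floor constant are illustrative choices.

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=10, frame_len=256, hop=128):
    """Denoise a 1-D signal by classical spectral subtraction.

    Assumes the first `noise_frames` frames contain noise only.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    # Windowed analysis frames -> short-time spectra
    frames = np.stack([noisy[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Noise estimate: mean magnitude of the leading (speech-free) frames
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise magnitude, flooring at a small spectral floor
    clean_mag = np.maximum(mag - noise_mag, 0.05 * noise_mag)
    # Resynthesize with the noisy phase (phase is left unmodified)
    clean_spec = clean_mag * np.exp(1j * phase)
    out_frames = np.fft.irfft(clean_spec, n=frame_len, axis=1) * window
    # Weighted overlap-add with window-power normalization
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i, f in enumerate(out_frames):
        out[i * hop : i * hop + frame_len] += f
        norm[i * hop : i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

Such methods work well for stationary noise but introduce "musical noise" artifacts and degrade under non-stationary conditions, which is one motivation for the supervised, learning-based methods the survey covers.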
Pages: 928-953
Page count: 25