Speech Enhancement: A Review of Different Deep Learning Methods

Times Cited: 1
Authors
Yechuri, Sivaramakrishna [1 ]
Vanabathina, Sunny Dayal [1 ]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati, Andhra Pradesh, India
Keywords
Speech enhancement; noise reduction; deep learning; speech processing; generative adversarial networks; convolutional neural networks; deep neural networks; NEURAL-NETWORK; ALGORITHM; DATABASE; CORPUS;
DOI
10.1142/S021946782550024
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Classification Code
081202; 0835;
Abstract
Speech enhancement methods vary with the degree of degradation and the type of noise in the speech signal, so research in the field remains challenging, particularly for residual and background noise that is highly transient. Numerous deep learning networks have been developed that deliver promising improvements in the perceptual quality and intelligibility of noisy speech, and the power of deep learning techniques has opened up new avenues for innovation in speech enhancement, with implications for a wide range of real-time applications. This paper provides a comprehensive overview by reviewing the important datasets, feature extraction methods, deep learning models, training algorithms, and evaluation metrics for speech enhancement. We begin by tracing the evolution of speech enhancement research from early approaches to recent advances in deep learning architectures. We then analyze and compare approaches to key speech enhancement challenges and categorize them according to their strengths and weaknesses. Finally, we discuss the challenges and future directions of deep learning in speech enhancement, including the demand for parameter-efficient models. The purpose of this paper is to examine the development of the field, compare and contrast different approaches, and highlight future directions as well as challenges for further research.
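For readers unfamiliar with the kind of deep learning pipeline the review covers, the minimal sketch below shows one common formulation, time-frequency mask estimation on STFT magnitudes, written in PyTorch. It is illustrative only and not taken from the paper; the MaskEstimator class, the enhance helper, the layer sizes, and the STFT parameters (512-point FFT, hop of 256, 16 kHz audio) are assumptions chosen for the example.

import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    # Illustrative sketch, not the paper's method: a small feed-forward network
    # that maps a noisy magnitude spectrum to a time-frequency mask in [0, 1].
    # Layer sizes are arbitrary placeholders.
    def __init__(self, n_freq=257, hidden=400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):  # (frames, n_freq)
        return self.net(noisy_mag)

def enhance(noisy_wave, model, n_fft=512, hop=256):
    # STFT -> mask the magnitude -> reuse the noisy phase -> inverse STFT.
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy_wave, n_fft, hop, window=window, return_complex=True)
    mag, phase = spec.abs(), spec.angle()   # both (n_freq, frames)
    mask = model(mag.T).T                   # estimated per-bin gains
    enhanced = mask * mag * torch.exp(1j * phase)
    return torch.istft(enhanced, n_fft, hop, window=window)

# Example: denoise one second of (here random) 16 kHz audio.
model = MaskEstimator()
clean_estimate = enhance(torch.randn(16000), model)

In practice, the architectures surveyed in the paper replace the feed-forward network above with convolutional, recurrent, or adversarially trained models learned from paired noisy/clean corpora, which is the design space the review categorizes.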
Pages: 31