Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods

Times cited: 27

Authors:

Zheng, Chengshi [1, 2, 4, 5]

Zhang, Huiyong [1, 2]

Liu, Wenzhe [1, 2]

Luo, Xiaoxue [1, 2]

Li, Andong [1, 2]

Li, Xiaodong [1, 2]

Moore, Brian C. J. [3]

Affiliations:

[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Univ Cambridge, Dept Psychol, Cambridge Hearing Grp, Cambridge, England

[4] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China

[5] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

Source:

TRENDS IN HEARING | 2023 / Vol. 27

Keywords:

speech enhancement; speech dereverberation; multistage learning; noise estimation; deep complex network; GENERALIZED SPECTRAL SUBTRACTION; NOISE-REDUCTION ALGORITHM; RECURRENT NEURAL-NETWORKS; SQUARE ERROR ESTIMATION; HEARING-AID DELAYS; STATISTICAL-MODEL; SOURCE SEPARATION; MMSE ESTIMATOR; MUSICAL NOISE; PHASE

DOI:

10.1177/23312165231209913

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

Frequency-domain monaural speech enhancement has been extensively studied for over 60 years, and a great number of methods have been proposed and applied in many devices. In the last decade, monaural speech enhancement has made tremendous progress with the advent and development of deep learning, and the performance of such methods has improved greatly relative to traditional methods. This survey paper first provides a comprehensive overview of traditional and deep-learning methods for monaural speech enhancement in the frequency domain. The fundamental assumptions of each approach are then summarized and analyzed to clarify their limitations and advantages. To give an intuitive and unified comparison, a comprehensive evaluation of some typical methods was conducted using the WSJ + Deep Noise Suppression (DNS) challenge and Voice Bank + DEMAND datasets. The benefits of the monaural speech enhancement methods were evaluated using objective metrics relevant to normal-hearing and hearing-impaired listeners. The objective test results showed that compression of the input features was important for simulated normal-hearing listeners but not for simulated hearing-impaired listeners. Potential future research and development topics in monaural speech enhancement are suggested.
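The abstract's key objective finding concerns compression of the input features. The record does not say which compression the paper uses; a common form in deep-learning speech enhancement is power-law compression of the STFT magnitude while the phase is left unchanged. The minimal sketch below assumes that form. The function names `compress_spectrum` and `decompress_spectrum` and the default exponent `alpha = 0.5` are hypothetical, illustrative choices, not taken from the paper.

```python
import cmath
from typing import List


def compress_spectrum(bins: List[complex], alpha: float = 0.5) -> List[complex]:
    """Power-law compress the magnitude of each complex STFT bin, keeping its phase.

    Assumption: compression of the form |X|**alpha * exp(j*angle(X));
    alpha = 0.5 is a hypothetical default, and alpha = 1 leaves bins unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    for x in bins:
        magnitude, phase = abs(x), cmath.phase(x)
        # Shrink the magnitude only; the phase of each bin is preserved.
        out.append(cmath.rect(magnitude ** alpha, phase))
    return out


def decompress_spectrum(bins: List[complex], alpha: float = 0.5) -> List[complex]:
    """Invert compress_spectrum by raising each magnitude to 1/alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return [cmath.rect(abs(x) ** (1.0 / alpha), cmath.phase(x)) for x in bins]


if __name__ == "__main__":
    # Two bins whose magnitudes differ by a factor of 1000 (60 dB).
    quiet_and_loud = [0.001 + 0j, 1.0 + 0j]
    compressed = compress_spectrum(quiet_and_loud, alpha=0.5)
    # With alpha = 0.5 the magnitude ratio shrinks to sqrt(1000), about 31.6.
    print(abs(compressed[1]) / abs(compressed[0]))
```

With `alpha < 1`, the gap between loud and quiet time-frequency bins shrinks, so low-energy bins carry relatively more weight in the network input. Because only the magnitude is changed, `decompress_spectrum` recovers the original bins exactly up to floating-point error.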


Pages: 52

14 entries in total

[1] Frequency-Domain Adaptive Postfiltering for Enhancement of Noisy Speech
WANG, FM
KABAL, P
RAMACHANDRAN, RP
O'SHAUGHNESSY, D
SPEECH COMMUNICATION, 1993, 12 (01) : 41 - 56
[2] A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement
Nossier, Soha A.
Wall, Julie
Moniri, Mansour
Glackin, Cornelius
Cannings, Nigel
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
[3] Speech Enhancement: Traditional and Deep Learning Techniques
Gaddamedi, Satya Prasad
Patel, Anuj
Chandra, Sabyasachi
Bharati, Puja
Ghosh, Nirmalya
Das Mandal, Shyamal Kumar
PROCEEDINGS OF 27TH INTERNATIONAL SYMPOSIUM ON FRONTIERS OF RESEARCH IN SPEECH AND MUSIC, FRSM 2023, 2024, 1455 : 75 - 86
[4] A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement
Zhao, Shengkui
Jones, Douglas L.
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012 : 1928 - 1931
[5] Speech Enhancement: A Review of Different Deep Learning Methods
Yechuri, Sivaramakrishna
Vanabathina, Sunny Dayal
INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023
[6] Enhanced Frequency-Domain Frost Algorithm Using Conjugate Gradient Techniques for Speech Enhancement
Jones, Douglas L.
Journal of Electronic Science and Technology, 2012, (02) : 158 - 162
[7] Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural Speech Enhancement
Xiang, Xiaoxiao
Zhang, Xiaojuan
Chen, Haozhe
IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1754 - 1758
[8] An efficient frequency-domain adaptive forward BSS algorithm for acoustic noise reduction and speech quality enhancement
Djendi, Mohamed
COMPUTERS & ELECTRICAL ENGINEERING, 2016, 52 : 12 - 27
[9] Two-stage deep learning approach for speech enhancement and reconstruction in the frequency and time domains
Nossier, Soha A.
Wall, Julie
Moniri, Mansour
Glackin, Cornelius
Cannings, Nigel
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
[10] Speech Separation in Time-frequency Domain by Deep Learning with High Performance and Reducing Parameters
Takahashi, K.
Shiraishi, T.
JOURNAL OF VIBRATION ENGINEERING & TECHNOLOGIES, 2025, 13 (01)
