Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods

Cited by: 27
Authors
Zheng, Chengshi [1, 2, 4, 5]
Zhang, Huiyong [1, 2]
Liu, Wenzhe [1, 2]
Luo, Xiaoxue [1, 2]
Li, Andong [1, 2]
Li, Xiaodong [1, 2]
Moore, Brian C. J. [3]
Affiliations
[1] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Univ Cambridge, Dept Psychol, Cambridge Hearing Grp, Cambridge, England
[4] Chinese Acad Sci, Inst Acoust, Key Lab Noise & Vibrat Res, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Source
TRENDS IN HEARING | 2023 / Vol. 27
Keywords
speech enhancement; speech dereverberation; multistage learning; noise estimation; deep complex network; GENERALIZED SPECTRAL SUBTRACTION; NOISE-REDUCTION ALGORITHM; RECURRENT NEURAL-NETWORKS; SQUARE ERROR ESTIMATION; HEARING-AID DELAYS; STATISTICAL-MODEL; SOURCE SEPARATION; MMSE ESTIMATOR; MUSICAL NOISE; PHASE;
DOI
10.1177/23312165231209913
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Frequency-domain monaural speech enhancement has been extensively studied for over 60 years, and a great number of methods have been proposed and applied to many devices. In the last decade, monaural speech enhancement has made tremendous progress with the advent and development of deep learning, and performance using such methods has been greatly improved relative to traditional methods. This survey paper first provides a comprehensive overview of traditional and deep-learning methods for monaural speech enhancement in the frequency domain. The fundamental assumptions of each approach are then summarized and analyzed to clarify their limitations and advantages. A comprehensive evaluation of some typical methods was conducted using the WSJ + Deep Noise Suppression (DNS) challenge and Voice Bank + DEMAND datasets to give an intuitive and unified comparison. The benefits of monaural speech enhancement methods using objective metrics relevant for normal-hearing and hearing-impaired listeners were evaluated. The objective test results showed that compression of the input features was important for simulated normal-hearing listeners but not for simulated hearing-impaired listeners. Potential future research and development topics in monaural speech enhancement are suggested.
Pages: 52
Related papers
14 in total
  • [1] FREQUENCY-DOMAIN ADAPTIVE POSTFILTERING FOR ENHANCEMENT OF NOISY SPEECH
    WANG, FM
    KABAL, P
    RAMACHANDRAN, RP
    OSHAUGHNESSY, D
    SPEECH COMMUNICATION, 1993, 12 (01) : 41 - 56
  • [2] A Comparative Study of Time and Frequency Domain Approaches to Deep Learning based Speech Enhancement
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [3] Speech Enhancement: Traditional and Deep Learning Techniques
    Gaddamedi, Satya Prasad
    Patel, Anuj
    Chandra, Sabyasachi
    Bharati, Puja
    Ghosh, Nirmalya
    Das Mandal, Shyamal Kumar
    PROCEEDINGS OF 27TH INTERNATIONAL SYMPOSIUM ON FRONTIERS OF RESEARCH IN SPEECH AND MUSIC, FRSM 2023, 2024, 1455 : 75 - 86
  • [4] A Fast-Converging Adaptive Frequency-Domain MVDR Beamformer for Speech Enhancement
    Zhao, Shengkui
    Jones, Douglas L.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1928 - 1931
  • [5] Speech Enhancement: A Review of Different Deep Learning Methods
    Yechuri, Sivaramakrishna
    Vanabathina, Sunny Dayal
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023,
  • [6] Enhanced Frequency-Domain Frost Algorithm Using Conjugate Gradient Techniques for Speech Enhancement
    Douglas L. Jones
    Journal of Electronic Science and Technology, 2012, (02) : 158 - 162
  • [7] Two-Stage Learning and Fusion Network With Noise Aware for Time-Domain Monaural Speech Enhancement
    Xiang, Xiaoxiao
    Zhang, Xiaojuan
    Chen, Haozhe
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1754 - 1758
  • [8] An efficient frequency-domain adaptive forward BSS algorithm for acoustic noise reduction and speech quality enhancement
    Djendi, Mohamed
    COMPUTERS & ELECTRICAL ENGINEERING, 2016, 52 : 12 - 27
  • [9] Two-stage deep learning approach for speech enhancement and reconstruction in the frequency and time domains
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [10] Speech Separation in Time-frequency Domain by Deep Learning with High Performance and Reducing Parameters
    Takahashi, K.
    Shiraishi, T.
    JOURNAL OF VIBRATION ENGINEERING & TECHNOLOGIES, 2025, 13 (01)