Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement

Cited by: 0
Authors
Amarjouf, Madiha [1 ]
Ibn Elhaj, El Hassan [1 ]
Chami, Mouhcine [2 ]
Ezzine, Kadria [3 ]
Di Martino, Joseph [3 ]
Affiliations
[1] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Multimedia Signal & Commun Syst MUSICS, Ave Allal Fassi, Rabat 10112, Morocco
[2] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Secure & Mixed Architecture Reliable Tech, Ave Allal Fassi, Rabat 10112, Morocco
[3] LORIA Lab Lorrain Rech Informat & Ses Applicat, BP 239, F-54506 Vandoeuvre Les Nancy, France
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 15
Keywords
esophageal speech; self-supervised denoising; speech enhancement; DCUNET; DCUNET-cTSTM; STFT; VoiceFixer; voice conversion;
DOI
10.3390/app14156682
CLC Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Esophageal speech (ES) is a pathological voice that is often difficult to understand. Moreover, acquiring recordings of a patient's voice before a laryngectomy is challenging, which complicates enhancing this kind of voice. Consequently, most supervised methods for ES enhancement rely on voice conversion toward healthy target speakers, which may not preserve the speaker's identity. Unsupervised methods for ES, on the other hand, are mostly based on traditional filters, which alone cannot remove this kind of noise and are known to produce musical artifacts, making the denoising process difficult. To address these issues, a self-supervised method based on the Only-Noisy-Training (ONT) model was applied, which denoises a signal without requiring a clean target. Four experiments were conducted for assessment using the Deep Complex UNET (DCUNET) and the Deep Complex UNET with Complex Two-Stage Transformer Module (DCUNET-cTSTM), both of which follow the ONT approach. For comparison purposes and to compute the evaluation metrics, the pre-trained VoiceFixer model was used to restore clean esophageal-speech wave files. Although ONT-based methods were designed for noisy wave files, the results demonstrate that ES can be denoised without clean targets, and hence the speaker's identity is retained.
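The core of the ONT approach summarized above is that training pairs are built from the noisy signal itself rather than from a noisy/clean pair. A minimal illustrative sketch of that idea follows; the even/odd sub-sampling scheme and the function name `ont_training_pair` are assumptions for illustration only, not the paper's actual sampler or the DCUNET training pipeline.

```python
import numpy as np

def ont_training_pair(noisy):
    """Build an (input, target) pair from a single noisy waveform,
    in the spirit of Only-Noisy-Training: no clean reference needed.
    Here the waveform is simply split into even- and odd-indexed
    sub-signals; the actual method uses a more elaborate sampler."""
    noisy = np.asarray(noisy, dtype=np.float32)
    n = len(noisy) - (len(noisy) % 2)   # drop a trailing sample if length is odd
    pairs = noisy[:n].reshape(-1, 2)    # consecutive sample pairs
    return pairs[:, 0], pairs[:, 1]     # input sub-signal, target sub-signal

# toy usage: a 220 Hz tone at 16 kHz with additive noise
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = tone + 0.1 * rng.standard_normal(16000)
inp, tgt = ont_training_pair(noisy)
assert inp.shape == tgt.shape == (8000,)
```

A denoiser trained to map `inp` to `tgt` only ever sees noisy data, which is what lets the method retain the original speaker's identity instead of converting toward a healthy target voice.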
Pages: 14
Related Papers (50 in total; items 21–30 shown)
  • [21] Self-Supervised Poisson-Gaussian Denoising
    Khademi, Wesley
    Rao, Sonia
    Minnerath, Clare
    Hagen, Guy
    Ventura, Jonathan
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2130 - 2138
  • [22] DMFNet: A Novel Self-Supervised Dynamic Multi-Focusing Network for Speech Denoising
    Yang, Chenghao
    Tao, Yi
    Liu, Jingyin
    Xu, Xiaomei
    IEEE ACCESS, 2024, 12 : 98225 - 98238
  • [23] EXPLORING EFFICIENT-TUNING METHODS IN SELF-SUPERVISED SPEECH MODELS
    Chen, Zih-Ching
    Fu, Chin-Lun
    Liu, Chih-Ying
    Li, Shang-Wen
    Lee, Hung-yi
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1120 - 1127
  • [24] THE EFFECT OF SPOKEN LANGUAGE ON SPEECH ENHANCEMENT USING SELF-SUPERVISED SPEECH REPRESENTATION LOSS FUNCTIONS
    Close, George
    Hain, Thomas
    Goetze, Stefan
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [25] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [26] Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
    Yang, Hejung
    Kang, Hong-Goo
    INTERSPEECH 2023, 2023, : 814 - 818
  • [27] A STUDY ON THE IMPACT OF SELF-SUPERVISED LEARNING ON AUTOMATIC DYSARTHRIC SPEECH ASSESSMENT
    Cadet, Xavier F.
    Aloufi, Ranya
    Ahmadi-Abhari, Sara
    Haddadi, Hamed
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 630 - 634
  • [28] Boosting the Intelligibility of Waveform Speech Enhancement Networks through Self-supervised Representations
    Sun, Tao
    Gong, Shuyu
    Wang, Zhewei
    Smith, Charles D.
    Wang, Xianhui
    Xu, Li
    Liu, Jundong
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 992 - 997
  • [29] JOINT LEARNING WITH SHARED LATENT SPACE FOR SELF-SUPERVISED MONAURAL SPEECH ENHANCEMENT
    Li, Yi
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    2023 SENSOR SIGNAL PROCESSING FOR DEFENCE CONFERENCE, SSPD, 2023, : 21 - 25
  • [30] Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
    Kim, Eesung
    Jeon, Jae-Jin
    Seo, Hyeji
    Kim, Hoon
    INTERSPEECH 2022, 2022, : 1411 - 1415