Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement

Cited by: 0
Authors
Amarjouf, Madiha [1 ]
Ibn Elhaj, El Hassan [1 ]
Chami, Mouhcine [2 ]
Ezzine, Kadria [3 ]
Di Martino, Joseph [3 ]
Affiliations
[1] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Multimedia Signal & Commun Syst MUSICS, Ave Allal Fassi, Rabat 10112, Morocco
[2] Natl Inst Posts & Telecommun INPT, Res Lab Telecommun Syst Networks & Serv STRS, Res Team Secure & Mixed Architecture Reliable Tech, Ave Allal Fassi, Rabat 10112, Morocco
[3] LORIA Lab Lorrain Rech Informat & Ses Applicat, BP 239, F-54506 Vandoeuvre Les Nancy, France
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 15
Keywords
esophageal speech; self-supervised denoising; speech enhancement; DCUNET; DCUNET-cTSTM; STFT; VoiceFixer; voice conversion;
DOI
10.3390/app14156682
CLC Classification
O6 [Chemistry];
Discipline Code
0703;
Abstract
Esophageal speech (ES) is a pathological voice that is often difficult to understand. Moreover, acquiring recordings of a patient's voice before a laryngectomy is challenging, which complicates enhancing this kind of voice. Consequently, most supervised methods for ES enhancement rely on voice conversion toward healthy target speakers, which may not preserve the speaker's identity. Unsupervised methods for ES, on the other hand, are mostly based on traditional filters, which alone cannot remove this kind of noise and are known to produce musical artifacts, making the denoising process difficult. To address these issues, a self-supervised method based on the Only-Noisy-Training (ONT) model was applied, which denoises a signal without requiring a clean target. Four experiments were conducted for assessment using the Deep Complex UNET (DCUNET) and the Deep Complex UNET with Complex Two-Stage Transformer Module (DCUNET-cTSTM), both of which follow the ONT approach. For comparison purposes and to compute the evaluation metrics, the pre-trained VoiceFixer model was used to restore clean esophageal-speech wave files. Although ONT-based methods were designed for noisy wave files, the results demonstrate that ES can be denoised without clean targets, and hence the speaker's identity is retained.
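The core of the ONT approach summarized above is that training pairs are built from the noisy signal itself rather than from a noisy/clean pair. A minimal illustrative sketch of that idea follows; the even/odd sub-sampling scheme and the function name `ont_training_pair` are assumptions for illustration only, not the paper's actual sampler or the DCUNET training pipeline.

```python
import numpy as np

def ont_training_pair(noisy):
    """Build an (input, target) pair from a single noisy waveform,
    in the spirit of Only-Noisy-Training: no clean reference needed.
    Here the waveform is simply split into even- and odd-indexed
    sub-signals; the actual method uses a more elaborate sampler."""
    noisy = np.asarray(noisy, dtype=np.float32)
    n = len(noisy) - (len(noisy) % 2)   # drop a trailing sample if length is odd
    pairs = noisy[:n].reshape(-1, 2)    # consecutive sample pairs
    return pairs[:, 0], pairs[:, 1]     # input sub-signal, target sub-signal

# toy usage: a 220 Hz tone at 16 kHz with additive noise
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = tone + 0.1 * rng.standard_normal(16000)
inp, tgt = ont_training_pair(noisy)
assert inp.shape == tgt.shape == (8000,)
```

A denoiser trained to map `inp` to `tgt` only ever sees noisy data, which is what lets the method retain the original speaker's identity instead of converting toward a healthy target voice.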
Pages: 14
Related Papers (50 in total; items 21–30 shown)
  • [21] Self-Supervised Poisson-Gaussian Denoising
    Khademi, Wesley
    Rao, Sonia
    Minnerath, Clare
    Hagen, Guy
    Ventura, Jonathan
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 2130 - 2138
  • [22] DMFNet: A Novel Self-Supervised Dynamic Multi-Focusing Network for Speech Denoising
    Yang, Chenghao
    Tao, Yi
    Liu, Jingyin
    Xu, Xiaomei
    IEEE ACCESS, 2024, 12 : 98225 - 98238
  • [23] EXPLORING EFFICIENT-TUNING METHODS IN SELF-SUPERVISED SPEECH MODELS
    Chen, Zih-Ching
    Fu, Chin-Lun
    Liu, Chih-Ying
    Li, Shang-Wen
    Lee, Hung-yi
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1120 - 1127
  • [24] THE EFFECT OF SPOKEN LANGUAGE ON SPEECH ENHANCEMENT USING SELF-SUPERVISED SPEECH REPRESENTATION LOSS FUNCTIONS
    Close, George
    Hain, Thomas
    Goetze, Stefan
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [25] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [26] Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
    Yang, Hejung
    Kang, Hong-Goo
    INTERSPEECH 2023, 2023, : 814 - 818
  • [27] A STUDY ON THE IMPACT OF SELF-SUPERVISED LEARNING ON AUTOMATIC DYSARTHRIC SPEECH ASSESSMENT
    Cadet, Xavier F.
    Aloufi, Ranya
    Ahmadi-Abhari, Sara
    Haddadi, Hamed
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 630 - 634
  • [28] Boosting the Intelligibility of Waveform Speech Enhancement Networks through Self-supervised Representations
    Sun, Tao
    Gong, Shuyu
    Wang, Zhewei
    Smith, Charles D.
    Wang, Xianhui
    Xu, Li
    Liu, Jundong
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 992 - 997
  • [29] JOINT LEARNING WITH SHARED LATENT SPACE FOR SELF-SUPERVISED MONAURAL SPEECH ENHANCEMENT
    Li, Yi
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    2023 SENSOR SIGNAL PROCESSING FOR DEFENCE CONFERENCE, SSPD, 2023, : 21 - 25
  • [30] Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
    Kim, Eesung
    Jeon, Jae-Jin
    Seo, Hyeji
    Kim, Hoon
    INTERSPEECH 2022, 2022, : 1411 - 1415