Self-Supervised Speech Enhancement for Arabic Speech Recognition in Real-World Environments

Cited by: 6

Authors:

Dendani, Bilal [1, 2]

Bahi, Halima [1]

Sari, Toufik [1, 2]

Affiliations:

[1] Univ Badji Mokhtar Annaba, Comp Sci Dept, Annaba 23000, Algeria

[2] Univ Badji Mokhtar Annaba, Labged Lab, Annaba 23000, Algeria

Source:

TRAITEMENT DU SIGNAL | 2021 / Vol. 38 / No. 02

Keywords:

Arabic language; deep autoencoder; deep learning; self-supervised speech enhancement; speech recognition; ubiquitous systems;

DOI:

10.18280/ts.380212

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Mobile speech recognition attracts much attention in ubiquitous computing contexts; however, background noise, speech coding, and transmission errors can corrupt the incoming speech. Building a robust speech recognizer under these conditions therefore requires a large number of real-world speech samples. Arabic, like many other languages, lacks such resources. To overcome this limitation, we propose a speech enhancement step applied before recognition begins. For speech enhancement, we use a deep autoencoder (DAE) in a two-step procedure: in the first step, an overcomplete DAE is trained in an unsupervised way; in the second, a denoising DAE is trained in a supervised way, leveraging the clean speech produced in the first step. Experiments on a real-life mobile database confirm the potential of the proposed approach and show a reduction in the word error rate (WER) of a ubiquitous Arabic speech recognizer. Further experiments also show improvements in the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI).
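The main recognition result above is reported as a WER reduction. For reference, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit alignment of the hypothesis against a reference of N words. The Python sketch below computes it with a standard word-level Levenshtein dynamic program. It is not the authors' evaluation code: the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization, whereas a real Arabic ASR evaluation may normalize text (for example diacritics or letter variants) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # Total edits normalized by reference length; can exceed 1.0 with many insertions.
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") and one deletion ("now"): 2 / 4 words.
    print(word_error_rate("open the door now", "open a door"))  # 0.5
```

Because WER is normalized by the reference length only, a hypothesis with many spurious insertions can score above 1.0, which is why WER reductions are usually reported relative to a baseline recognizer on the same test set.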

Pages: 349-358

Page count: 10
