Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19

Authors:

Kim, Geonmin [1]

Lee, Hwaran [1]

Kim, Bo-Kyeong [1]

Oh, Sang-Hoon [2]

Lee, Soo-Young [3]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea

[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea

[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea

Source:

IEEE Signal Processing Letters | 2019, vol. 26, no. 1

Keywords:

Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network

DOI:

10.1109/LSP.2018.2880285

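Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809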

Abstract:

Many speech enhancement methods try to learn the relationship between noisy and clean speech, obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhanced output both to maximize the likelihood of the transcription under a pre-trained acoustic model and to have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets with additive noise, without and with reverberation: Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech obtained with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use a clean speech target on both datasets.
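
As a rough illustration of the acoustic supervision named in the abstract, the sketch below computes the connectionist temporal classification (CTC) negative log-likelihood for a toy input in plain Python and adds it to an adversarial term. The function names, the weight `adv_weight`, and the simple weighted sum are assumptions made for illustration; the abstract does not say how the two losses are combined, and this is not the authors' implementation.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC negative log-likelihood via the forward algorithm.

    probs  : list of T per-frame probability lists, indexed by label id
    target : list of label ids (without blanks)
    """
    # Extended target with blanks interleaved: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(probs)

    # Initialisation: a path may start in the first blank or the first label
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        prev = alpha
        alpha = [0.0] * num_states
        for s in range(num_states):
            total = prev[s]
            if s >= 1:
                total += prev[s - 1]
            # Skipping over a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # Valid paths end in the last label or the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return -math.log(likelihood)


def combined_loss(acoustic_nll, adversarial_loss, adv_weight=1.0):
    """Hypothetical weighted sum of the two supervision terms (an assumption)."""
    return acoustic_nll + adv_weight * adversarial_loss


# Toy example: two frames, labels {0: blank, 1: "a"}, target "a"
toy_probs = [[0.5, 0.5], [0.5, 0.5]]
toy_nll = ctc_neg_log_likelihood(toy_probs, [1])
```

In the toy example, three alignments produce the target (a-blank, blank-a, a-a), each with probability 0.25, so the likelihood is 0.75 and `toy_nll` is about 0.288. In a real system the per-frame probabilities would come from the pre-trained acoustic model applied to the enhanced speech, and the adversarial term from the discriminator.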


Pages: 159-163

Page count: 5
