Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19

Authors:

Kim, Geonmin [1]

Lee, Hwaran [1]

Kim, Bo-Kyeong [1]

Oh, Sang-Hoon [2]

Lee, Soo-Young [3]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea

[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea

[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea

Source:

IEEE SIGNAL PROCESSING LETTERS | 2019 / Vol. 26 / No. 1

Keywords:

Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network;

DOI:

10.1109/LSP.2018.2880285

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Many speech enhancement methods try to learn the relationship between noisy and clean speech, where the paired data are obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhanced output both to maximize the likelihood of the transcription under a pre-trained acoustic model and to have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets that include additive noise without and with reverberation: Librispeech + DEMAND and CHiME-4, respectively. By visualizing the enhanced speech obtained with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use clean speech targets on both datasets.
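
The way the two supervision signals in the abstract might be combined can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the acoustic term is taken to be a CTC negative log-likelihood computed elsewhere, the adversarial term follows the standard boundary equilibrium GAN (BEGAN) update with an autoencoder-style discriminator, and the names `BeganState`, `began_losses`, `aas_enhancer_loss`, the loss weights, and the values of `gamma` and `lambda_k` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BeganState:
    """Balance state for a standard BEGAN update (values are assumed defaults)."""
    k: float = 0.0           # balance term k_t, clipped to [0, 1]
    gamma: float = 0.5       # target ratio between fake and real reconstruction losses
    lambda_k: float = 0.001  # step size for updating k


def began_losses(state: BeganState, recon_real: float, recon_fake: float):
    """Return (discriminator_loss, generator_loss) and update state.k in place.

    recon_real: discriminator reconstruction loss on real clean speech.
    recon_fake: discriminator reconstruction loss on enhanced (generated) speech.
    """
    d_loss = recon_real - state.k * recon_fake
    g_loss = recon_fake
    # Proportional control keeps recon_fake close to gamma * recon_real.
    new_k = state.k + state.lambda_k * (state.gamma * recon_real - recon_fake)
    state.k = min(1.0, max(0.0, new_k))
    return d_loss, g_loss


def aas_enhancer_loss(ctc_nll: float, adversarial_g_loss: float,
                      w_acoustic: float = 1.0, w_adversarial: float = 1.0) -> float:
    """Weighted sum of acoustic (CTC) and adversarial supervision (hypothetical weights)."""
    return w_acoustic * ctc_nll + w_adversarial * adversarial_g_loss
```

In this sketch the enhancement network would be trained on `aas_enhancer_loss`, while the discriminator would be trained on the first value returned by `began_losses`; because the adversarial term only needs unpaired clean speech, no clean target for each noisy utterance is required.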


Pages: 159 - 163

Page count: 5

50 items in total

[31] Single channel enhancement for speech recognition
Droppo, Jasha
2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 94 - 98
[32] Controllable Conformer for Speech Enhancement and Recognition
Guo, Zilu
Du, Jun
Siniscalchi, Sabato Marco
Pan, Jia
Liu, Qingfeng
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 156 - 160
[33] Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
Lin, Ju
Niu, Sufeng
Wei, Zice
Lan, Xiang
van Wijngaarden, Adriaan J.
Smith, Melissa C.
Wang, Kuang-Ching
INTERSPEECH 2019, 2019, : 3163 - 3167
[34] Time-domain speech enhancement using generative adversarial networks
Pascual, Santiago
Serra, Joan
Bonafonte, Antonio
SPEECH COMMUNICATION, 2019, 114 : 10 - 21
[35] On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network
Faraji, Farnood
Attabi, Yazid
Champagne, Benoit
Zhu, Wei-Ping
2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2020, : 77 - 82
[36] Performance Analysis of Speech Enhancement Algorithm for Robust Speech Recognition System
Babu, C. Ganesh
Vanathi, P. T.
Ramachandran, R.
Rajaa, M. Senthil
RECENT ADVANCES IN NETWORKING, VLSI AND SIGNAL PROCESSING, 2010, : 197 - +
[37] Combining speech enhancement and auditory feature extraction for robust speech recognition
Kleinschmidt, M
Tchorz, J
Kollmeier, B
SPEECH COMMUNICATION, 2001, 34 (1-2) : 75 - 91
[38] Towards Generalized Speech Enhancement with Generative Adversarial Networks
Pascual, Santiago
Serra, Joan
Bonafonte, Antonio
INTERSPEECH 2019, 2019, : 1791 - 1795
[39] Speech Enhancement Method Based On LSTM Neural Network for Speech Recognition
Liu, Ming
Wang, Yujun
Wang, Jin
Wang, Jing
Xie, Xiang
PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 245 - 249
[40] Comparative Evaluation of Speech Enhancement Methods for Robust Automatic Speech Recognition
Paliwal, Kuldip K.
Lyons, James G.
So, Stephen
Stark, Anthony P.
Wojcicki, Kamil K.
2010 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2010,
