Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition

Cited by: 19
Authors
Liu, Bin [1 ,2 ]
Nie, Shuai [1 ]
Liang, Shan [1 ]
Liu, Wenju [1 ]
Yu, Meng [3 ]
Chen, Lianwu [4 ]
Peng, Shouye [5 ]
Li, Changliang [6 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Tencent AI Lab, Bellevue, WA USA
[4] Tencent AI Lab, Shenzhen, Peoples R China
[5] Xueersi Online Sch, Beijing, Peoples R China
[6] Kingsoft AI Lab, Beijing, Peoples R China
Source
INTERSPEECH 2019
Funding
National Natural Science Foundation of China
Keywords
end-to-end speech recognition; robust speech recognition; speech enhancement; generative adversarial networks;
DOI
10.21437/Interspeech.2019-1242
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Recently, end-to-end systems have made significant breakthroughs in the field of speech recognition. However, such a single end-to-end architecture is not especially robust to input variations caused by noise and reverberation, so its performance degrades dramatically in real-world conditions. To alleviate this issue, the mainstream approach is to use a well-designed speech enhancement module as the front-end of ASR. However, enhancement modules can introduce speech distortions and mismatches with the training conditions, which sometimes degrades ASR performance. In this paper, we propose jointly adversarial enhancement training to boost the robustness of end-to-end systems. Specifically, during training we use a compositional scheme consisting of a mask-based enhancement network, an attention-based encoder-decoder network and a discriminator network. The discriminator is used to distinguish the enhanced features produced by the enhancement network from clean features, which guides the enhancement network toward outputs that follow the realistic clean-speech distribution. With the joint optimization of the recognition, enhancement and adversarial losses, the compositional scheme is expected to automatically learn representations that are more robust for the recognition task. Systematic experiments on AISHELL-1 show that the proposed method improves the noise robustness of end-to-end systems and achieves a relative error rate reduction of 4.6% over multi-condition training.
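The abstract above describes joint optimization of recognition, enhancement and adversarial losses over a mask-based enhancement network, an attention-based encoder-decoder and a discriminator. Below is a minimal, hypothetical PyTorch-style sketch of how such a joint objective could be wired up; every module, dimension and loss weight (MaskEnhancer, ToyRecognizer, lam_enh, lam_adv, ...) is an illustrative assumption rather than the authors' actual implementation.

# Hypothetical sketch of the joint training objective described in the abstract.
# Not the paper's code: module definitions, dimensions and loss weights are
# illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskEnhancer(nn.Module):
    """Predicts a [0, 1] time-frequency mask and applies it to noisy features."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, feat_dim), nn.Sigmoid())

    def forward(self, noisy):                       # noisy: (B, T, feat_dim)
        h, _ = self.rnn(noisy)
        return self.mask(h) * noisy                 # enhanced features


class ToyRecognizer(nn.Module):
    """Stand-in for the attention-based encoder-decoder: returns a CE loss."""
    def __init__(self, feat_dim=80, vocab=32):
        super().__init__()
        self.enc = nn.LSTM(feat_dim, 128, batch_first=True)
        self.out = nn.Linear(128, vocab)

    def forward(self, feats, targets):              # targets: (B, T) token ids
        h, _ = self.enc(feats)
        logits = self.out(h)                        # (B, T, vocab)
        return F.cross_entropy(logits.transpose(1, 2), targets)


class Discriminator(nn.Module):
    """Scores whether a feature sequence looks clean (real) or enhanced (fake)."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats):                       # (B, T, feat_dim) -> (B,) logits
        return self.net(feats).mean(dim=(1, 2))


def joint_losses(enhancer, recognizer, discriminator, noisy, clean, targets,
                 lam_enh=1.0, lam_adv=0.1):
    """Recognition + enhancement + adversarial losses for one training step."""
    enhanced = enhancer(noisy)
    asr_loss = recognizer(enhanced, targets)                  # recognition loss
    enh_loss = F.mse_loss(enhanced, clean)                    # feature-level MSE
    ones = torch.ones(noisy.size(0))
    adv_loss = F.binary_cross_entropy_with_logits(            # fool the critic
        discriminator(enhanced), ones)
    gen_loss = asr_loss + lam_enh * enh_loss + lam_adv * adv_loss

    # Discriminator update: clean -> real, enhanced (detached) -> fake.
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(clean), ones) +
              F.binary_cross_entropy_with_logits(discriminator(enhanced.detach()),
                                                 torch.zeros(noisy.size(0))))
    return gen_loss, d_loss


if __name__ == "__main__":
    B, T, D, V = 4, 50, 80, 32
    enh, rec, disc = MaskEnhancer(D), ToyRecognizer(D, V), Discriminator(D)
    noisy, clean = torch.randn(B, T, D), torch.randn(B, T, D)
    targets = torch.randint(0, V, (B, T))
    g_loss, d_loss = joint_losses(enh, rec, disc, noisy, clean, targets)
    print(g_loss.item(), d_loss.item())

In standard GAN fashion, gen_loss would update the enhancement and recognition networks while d_loss updates the discriminator, alternating between the two.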
Pages: 491-495
Number of pages: 5
Related Papers
50 records in total
  • [1] Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition
    Li, Lujun
    Kang, Yikai
    Shi, Yuchen
    Kürzinger, Ludwig
    Watzel, Tobias
    Rigoll, Gerhard
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [2] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [3] STREAMING END-TO-END SPEECH RECOGNITION WITH JOINTLY TRAINED NEURAL FEATURE ENHANCEMENT
    Kim, Chanwoo
    Garg, Abhinav
    Gowda, Dhananjaya
    Mun, Seongkyu
    Han, Changwoo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6773 - 6777
  • [4] End-to-End Speech Translation with Adversarial Training
    Li, Xuancai
    Chen, Kehai
    Zhao, Tiejun
    Yang, Muyun
    WORKSHOP ON AUTOMATIC SIMULTANEOUS TRANSLATION CHALLENGES, RECENT ADVANCES, AND FUTURE DIRECTIONS, 2020, : 10 - 14
  • [5] COMBINING END-TO-END AND ADVERSARIAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION
    Drexler, Jennifer
    Glass, James
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 361 - 368
  • [6] ADVERSARIAL TRAINING OF END-TO-END SPEECH RECOGNITION USING A CRITICIZING LANGUAGE MODEL
    Liu, Alexander H.
    Lee, Hung-yi
    Lee, Lin-shan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6176 - 6180
  • [7] Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks
    Na, Hyeong-Ju
    Park, Jeong-Sik
    APPLIED SCIENCES-BASEL, 2021, 11 (18):
  • [8] SPEECH ENHANCEMENT USING END-TO-END SPEECH RECOGNITION OBJECTIVES
    Subramanian, Aswin Shanmugam
    Wang, Xiaofei
    Baskar, Murali Karthick
    Watanabe, Shinji
    Taniguchi, Toru
    Tran, Dung
    Fujita, Yuya
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 234 - 238
  • [9] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569