Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

Cited by: 34
Authors
Weng, Chao [1 ]
Cui, Jia [1 ]
Wang, Guangsen [2 ]
Wang, Jun [2 ]
Yu, Changzhu [1 ]
Su, Dan [2 ]
Yu, Dong [1 ]
Affiliations
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
attention based sequence-to-sequence models; end-to-end speech recognition; sequential minimum Bayes risk training; MBR;
DOI
10.21437/Interspeech.2018-1030
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose two improvements to attention-based sequence-to-sequence models for end-to-end speech recognition systems. The first is an input-feeding architecture that feeds not only the previous context vector but also the previous decoder hidden state as inputs to the decoder. The second is a better hypothesis-generation scheme for sequential minimum Bayes risk (MBR) training of sequence-to-sequence models, in which we introduce softmax smoothing into N-best generation during MBR training. We conduct experiments on both the Switchboard-300hr and Switchboard+Fisher-2000hr datasets and observe significant gains from both proposed improvements. Together with other training strategies such as dropout and scheduled sampling, our best model achieves WERs of 8.3%/15.5% on the Switchboard/CallHome subsets of Eval2000 without any external language model, which is highly competitive among state-of-the-art English conversational speech recognition systems.
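The softmax smoothing mentioned in the abstract can be illustrated with a minimal sketch: scaling the logits by a smoothing factor below 1 flattens the output distribution, so beam search produces more diverse N-best hypotheses for MBR training. The function name `smoothed_softmax` and the factor values used here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def smoothed_softmax(logits, beta=1.0):
    """Softmax with smoothing factor beta (illustrative, not the paper's
    exact value): beta < 1 flattens the distribution, encouraging beam
    search to explore more diverse N-best hypotheses."""
    z = beta * np.asarray(logits, dtype=np.float64)
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.1]
sharp = smoothed_softmax(logits, beta=1.0)   # standard softmax
smooth = smoothed_softmax(logits, beta=0.5)  # smoothed: flatter distribution
```

With `beta=0.5` the top hypothesis receives less probability mass than under the standard softmax, so lower-ranked (but potentially useful) hypotheses are more likely to survive into the N-best list.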
Pages: 761 - 765
Page count: 5
Related Papers
50 items in total
  • [21] ACOUSTIC-TO-WORD RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS
    Palaskar, Shruti
    Metze, Florian
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 397 - 404
  • [22] End-to-End Speech Recognition of Tamil Language
    Changrampadi, Mohamed Hashim
    Shahina, A.
    Narayanan, M. Badri
    Khan, A. Nayeemulla
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 32 (02) : 1309 - 1323
  • [23] PARAMETER UNCERTAINTY FOR END-TO-END SPEECH RECOGNITION
    Braun, Stefan
    Liu, Shih-Chii
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5636 - 5640
  • [24] Performance Monitoring for End-to-End Speech Recognition
    Li, Ruizhi
    Sell, Gregory
    Hermansky, Hynek
    INTERSPEECH 2019, 2019, : 2245 - 2249
  • [25] IMPROVING NON-AUTOREGRESSIVE END-TO-END SPEECH RECOGNITION WITH PRE-TRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Yang, Zehui
    Watanabe, Shinji
    Higuchi, Yosuke
    Cheng, Gaofeng
    Zhang, Pengyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8522 - 8526
  • [26] RELAXED ATTENTION: A SIMPLE METHOD TO BOOST PERFORMANCE OF END-TO-END AUTOMATIC SPEECH RECOGNITION
    Lohrenz, Timo
    Schwarz, Patrick
    Li, Zhengyang
    Fingscheidt, Tim
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 177 - 184
  • [27] INVESTIGATING END-TO-END SPEECH RECOGNITION FOR MANDARIN-ENGLISH CODE-SWITCHING
    Shan, Changhao
    Weng, Chao
    Wang, Guangsen
    Su, Dan
    Luo, Min
    Yu, Dong
    Xie, Lei
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6056 - 6060
  • [28] Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language
    Park, Hosung
    Kim, Changmin
    Son, Hyunsoo
    Seo, Soonshin
    Kim, Ji-Hwan
    JOURNAL OF WEB ENGINEERING, 2022, 21 (02): : 265 - 284
  • [29] Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
    Weng, Chao
    Yu, Chengzhu
    Cui, Jia
    Zhang, Chunlei
    Yu, Dong
    INTERSPEECH 2020, 2020, : 966 - 970
  • [30] Multi-Stream End-to-End Speech Recognition
    Li, Ruizhi
    Wang, Xiaofei
    Mallidi, Sri Harish
    Watanabe, Shinji
    Hori, Takaaki
    Hermansky, Hynek
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 646 - 655