SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Drexler, Jennifer [1]

Glass, James [1]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

automatic speech recognition; subword units; beam search; CTC; attention;

DOI:

10.1109/icassp.2019.8683531

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we experiment with the recently introduced subword regularization technique [1] in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour Librispeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems. Overall, we find that subword regularization improves the performance of both types of ASR systems, with the regularized attention-based model performing best overall.


Pages: 6266-6270

Number of pages: 5

