SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Drexler, Jennifer [1]

Glass, James [1]

Affiliations:

[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

automatic speech recognition; subword units; beam search; CTC; attention

DOI:

10.1109/icassp.2019.8683531

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper, we experiment with the recently introduced subword regularization technique [1] in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour LibriSpeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems. We find that subword regularization improves the performance of both types of ASR systems, with the regularized attention-based model performing best overall.


Pages: 6266-6270

Page count: 5
