SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Authors
Drexler, Jennifer [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019
Keywords
automatic speech recognition; subword units; beam search; CTC; attention
DOI
10.1109/icassp.2019.8683531
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we experiment with the recently introduced subword regularization technique [1] in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour LibriSpeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems. Overall, we find that subword regularization improves the performance of both types of ASR systems, with the regularized attention-based model performing best overall.
Pages: 6266-6270
Page count: 5