SUBWORD REGULARIZATION AND BEAM SEARCH DECODING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Drexler, Jennifer [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
automatic speech recognition; subword units; beam search; CTC; attention;
DOI
10.1109/icassp.2019.8683531
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this paper, we experiment with the recently introduced subword regularization technique [1] in the context of end-to-end automatic speech recognition (ASR). We present results from both attention-based and CTC-based ASR systems on two common benchmark datasets, the 80-hour Wall Street Journal corpus and the 1,000-hour Librispeech corpus. We also introduce a novel subword beam search decoding algorithm that significantly improves the final performance of the CTC-based systems. Overall, we find that subword regularization improves the performance of both types of ASR systems, with the regularized attention-based model performing best overall.
Pages: 6266 - 6270
Page count: 5
Related papers
50 in total
  • [41] Multi-Stream End-to-End Speech Recognition
    Li, Ruizhi
    Wang, Xiaofei
    Mallidi, Sri Harish
    Watanabe, Shinji
    Hori, Takaaki
    Hermansky, Hynek
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 646 - 655
  • [42] END-TO-END AUTOMATIC SPEECH RECOGNITION INTEGRATED WITH CTC-BASED VOICE ACTIVITY DETECTION
    Yoshimura, Takenori
    Hayashi, Tomoki
    Takeda, Kazuya
    Watanabe, Shinji
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6999 - 7003
  • [43] Reducing Multilingual Context Confusion for End-to-end Code-switching Automatic Speech Recognition
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Tao, Jianhua
    Yeung, Yu Ting
    Deng, Liqun
    INTERSPEECH 2022, 2022, : 3894 - 3898
  • [44] Emphasizing unseen words: New vocabulary acquisition for end-to-end speech recognition
    Qu, Leyuan
    Weber, Cornelius
    Wermter, Stefan
    NEURAL NETWORKS, 2023, 161 : 494 - 504
  • [45] End-to-End Mandarin Speech Recognition Combining CNN and BLSTM
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    SYMMETRY-BASEL, 2019, 11 (05):
  • [46] ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT
    Wang, Yiming
    Chen, Tongfei
    Xu, Hainan
    Ding, Shuoyang
    Lv, Hang
    Shao, Yiwen
    Peng, Nanyun
    Xie, Lei
    Watanabe, Shinji
    Khudanpur, Sanjeev
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 136 - 143
  • [47] SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
    Fu, Li
    Li, Xiaoxiao
    Wang, Runyu
    Fan, Lu
    Zhang, Zhengchen
    Chen, Meng
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1006 - 1010
  • [48] Online Continual Learning of End-to-End Speech Recognition Models
    Yang, Muqiao
    Lane, Ian
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 2668 - 2672
  • [49] Tunisian Dialectal End-to-end Speech Recognition based on DeepSpeech
    Messaoudi, Abir
    Haddad, Hatem
    Fourati, Chayma
    Hmida, Moez BenHaj
    Mabrouk, Aymen Ben Elhaj
    Graiet, Mohamed
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 183 - 190
  • [50] VERY DEEP CONVOLUTIONAL NETWORKS FOR END-TO-END SPEECH RECOGNITION
    Zhang, Yu
    Chan, William
    Jaitly, Navdeep
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4845 - 4849