RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Authors
Zhou, Wei [1 ,2 ]
Beck, Eugen [1 ,2 ]
Berger, Simon [1 ,2 ]
Schlueter, Ralf [1 ,2 ]
Ney, Hermann [1 ,2 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Machine Learning & Human Language Technol, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
INTERSPEECH 2023 | 2023
Keywords
speech recognition; toolkit; sequence-to-sequence; decoder; beam search; RASR
DOI
10.21437/Interspeech.2023-1062
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but rather simple support for decoding open-vocabulary scenarios only. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as modeling unit choice, we present RASR2 in this work, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility for various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and LibriSpeech corpora. Our source code is public online.
Pages: 4094-4098
Page count: 5
Related papers
50 in total
  • [21] Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
    Hannun, Awni
    Lee, Ann
    Xu, Qiantong
    Collobert, Ronan
    INTERSPEECH 2019, 2019, : 3785 - 3789
  • [22] Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
    Novitasari, Sashi
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    INTERSPEECH 2019, 2019, : 3835 - 3839
  • [23] LEVERAGING SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS FOR ENHANCING ACOUSTIC-TO-WORD SPEECH RECOGNITION
    Mimura, Masato
    Ueno, Sei
    Inaguma, Hirofumi
    Sakai, Shinsuke
    Kawahara, Tatsuya
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 477 - 484
  • [24] MULTILINGUAL SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION: ARCHITECTURE, TRANSFER LEARNING, AND LANGUAGE MODELING
    Cho, Jaejin
    Baskar, Murali Karthick
    Li, Ruizhi
    Wiesner, Matthew
    Mallidi, Sri Harish
    Yalta, Nelson
    Karafiat, Martin
    Watanabe, Shinji
    Hori, Takaaki
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 521 - 527
  • [25] IMPROVING SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION TRAINING WITH ON-THE-FLY DATA AUGMENTATION
    Nguyen, Thai-Son
    Stuker, Sebastian
    Niehues, Jan
    Waibel, Alex
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7689 - 7693
  • [26] On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
    Lam, Tsz Kin
    Ohta, Mayumi
    Schamoni, Shigehiko
    Riezler, Stefan
    INTERSPEECH 2021, 2021, : 1299 - 1303
  • [27] Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
    Zhou, Shiyu
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 791 - 795
  • [28] CONFIDENCE ESTIMATION FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Li, Qiujia
    Qiu, David
    Zhang, Yu
    Li, Bo
    He, Yanzhang
    Woodland, Philip C.
    Cao, Liangliang
    Strohman, Trevor
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6388 - 6392
  • [29] MINIMUM LATENCY TRAINING STRATEGIES FOR STREAMING SEQUENCE-TO-SEQUENCE ASR
    Inaguma, Hirofumi
    Gaur, Yashesh
    Lu, Liang
    Li, Jinyu
    Gong, Yifan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6064 - 6068
  • [30] MITIGATING THE IMPACT OF SPEECH RECOGNITION ERRORS ON CHATBOT USING SEQUENCE-TO-SEQUENCE MODEL
    Chen, Pin-Jung
    Hsu, I-Hung
    Huang, Yi-Yao
    Lee, Hung-Yi
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 497 - 503