RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Cited by: 0
Authors
Zhou, Wei [1 ,2 ]
Beck, Eugen [1 ,2 ]
Berger, Simon [1 ,2 ]
Schlueter, Ralf [1 ,2 ]
Ney, Hermann [1 ,2 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Machine Learning & Human Language Technol, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
INTERSPEECH 2023 | 2023
Keywords
speech recognition; toolkit; sequence-to-sequence; decoder; beam search; RASR;
DOI
10.21437/Interspeech.2023-1062
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple support for open-vocabulary decoding. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding typically target only classical ASR or do not support all S2S models. To eliminate this restriction on research possibilities such as the choice of modeling unit, we present RASR2, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility with various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and Librispeech corpora. Our source code is publicly available online.
Pages: 4094 - 4098
Number of pages: 5
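
As a concrete illustration of the open- versus closed-vocabulary distinction described in the abstract, below is a minimal, self-contained C++ sketch of label-synchronous beam search with an optional lexical prefix-tree constraint. It is not RASR2 code and does not reflect the toolkit's API or data structures; the TrieNode and Hypothesis types, the toy labelScore function, and all other names are hypothetical and only illustrate the general idea of lexically constrained decoding.

```cpp
// Minimal sketch (not RASR2 code): label-synchronous beam search with an
// optional lexical prefix-tree constraint. All names and scores are hypothetical.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Prefix tree ("trie") over characters, representing a closed vocabulary.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWordEnd = false;  // a full decoder would apply a word-level LM here; unused in this sketch
};

void addWord(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        auto& child = node->children[c];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->isWordEnd = true;
}

struct Hypothesis {
    std::string labels;        // label sequence decoded so far
    double score;              // accumulated log score
    const TrieNode* lexState;  // position in the prefix tree; nullptr = open vocabulary
};

// Toy stand-in for a neural label scorer: prefers repeating the previous label.
double labelScore(const std::string& context, char label) {
    if (context.empty()) return -1.0;
    return -1.0 - 0.1 * std::abs(static_cast<double>(label) - static_cast<double>(context.back()));
}

std::vector<Hypothesis> beamSearch(const std::string& alphabet, int steps,
                                   std::size_t beamSize, const TrieNode* lexicon) {
    std::vector<Hypothesis> beam{{"", 0.0, lexicon}};
    for (int t = 0; t < steps; ++t) {
        std::vector<Hypothesis> expanded;
        for (const Hypothesis& hyp : beam) {
            for (char label : alphabet) {
                const TrieNode* next = nullptr;
                if (hyp.lexState) {  // closed vocabulary: follow the prefix tree
                    auto it = hyp.lexState->children.find(label);
                    if (it == hyp.lexState->children.end()) continue;  // prune off-lexicon label
                    next = it->second.get();
                }
                expanded.push_back({hyp.labels + label,
                                    hyp.score + labelScore(hyp.labels, label), next});
            }
        }
        // Keep only the beamSize best hypotheses (higher score = better).
        std::sort(expanded.begin(), expanded.end(),
                  [](const Hypothesis& a, const Hypothesis& b) { return a.score > b.score; });
        if (expanded.size() > beamSize) expanded.resize(beamSize);
        beam = std::move(expanded);
    }
    return beam;
}

int main() {
    TrieNode lexicon;
    addWord(lexicon, "abc");
    addWord(lexicon, "abd");

    auto open   = beamSearch("abcd", 3, 4, nullptr);   // any label sequence allowed
    auto closed = beamSearch("abcd", 3, 4, &lexicon);  // only lexicon prefixes survive

    std::cout << "open-vocabulary best:   " << open.front().labels << "\n";
    std::cout << "closed-vocabulary best: " << closed.front().labels << "\n";
}
```

The point the sketch highlights: when a lexicon is supplied, continuations that leave the prefix tree are pruned immediately, so only label sequences that spell lexicon words survive; with a null lexicon, every label is a valid continuation and the search is open-vocabulary. A real S2S decoder would additionally integrate a language model at word ends and handle label topologies (e.g. CTC blanks), which this sketch omits.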