RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Cited by: 0
Authors
Zhou, Wei [1 ,2 ]
Beck, Eugen [1 ,2 ]
Berger, Simon [1 ,2 ]
Schlueter, Ralf [1 ,2 ]
Ney, Hermann [1 ,2 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Machine Learning & Human Language Technol, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
INTERSPEECH 2023 | 2023
Keywords
speech recognition; toolkit; sequence-to-sequence; decoder; beam search; RASR
DOI
10.21437/Interspeech.2023-1062
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but rather simple support for decoding open-vocabulary scenarios only. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as modeling unit choice, we present RASR2 in this work, a research-oriented generic S2S decoder implemented in C++. It offers a strong flexibility/compatibility for various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both Switchboard and LibriSpeech corpora. Our source code is public online.
Pages: 4094-4098
Page count: 5
Related papers
50 in total
  • [31] A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese
    Zhou, Shiyu
    Dong, Linhao
    Xu, Shuang
    Xu, Bo
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 210 - 220
  • [32] SEQUENCE-TO-SEQUENCE AUTOMATIC SPEECH RECOGNITION WITH WORD EMBEDDING REGULARIZATION AND FUSED DECODING
    Liu, Alexander H.
    Sung, Tzu-Wei
    Chuang, Shun-Po
    Lee, Hung-yi
    Lee, Lin-shan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7879 - 7883
  • [33] Sequence-to-Sequence Models for Emphasis Speech Translation
    Quoc Truong Do
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (10) : 1873 - 1883
  • [34] Direct speech-to-speech translation with a sequence-to-sequence model
    Jia, Ye
    Weiss, Ron J.
    Biadsy, Fadi
    Macherey, Wolfgang
    Johnson, Melvin
    Chen, Zhifeng
    Wu, Yonghui
    INTERSPEECH 2019, 2019, : 1123 - 1127
  • [35] Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning
    Chung, Hoon
    Jeon, Hyeong-Bae
    Park, Jeon Gue
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [36] Automatic Pronunciation Generator for Indonesian Speech Recognition System Based on Sequence-to-Sequence Model
    Hoesen, Devin
    Putri, Fanda Yuliana
    Lestari, Dessi Puji
    2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 7 - 12
  • [37] Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
    Liu, Danni
    Spanakis, Gerasimos
    Niehues, Jan
    INTERSPEECH 2020, 2020, : 3620 - 3624
  • [38] Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network
    Chen, Xiaomin
    Han, Wenjing
    Ruan, Huabin
    Liu, Jiamu
    Li, Haifeng
    Jiang, Dongmei
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [39] Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR
    Weninger, Felix
    Andres-Ferrer, Jesus
    Li, Xinwei
    Zhan, Puming
    INTERSPEECH 2019, 2019, : 3805 - 3809
  • [40] Seq2SeqPy: A Lightweight and Customizable Toolkit for Neural Sequence-to-Sequence Modeling
    Qader, Raheel
    Portet, Francois
    Labbe, Cyril
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 7140 - 7144