RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Cited by: 0
Authors
Zhou, Wei [1 ,2 ]
Beck, Eugen [1 ,2 ]
Berger, Simon [1 ,2 ]
Schlueter, Ralf [1 ,2 ]
Ney, Hermann [1 ,2 ]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Machine Learning & Human Language Technol, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Source
INTERSPEECH 2023, 2023
Keywords
speech recognition; toolkit; sequence-to-sequence; decoder; beam search; RASR;
DOI
10.21437/Interspeech.2023-1062
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple decoding support restricted to open-vocabulary scenarios. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding usually target classical ASR only, or do not support all S2S models. To eliminate this restriction on research possibilities such as the choice of modeling unit, we present RASR2, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility with various S2S models, language models, label units/topologies and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and LibriSpeech corpora. Our source code is publicly available online.
Pages: 4094-4098
Number of pages: 5
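The generalized search framework mentioned in the abstract covers both open-vocabulary and closed-vocabulary (lexically constrained) decoding. As a rough intuition only, the sketch below shows label-synchronous beam search in which hypothesis expansion can optionally be restricted to prefixes of a lexical prefix tree. It is a minimal Python illustration under assumptions made for this example, not RASR2's actual C++ decoder, and all names in it (build_trie, beam_search, score_step, word_end_id, WORD_END) are hypothetical.

```python
# Minimal conceptual sketch of label-synchronous beam search with an optional
# lexical prefix-tree constraint for closed-vocabulary decoding. This is an
# illustration of the general idea only, not RASR2's implementation; all names
# (build_trie, beam_search, score_step, WORD_END, word_end_id) are hypothetical.
import math
from typing import Callable, Dict, List, Optional, Tuple

WORD_END = -1           # sentinel key marking that a complete lexicon word ends at a node
Trie = Dict[int, dict]  # label id -> child node

def build_trie(lexicon: List[List[int]]) -> Trie:
    """Build a prefix tree over the label sequences of all in-vocabulary words."""
    root: Trie = {}
    for labels in lexicon:
        node = root
        for lab in labels:
            node = node.setdefault(lab, {})
        node[WORD_END] = {}  # mark that a full word ends here
    return root

def beam_search(
    score_step: Callable[[List[int]], List[float]],  # log-probs of next label given prefix
    eos_id: int,
    beam_size: int = 8,
    max_len: int = 50,
    trie: Optional[Trie] = None,        # if given, restrict hypotheses to lexicon prefixes
    word_end_id: Optional[int] = None,  # label acting as a word-boundary token
) -> Tuple[List[int], float]:
    # A hypothesis is (label prefix, accumulated log score, current trie node).
    beams: List[Tuple[List[int], float, Optional[Trie]]] = [([], 0.0, trie)]
    finished: List[Tuple[List[int], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score, node in beams:
            log_probs = score_step(prefix)
            for lab, lp in enumerate(log_probs):
                if lab == eos_id:
                    finished.append((prefix, score + lp))
                    continue
                next_node = node
                if node is not None:  # closed-vocabulary mode: follow the prefix tree
                    if lab == word_end_id:
                        if WORD_END not in node:
                            continue          # boundary only allowed after a full word
                        next_node = trie      # restart at the root for the next word
                    elif lab in node:
                        next_node = node[lab]  # stay inside an in-lexicon word
                    else:
                        continue              # prune: label would leave the lexicon
                candidates.append((prefix + [lab], score + lp, next_node))
        # Score-based pruning: keep only the best `beam_size` partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if not beams:
            break
    if not finished:  # fall back to unfinished hypotheses if none reached end-of-sentence
        finished = [(p, s) for p, s, _ in beams]
    return max(finished, key=lambda f: f[1])

if __name__ == "__main__":
    # Toy 5-label "model" with a uniform distribution, just to show the call pattern.
    def toy_scorer(prefix: List[int]) -> List[float]:
        return [math.log(1.0 / 5.0)] * 5
    lexicon_trie = build_trie([[1, 2], [1, 3]])  # two in-vocabulary "words"
    best, best_score = beam_search(toy_scorer, eos_id=0, beam_size=4, max_len=6,
                                   trie=lexicon_trie, word_end_id=4)
    print(best, best_score)
```

With trie=None the same loop reduces to plain open-vocabulary beam search over the label units; the lexical constraint only changes which expansions survive pruning, not the scoring itself.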
Related papers
50 in total (items [41]-[50] shown)
  • [41] A Sequence-to-Sequence Framework Based on Transformer With Masked Language Model for Optical Music Recognition
    Wen, Cuihong
    Zhu, Longjiao
    IEEE ACCESS, 2022, 10 : 118243 - 118252
  • [42] SMILE: SEQUENCE-TO-SEQUENCE DOMAIN ADAPTATION WITH MINIMIZING LATENT ENTROPY FOR TEXT IMAGE RECOGNITION
    Chang, Yen-Cheng
    Chen, Yi-Chang
    Chang, Yu-Chuan
    Yeh, Yi-Ren
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 431 - 435
  • [43] TWO-STAGE PRE-TRAINING FOR SEQUENCE TO SEQUENCE SPEECH RECOGNITION
    Fan, Zhiyun
    Zhou, Shiyu
    Xu, Bo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [44] A UNIFIED SEQUENCE-TO-SEQUENCE FRONT-END MODEL FOR MANDARIN TEXT-TO-SPEECH SYNTHESIS
    Pan, Junjie
    Yin, Xiang
    Zhang, Zhiling
    Liu, Shichao
    Zhang, Yang
    Ma, Zejun
    Wang, Yuxuan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6689 - 6693
  • [45] From Speech to Facial Activity: Towards Cross-modal Sequence-to-Sequence Attention Networks
    Stappen, Lukas
    Karas, Vincent
    Cummins, Nicholas
    Ringeval, Fabien
    Scherer, Klaus
    Schuller, Bjorn
    2019 IEEE 21ST INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP 2019), 2019,
  • [46] Seq2Seq-AFL: Fuzzing via sequence-to-sequence model
    Yang, Liqun
    Wei, Chaoren
    Yang, Jian
    Ma, Jinxin
    Guo, Hongcheng
    Cheng, Long
    Li, Zhoujun
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (10) : 4403 - 4421
  • [47] END TO END SPEECH RECOGNITION ERROR PREDICTION WITH SEQUENCE TO SEQUENCE LEARNING
    Serai, Prashant
    Stiff, Adam
    Fosler-Lussier, Eric
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6339 - 6343
  • [48] LANGUAGE MODEL INTEGRATION BASED ON MEMORY CONTROL FOR SEQUENCE TO SEQUENCE SPEECH RECOGNITION
    Cho, Jaejin
    Watanabe, Shinji
    Hori, Takaaki
    Baskar, Murali Karthick
    Inaguma, Hirofumi
    Villalba, Jesus
    Dehak, Najim
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6191 - 6195
  • [49] Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
    Zhou, Kun
    Sisman, Berrak
    Li, Haizhou
    INTERSPEECH 2021, 2021, : 811 - 815
  • [50] Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer
    Nakamura, Taiki
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2021, 2021, : 121 - 125