RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

Cited by: 0

Authors:

Zhou, Wei [1,2]

Beck, Eugen [1,2]

Berger, Simon [1,2]

Schlueter, Ralf [1,2]

Ney, Hermann [1,2]

Affiliations:

[1] RWTH Aachen University, Department of Computer Science, Machine Learning and Human Language Technology, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech recognition; toolkit; sequence-to-sequence; decoder; beam search; RASR

DOI:

10.21437/Interspeech.2023-1062

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but only rather simple support for decoding, limited to open-vocabulary scenarios. For closed-vocabulary scenarios, public tools that support lexically constrained decoding are usually designed only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as the choice of modeling unit, we present RASR2, a research-oriented generic S2S decoder implemented in C++. It offers strong flexibility and compatibility across various S2S models, language models, label units and topologies, and neural network architectures. It provides efficient decoding for both open- and closed-vocabulary scenarios, based on a generalized search framework with rich support for different search modes and settings. We evaluate RASR2 with a wide range of experiments on both the Switchboard and LibriSpeech corpora. Our source code is publicly available online.

Pages: 4094-4098

Page count: 5
