Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition

Times cited: 5

Authors:

Chen, Nanxin [1]

Zelasko, Piotr [1, 2]

Moro-Velazquez, Laureano [1]

Villalba, Jesus [1, 2]

Dehak, Najim [1, 2]

Affiliations:

[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA

[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

speech recognition; non-autoregressive model; deep learning; denoising autoencoder; iterative refinement; TRANSFORMER;

DOI:

10.21437/Interspeech.2021-1906

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

Deep autoregressive models have begun to match or surpass conventional systems for automatic speech recognition. However, their inference remains slow because they decode token by token. Non-autoregressive models greatly improve decoding speed by decoding within a constant number of iterations. For example, Align-Refine was proposed to improve the performance of non-autoregressive systems by iteratively refining the alignment. In this work, we propose a new perspective that connects Align-Refine with the denoising autoencoder. We introduce a novel noise distribution from which the alignment is sampled directly, instead of being obtained from the decoder output. Experimental results show that the proposed Align-Denoise speeds up both training and inference, with relative performance improvements of up to 5% using single-pass decoding.
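The abstract's central idea (training the refiner on an alignment sampled from a noise distribution rather than on one produced by the decoder) can be illustrated as a denoising setup. The following is a minimal sketch, not the authors' implementation. It assumes a CTC-style frame-level alignment with the blank symbol at index 0, and it uses uniform random label substitution with probability `p` as a hypothetical stand-in for the paper's noise distribution, whose exact form is not given here. The helper names `corrupt_alignment`, `ctc_collapse`, and `make_denoising_pair` are illustrative only.

```python
import random
from typing import List, Tuple

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def corrupt_alignment(alignment: List[int], vocab_size: int, p: float,
                      rng: random.Random) -> List[int]:
    """Hypothetical noise distribution: with probability p, replace each
    frame label by a label drawn uniformly from [0, vocab_size), blank
    included. The paper's actual noise distribution may differ."""
    noisy = []
    for label in alignment:
        if rng.random() < p:
            noisy.append(rng.randrange(vocab_size))
        else:
            noisy.append(label)
    return noisy  # same length as the input: one label per frame


def ctc_collapse(alignment: List[int]) -> List[int]:
    """Standard CTC mapping: merge consecutive repeats, then drop blanks."""
    tokens = []
    prev = None
    for label in alignment:
        if label != prev and label != BLANK:
            tokens.append(label)
        prev = label
    return tokens


def make_denoising_pair(alignment: List[int], vocab_size: int, p: float,
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Training pair for a hypothetical refiner: (noisy input, clean target).
    No decoder pass is needed to produce the noisy input."""
    return corrupt_alignment(alignment, vocab_size, p, rng), list(alignment)
```

For example, `ctc_collapse([0, 3, 3, 0, 5, 0, 0, 5, 7])` returns `[3, 5, 5, 7]`: the repeated `3` is merged, while the two `5`s survive because a blank separates them. Under this sketch, a single refinement pass over an initial alignment followed by `ctc_collapse` would give the output token sequence, and because the training input is sampled rather than decoded, training avoids running the decoder to build it.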


Pages: 3770-3774

Page count: 5
