Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition

Cited by: 5
Authors
Chen, Nanxin [1]
Zelasko, Piotr [1, 2]
Moro-Velazquez, Laureano [1]
Villalba, Jesus [1, 2]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
INTERSPEECH 2021 | 2021
Keywords
speech recognition; non-autoregressive model; deep learning; denoising autoencoder; iterative refinement; transformer
DOI
10.21437/Interspeech.2021-1906
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Deep autoregressive models have become comparable or superior to conventional systems for automatic speech recognition. However, they still suffer from slow inference because they decode token by token. Non-autoregressive models greatly improve decoding speed by decoding within a constant number of iterations. For example, Align-Refine was proposed to improve the performance of non-autoregressive systems by refining the alignment iteratively. In this work, we propose a new perspective that connects Align-Refine with the denoising autoencoder. We introduce a novel noise distribution from which the alignment is sampled directly instead of being obtained from the decoder output. Experimental results show that the proposed Align-Denoise speeds up both training and inference, with a relative performance improvement of up to 5% using single-pass decoding.
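Illustrative sketch (not from the paper): the abstract refers to frame-level alignments and to sampling a noisy alignment as the denoiser's input. The Python below shows one way to picture this, assuming a CTC-style alignment with a blank symbol. The `BLANK` symbol, the `collapse` and `corrupt` helpers, and the uniform random substitution noise are all assumptions made for illustration; the noise distribution actually used by Align-Denoise is not specified in this record.

```python
import random

BLANK = "_"  # hypothetical blank symbol for a CTC-style alignment


def collapse(alignment):
    """Merge consecutive repeats, then drop blanks (standard CTC collapse)."""
    tokens = []
    prev = None
    for sym in alignment:
        if sym != prev and sym != BLANK:
            tokens.append(sym)
        prev = sym
    return tokens


def corrupt(alignment, vocab, rate, rng):
    """Replace each frame label with a random vocabulary symbol with
    probability `rate`. This uniform substitution is an assumed stand-in,
    not the paper's noise distribution."""
    return [rng.choice(vocab) if rng.random() < rate else sym
            for sym in alignment]


# One label per acoustic frame; all frames are available at once,
# so a model can refine them in a single parallel pass.
alignment = ["_", "c", "c", "_", "a", "_", "t", "t"]
print(collapse(alignment))  # ['c', 'a', 't']

# A corrupted alignment of the same length, as a denoiser might see in training.
noisy = corrupt(alignment, vocab=["_", "a", "c", "t"], rate=0.3,
                rng=random.Random(0))
print(noisy)
```

Because the corrupted alignment is sampled directly rather than produced by a decoder pass, a training step in this picture needs no preliminary decoding, which is consistent with the training speed-up the abstract claims.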
Pages: 3770-3774
Page count: 5