Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition

Cited by: 5
Authors
Chen, Nanxin [1]
Zelasko, Piotr [1, 2]
Moro-Velazquez, Laureano [1]
Villalba, Jesus [1, 2]
Dehak, Najim [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Source
INTERSPEECH 2021 | 2021
Keywords
speech recognition; non-autoregressive model; deep learning; denoising autoencoder; iterative refinement; Transformer
DOI
10.21437/Interspeech.2021-1906
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Deep autoregressive models are becoming comparable or superior to conventional systems for automatic speech recognition. However, they still suffer from slow inference because they decode token by token. Non-autoregressive models greatly improve decoding speed by decoding within a constant number of iterations. For example, Align-Refine was proposed to improve the performance of non-autoregressive systems by refining the alignment iteratively. In this work, we propose a new perspective that connects Align-Refine with the denoising autoencoder. We introduce a novel noisy distribution from which the alignment is sampled directly, instead of obtaining it from the decoder output. Experimental results show that the proposed Align-Denoise speeds up both training and inference, with a relative performance improvement of up to 5% using single-pass decoding.
Pages: 3770-3774
Page count: 5
Related papers
31 in total
  • [31] ESPnet: End-to-End Speech Processing Toolkit
    Watanabe, Shinji
    Hori, Takaaki
    Karita, Shigeki
    Hayashi, Tomoki
    Nishitoba, Jiro
    Unno, Yuya
    Soplin, Nelson Enrique Yalta
    Heymann, Jahn
    Wiesner, Matthew
    Chen, Nanxin
    Renduchintala, Adithya
    Ochiai, Tsubasa
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2207 - 2211