Noise-robust Attention Learning for End-to-End Speech Recognition

Cited by: 0
Authors
Higuchi, Yosuke [1 ]
Tawara, Naohiro [2 ]
Ogawa, Atsunori [2 ]
Iwata, Tomoharu [2 ]
Kobayashi, Tetsunori [1 ]
Ogawa, Tetsuji [1 ]
Affiliations
[1] Waseda Univ, Dept Commun & Comp Engn, Tokyo, Japan
[2] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
Keywords
Attention mechanism; noise robustness; speech recognition; deep neural networks
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have combined recurrent neural networks with attention mechanisms to convert speech directly into text. In real-world environments, however, noise makes it difficult for the attention mechanism to estimate an accurate alignment between the input speech frames and the output characters, which degrades the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL), which explicitly tells the attention mechanism where to "listen" in a sequence of noisy speech features. Specifically, we train the attention weights estimated from noisy speech to approximate the weights estimated from clean speech. Experimental results on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.
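The idea of training noisy-speech attention weights to approximate clean-speech attention weights can be illustrated with a small sketch. The abstract does not state which distance measure the authors use, so the KL divergence below is an assumption, and the function names (`softmax`, `attention_matching_loss`) and the random toy scores are hypothetical stand-ins for the outputs of a real attention-based ASR model.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax over the input-frame axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def attention_matching_loss(clean_weights, noisy_weights, eps=1e-8):
    """Mean KL(clean || noisy) over output steps.

    Both arrays have shape (num_output_chars, num_input_frames) and each
    row is an attention distribution over frames. KL divergence is an
    assumed choice; the abstract only says the noisy weights are trained
    to approximate the clean ones.
    """
    clean = np.clip(clean_weights, eps, 1.0)
    noisy = np.clip(noisy_weights, eps, 1.0)
    kl_per_step = np.sum(clean * (np.log(clean) - np.log(noisy)), axis=-1)
    return float(kl_per_step.mean())

# Toy attention scores standing in for a model's outputs (hypothetical data).
rng = np.random.default_rng(0)
num_chars, num_frames = 4, 10
clean_scores = rng.normal(size=(num_chars, num_frames))
# Simulate noise perturbing the alignment scores.
noisy_scores = clean_scores + 0.5 * rng.normal(size=(num_chars, num_frames))

clean_attn = softmax(clean_scores)
noisy_attn = softmax(noisy_scores)

loss_same = attention_matching_loss(clean_attn, clean_attn)
loss_diff = attention_matching_loss(clean_attn, noisy_attn)
print(f"identical weights: {loss_same:.6f}, perturbed weights: {loss_diff:.6f}")
```

In a training setup of this kind, a term like this would presumably be added to the usual recognition loss with some weighting factor, with the clean-speech weights treated as a fixed target; those details are not given in the abstract.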
Pages: 311-315
Page count: 5
Related papers
50 in total
  • [1] Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
    Hu, Yuchen
    Hou, Nana
    Chen, Chen
    Chng, Eng Siong
    INTERSPEECH 2023, 2023, : 2918 - 2922
  • [2] INTERACTIVE FEATURE FUSION FOR END-TO-END NOISE-ROBUST SPEECH RECOGNITION
    Hu, Yuchen
    Hou, Nana
    Chen, Chen
    Chng, Eng Siong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6292 - 6296
  • [3] Two-stage deep spectrum fusion for noise-robust end-to-end speech recognition
    Fan, Cunhang
    Ding, Mingming
    Yi, Jiangyan
    Li, Jinpeng
    Lv, Zhao
    APPLIED ACOUSTICS, 2023, 212
  • [4] Noise Robust End-to-End Speech Recognition For Bangla Language
    Sumit, Sakhawat Hosain
    Al Muntasir, Tareq
    Zaman, M. M. Arefin
    Nandi, Rabindra Nath
    Sourov, Tanvir
    2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018,
  • [5] LEARNING NOISE INVARIANT FEATURES THROUGH TRANSFER LEARNING FOR ROBUST END-TO-END SPEECH RECOGNITION
    Zhang, Shucong
    Do, Cong-Thanh
    Doddipatla, Rama
    Renals, Steve
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7024 - 7028
  • [6] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [7] A companding front end for noise-robust automatic speech recognition
    Guinness, J
    Raj, B
    Schmidt-Nielsen, B
    Turicchia, L
    Sarpeshkar, R
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 249 - 252
  • [8] Multi-task Learning for End-to-end Noise-robust Bandwidth Extension
    Hou, Nana
    Xu, Chenglin
    Zhou, Joey Tianyi
    Chng, Eng Siong
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 4069 - 4073
  • [9] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [10] AN INVESTIGATION OF END-TO-END MODELS FOR ROBUST SPEECH RECOGNITION
    Prasad, Archiki
    Jyothi, Preethi
    Velmurugan, Rajbabu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6893 - 6897