Noise-robust Attention Learning for End-to-End Speech Recognition

Times Cited: 0
Authors
Higuchi, Yosuke [1 ]
Tawara, Naohiro [2 ]
Ogawa, Atsunori [2 ]
Iwata, Tomoharu [2 ]
Kobayashi, Tetsunori [1 ]
Ogawa, Tetsuji [1 ]
Affiliations
[1] Waseda Univ, Dept Commun & Comp Engn, Tokyo, Japan
[2] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
Keywords
Attention mechanism; noise robustness; speech recognition; deep neural networks
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have adopted a combination of recurrent neural networks and attention mechanisms to achieve direct speech-to-text conversion. In real-world environments, however, noisy conditions make it difficult for the attention mechanism to estimate an accurate alignment between the input speech frames and output characters, degrading the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL), which explicitly tells the attention mechanism where to "listen" in a sequence of noisy speech features. Specifically, we train the attention weights estimated from noisy speech to approximate the weights estimated from clean speech. Experimental results on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.
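The abstract does not specify the exact training objective; a minimal sketch, assuming NRAL minimizes a KL divergence that pulls the attention weights computed on noisy speech toward those computed on the parallel clean speech (the function name, the KL choice, and the toy attention matrices below are illustrative assumptions, not the paper's stated formulation):

```python
import numpy as np

def nral_loss(noisy_attn, clean_attn, eps=1e-8):
    """Hypothetical NRAL-style loss.

    Each row is the attention distribution over encoder frames for one
    decoder step; the loss is KL(clean || noisy) averaged over steps,
    with the clean weights treated as a fixed target.
    """
    noisy = np.clip(noisy_attn, eps, 1.0)
    clean = np.clip(clean_attn, eps, 1.0)
    # Per-step KL divergence, then mean over decoder steps
    kl = np.sum(clean_attn * (np.log(clean) - np.log(noisy)), axis=-1)
    return float(np.mean(kl))

# Toy example: 2 decoder steps attending over 4 encoder frames.
# Clean-speech attention is sharp; noisy-speech attention is diffuse.
clean = np.array([[0.70, 0.20, 0.05, 0.05],
                  [0.10, 0.60, 0.20, 0.10]])
noisy = np.array([[0.40, 0.30, 0.20, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])
loss = nral_loss(noisy, clean)  # positive; zero when distributions match
```

In practice such a term would be added to the usual sequence-level ASR loss, so the model jointly learns to transcribe and to align like its clean-speech counterpart.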
Pages: 311-315 (5 pages)
Related Papers (50 total)
  • [31] End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
    Kim, Suyoun
    Lane, Ian
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3867 - 3871
  • [32] Towards Efficiently Learning Monotonic Alignments for Attention-Based End-to-End Speech Recognition
    Miao, Chenfeng
    Zou, Kun
    Zhuang, Ziyang
    Wei, Tao
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2022, 2022, : 1051 - 1055
  • [33] Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
    Li, Yuanchao
    Zhao, Tianyu
    Kawahara, Tatsuya
    INTERSPEECH 2019, 2019, : 2803 - 2807
  • [34] Selective Adaptation of End-to-End Speech Recognition using Hybrid CTC/Attention Architecture for Noise Robustness
    Cong-Thanh Do
    Zhang, Shucong
    Hain, Thomas
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 321 - 325
  • [35] END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
    Bahdanau, Dzmitry
    Chorowski, Jan
    Serdyuk, Dmitriy
    Brakel, Philemon
    Bengio, Yoshua
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4945 - 4949
  • [36] Gaussian Prediction based Attention for Online End-to-End Speech Recognition
    Hou, Junfeng
    Zhang, Shiliang
    Dai, Lirong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3692 - 3696
  • [37] Online Hybrid CTC/Attention Architecture for End-to-end Speech Recognition
    Miao, Haoran
    Cheng, Gaofeng
    Zhang, Pengyuan
    Li, Ta
    Yan, Yonghong
    INTERSPEECH 2019, 2019, : 2623 - 2627
  • [38] Speaker Adaptation for Attention-Based End-to-End Speech Recognition
    Meng, Zhong
    Gaur, Yashesh
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 241 - 245
  • [39] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    INTERSPEECH 2019, 2019, : 246 - 250
  • [40] MODALITY ATTENTION FOR END-TO-END AUDIO-VISUAL SPEECH RECOGNITION
    Zhou, Pan
    Yang, Wenwen
    Chen, Wei
    Wang, Yanfeng
    Jia, Jia
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6565 - 6569