Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition

被引:34
|
作者
Weng, Chao [1 ]
Cui, Jia [1 ]
Wang, Guangsen [2 ]
Wang, Jun [2 ]
Yu, Changzhu [1 ]
Su, Dan [2 ]
Yu, Dong [1 ]
机构
[1] Tencent AI Lab, Bellevue, WA 98004 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
来源
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018年
关键词
attention based sequence-to-sequence models; end-to-end speech recognition; sequential minimum Bayes risk training; MBR;
D O I
10.21437/Interspeech.2018-1030
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this work, we propose two improvements to attention based sequence-to-sequence models for end-to-end speech recognition systems. For the first improvement, we propose to use an input-feeding architecture which feeds not only the previous context vector but also the previous decoder hidden state information as inputs to the decoder. The second improvement is based on a better hypothesis generation scheme for sequential minimum Bayes risk (MBR) training of sequence-to-sequence models where we introduce softmax smoothing into N-best generation during MBR training. We conduct the experiments on both Switchboard-300hrs and Switchboard+Fisher-2000hrs datasets and observe significant gains from both proposed improvements. Together with other training strategies such as dropout and scheduled sampling, our best model achieved WERs of 8.3%/15.5% on the Switchboard/CallHome subsets of Eval2000 without any external language models which is highly competitive among state-of-the-art English conversational speech recognition systems.
引用
收藏
页码:761 / 765
页数:5
相关论文
共 50 条
  • [1] ATTENTION-BASED END-TO-END SPEECH RECOGNITION ON VOICE SEARCH
    Shan, Changhao
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4764 - 4768
  • [2] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823
  • [3] Residual Energy-Based Models for End-to-End Speech Recognition
    Li, Qiujia
    Zhang, Yu
    Li, Bo
    Cao, Liangliang
    Woodland, Philip C.
    INTERSPEECH 2021, 2021, : 4069 - 4073
  • [4] Multi-channel Attention for End-to-End Speech Recognition
    Braun, Stefan
    Neil, Daniel
    Anumula, Jithendar
    Ceolini, Enea
    Liu, Shih-Chii
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 17 - 21
  • [5] End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition
    Kim, Suyoun
    Lane, Ian
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3867 - 3871
  • [6] END-TO-END SPEECH RECOGNITION WITH WORD-BASED RNN LANGUAGE MODELS
    Hori, Takaaki
    Cho, Jaejin
    Watanabe, Shinji
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 389 - 396
  • [7] TRANSFORMER-BASED ONLINE CTC/ATTENTION END-TO-END SPEECH RECOGNITION ARCHITECTURE
    Miao, Haoran
    Cheng, Gaofeng
    Gao, Changfeng
    Zhang, Pengyuan
    Yan, Yonghong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6084 - 6088
  • [8] STREAM ATTENTION-BASED MULTI-ARRAY END-TO-END SPEECH RECOGNITION
    Wang, Xiaofei
    Li, Ruizhi
    Mallidi, Sri Harish
    Hori, Takaaki
    Watanabe, Shinji
    Hermansky, Hynek
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7105 - 7109
  • [9] On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
    Irie, Kazuki
    Prabhavalkar, Rohit
    Kannan, Anjuli
    Bruguier, Antoine
    Rybach, David
    Nguyen, Patrick
    INTERSPEECH 2019, 2019, : 3800 - 3804
  • [10] SEQUENCE TRAINING OF ENCODER-DECODER MODEL USING POLICY GRADIENT FOR END-TO-END SPEECH RECOGNITION
    Karita, Shigeki
    Ogawa, Atsunori
    Delcroix, Marc
    Nakatani, Tomohiro
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5839 - 5843