SELF-ATTENTION NETWORKS FOR CONNECTIONIST TEMPORAL CLASSIFICATION IN SPEECH RECOGNITION

Times Cited: 0
Authors
Salazar, Julian [1 ]
Kirchhoff, Katrin [1 ,2 ]
Huang, Zhiheng [1 ]
Affiliations
[1] Amazon AI, Seattle, WA 98109 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
speech recognition; connectionist temporal classification; self-attention; multi-head attention; end-to-end;
DOI
10.1109/icassp.2019.8682539
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and down-sampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance.
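To make the abstract's setup concrete, below is a minimal sketch (not the authors' implementation) of a fully self-attentional acoustic encoder trained with the CTC loss, written in PyTorch. The layer sizes, the learned positional embedding, and the use of nn.TransformerEncoder and nn.CTCLoss are illustrative assumptions; the paper's actual architecture, positional schemes, and down-sampling choices are described in the full text.

import torch
import torch.nn as nn

class SelfAttentionCTC(nn.Module):
    """Self-attention encoder with a per-frame CTC output layer (illustrative only)."""
    def __init__(self, n_feats=80, d_model=256, n_heads=4, n_layers=6, n_labels=32, max_len=4000):
        super().__init__()
        self.input_proj = nn.Linear(n_feats, d_model)      # project acoustic features into model dim
        self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embedding (assumed)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.output = nn.Linear(d_model, n_labels)         # label set includes the CTC blank (index 0)

    def forward(self, feats):                               # feats: (batch, time, n_feats)
        pos = torch.arange(feats.size(1), device=feats.device)
        x = self.input_proj(feats) + self.pos_emb(pos)
        x = self.encoder(x)
        return self.output(x).log_softmax(dim=-1)           # (batch, time, n_labels) log-probabilities

# Toy training step on random data; real training would use filterbank features and transcripts.
model = SelfAttentionCTC()
ctc = nn.CTCLoss(blank=0)
feats = torch.randn(2, 100, 80)                             # 2 utterances, 100 frames each
targets = torch.randint(1, 32, (2, 20))                     # label indices (0 is reserved for blank)
log_probs = model(feats).transpose(0, 1)                    # nn.CTCLoss expects (time, batch, labels)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100),
           target_lengths=torch.full((2,), 20))
loss.backward()

Because CTC is alignment-free and non-autoregressive, decoding in this sketch can be as simple as taking the per-frame argmax and collapsing repeats and blanks; the WER figures in the abstract additionally involve language-model decoding.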
Pages: 7115-7119
Number of Pages: 5