SELF-ATTENTION NETWORKS FOR CONNECTIONIST TEMPORAL CLASSIFICATION IN SPEECH RECOGNITION

Cited by: 0
Authors
Salazar, Julian [1 ]
Kirchhoff, Katrin [1 ,2 ]
Huang, Zhiheng [1 ]
Affiliations
[1] Amazon AI, Seattle, WA 98109 USA
[2] Univ Washington, Seattle, WA 98195 USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
speech recognition; connectionist temporal classification; self-attention; multi-head attention; end-to-end;
DOI
10.1109/icassp.2019.8682539
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and down-sampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance.
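The abstract's description of CTC as "alignment-free" refers to marginalizing over all frame-level alignments that collapse (via blank removal and repeat merging) to the target label sequence. As an illustration only — this is a minimal sketch of the standard CTC forward (alpha) recursion in plain Python, not code from the paper; the toy probabilities and the `ctc_forward` name are assumptions for the example:

```python
import math

def ctc_forward(log_probs, target, blank=0):
    """Log-probability of `target` under CTC: sums over every frame-level
    path that collapses to `target`. `log_probs[t][k]` is the per-frame
    log-probability of symbol k; `blank` is the CTC blank index."""
    # Interleave blanks: target [a] -> extended sequence [blank, a, blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    # Initialize: a path may start on the leading blank or the first label
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]          # stay on the same extended symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one
            # Skip over a blank, allowed only between distinct labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            m = max(terms)
            if m > NEG_INF:  # log-sum-exp of the incoming paths
                new[s] = m + math.log(sum(math.exp(x - m) for x in terms)) \
                           + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or the trailing blank
    m = max(alpha[-1], alpha[-2])
    return m + math.log(math.exp(alpha[-1] - m) + math.exp(alpha[-2] - m))

# Toy example: 2 frames, vocab {0: blank, 1: 'a'}, target "a" (= [1]).
log_probs = [[math.log(0.4), math.log(0.6)],
             [math.log(0.3), math.log(0.7)]]
p = math.exp(ctc_forward(log_probs, [1]))
print(round(p, 4))  # 0.88 = P(aa) + P(a,blank) + P(blank,a)
```

The three alignments `aa`, `a‑blank`, and `blank‑a` all collapse to "a", so their probabilities (0.42 + 0.18 + 0.28) sum to 0.88, matching the recursion — this marginalization is what lets CTC train without frame-level alignments.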
Pages: 7115-7119
Page count: 5
Related papers
50 items in total
[21]   GAUSSIAN KERNELIZED SELF-ATTENTION FOR LONG SEQUENCE DATA AND ITS APPLICATION TO CTC-BASED SPEECH RECOGNITION [J].
Kashiwagi, Yosuke ;
Tsunoo, Emiru ;
Watanabe, Shinji .
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :6214-6218
[22]   Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition [J].
Gong, Rong ;
Quillen, Carl ;
Sharma, Dushyant ;
Goderre, Andrew ;
Lainez, Jose ;
Milanovic, Ljubomir .
INTERSPEECH 2021, 2021, :3840-3844
[23]   Self-Attention Enhanced Recurrent Neural Networks for Sentence Classification [J].
Kumar, Ankit ;
Rastogi, Reshma .
2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, :905-911
[24]   SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition [J].
Gao, Zhifu ;
Zhang, Shiliang ;
Lei, Ming ;
McLoughlin, Ian .
INTERSPEECH 2020, 2020, :6-10
[25]   Self-Attention Networks For Motion Posture Recognition Based On Data Fusion [J].
Ji, Zhihao ;
Xie, Qiang .
4TH INTERNATIONAL CONFERENCE ON INFORMATICS ENGINEERING AND INFORMATION SCIENCE (ICIEIS2021), 2022, 12161
[26]   Self-Attention Networks for Human Activity Recognition Using Wearable Devices [J].
Betancourt, Carlos ;
Chen, Wen-Hui ;
Kuan, Chi-Wei .
2020 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2020, :1194-1199
[27]   Deformable Self-Attention for Text Classification [J].
Ma, Qianli ;
Yan, Jiangyue ;
Lin, Zhenxi ;
Yu, Liuhong ;
Chen, Zipeng .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 :1570-1581
[28]   Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration [J].
Karita, Shigeki ;
Soplin, Nelson Enrique Yalta ;
Watanabe, Shinji ;
Delcroix, Marc ;
Ogawa, Atsunori ;
Nakatani, Tomohiro .
INTERSPEECH 2019, 2019, :1408-1412
[29]   SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION [J].
Kong, Zhifeng ;
Ping, Wei ;
Dantrey, Ambrish ;
Catanzaro, Bryan .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :7867-7871
[30]   Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition [J].
Wu, Long ;
Li, Ta ;
Wang, Li ;
Yan, Yonghong .
APPLIED SCIENCES-BASEL, 2019, 9 (21)