VERY DEEP CONVOLUTIONAL NETWORKS FOR END-TO-END SPEECH RECOGNITION

Cited by: 0
Authors
Zhang, Yu [1]
Chan, William [2]
Jaitly, Navdeep [3]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Google Brain, Mountain View, CA USA
Keywords
Automatic Speech Recognition; End-to-End Speech Recognition; Very Deep Convolutional Neural Networks
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Sequence-to-sequence models have shown success in end-to-end speech recognition. However, these models have so far used only shallow acoustic encoder networks. In our work, we successively train very deep convolutional networks to add more expressive power and better generalization to end-to-end ASR models. We apply network-in-network principles, batch normalization, residual connections and convolutional LSTMs to build very deep recurrent and convolutional structures. Our models exploit the spectral structure in the feature space and add computational depth without overfitting issues. We experiment with the WSJ ASR task and achieve a 10.5% word error rate without any dictionary or language model, using a 15-layer deep network.
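As a minimal sketch of the kind of building block the abstract names (2-D convolutions over spectrogram feature maps with batch normalization and a residual skip connection), the following PyTorch snippet is illustrative only, assuming the class name ResidualConvBlock, the channel count, kernel size, and input shape; it is not the authors' actual 15-layer encoder configuration.

    # Illustrative sketch, not the paper's architecture: one residual
    # convolutional block with batch normalization, of the kind used to
    # deepen an acoustic encoder over (time, frequency) feature maps.
    import torch
    import torch.nn as nn

    class ResidualConvBlock(nn.Module):
        """Two 3x3 convolutions with batch norm and a skip connection."""

        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            residual = x
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # Residual connection: add the block input back before the nonlinearity.
            return self.relu(out + residual)

    if __name__ == "__main__":
        # Dummy batch: 8 utterances, 32 channels, 100 frames, 40 filterbank bins
        # (shapes chosen only for the example).
        features = torch.randn(8, 32, 100, 40)
        block = ResidualConvBlock(channels=32)
        print(block(features).shape)  # torch.Size([8, 32, 100, 40])

Stacking several such blocks adds computational depth while the skip connections and batch normalization keep very deep stacks trainable, which is the motivation stated in the abstract.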
Pages: 4845-4849
Number of pages: 5