End-to-End Neural Segmental Models for Speech Recognition

Cited by: 11
Authors
Tang, Hao [1 ]
Lu, Liang [1 ]
Kong, Lingpeng [2 ]
Gimpel, Kevin [1 ]
Livescu, Karen [1 ]
Dyer, Chris [2 ,3 ]
Smith, Noah A. [4 ]
Renals, Steve [5 ]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Google DeepMind, London N1C 4AG, England
[4] Univ Washington, Seattle, WA 98195 USA
[5] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Connectionist temporal classification; end-to-end training; multitask training; segmental models; LANGUAGE;
DOI
10.1109/JSTSP.2017.2752462
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809
Abstract
Segmental models are an alternative to frame-based models for sequence prediction: hypothesized path weights are based on scores of entire segments rather than a single frame at a time. Neural segmental models, which use neural network-based weight functions, have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We also study how reducing the search space size affects performance under different weight functions, and we compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training, as well as multitask training that combines segmental and frame-level losses.
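To make the segment-level scoring concrete, here is a minimal Python sketch (our illustration, not the authors' implementation): a hypothesis is a sequence of labeled segments over encoder outputs, its path weight is a sum of whole-segment scores, and a simple dynamic program finds the best segmentation. The pooled dot-product weight function, the names segment_score and best_path, and the max_dur limit are assumptions for illustration only, standing in for the neural weight functions and finite-state transducer decoder described in the abstract.

# Illustrative sketch of segmental scoring; not the paper's code.
import numpy as np

def segment_score(frames, start, end, label_vec):
    # Toy weight function: dot product between a mean-pooled segment
    # encoding and a label embedding. Real systems use neural weight
    # functions such as segmental RNNs or frame-level classifiers.
    seg_encoding = frames[start:end].mean(axis=0)  # pool frames start..end-1
    return float(seg_encoding @ label_vec)

def best_path(frames, label_vecs, max_dur):
    # Dynamic program over segmentations: alpha[t] is the best score of
    # any labeled segmentation of frames[0:t].
    T = len(frames)
    alpha = [-np.inf] * (T + 1)
    alpha[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for dur in range(1, min(max_dur, t) + 1):
            s = t - dur
            for lab, vec in enumerate(label_vecs):
                cand = alpha[s] + segment_score(frames, s, t, vec)
                if cand > alpha[t]:
                    alpha[t], back[t] = cand, (s, lab)
    # Trace back the best labeled segmentation.
    segs, t = [], T
    while t > 0:
        s, lab = back[t]
        segs.append((s, t, lab))
        t = s
    return alpha[T], segs[::-1]

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 8))     # stand-in for encoder outputs h_1..h_T
label_vecs = rng.normal(size=(5, 8))  # stand-in label embeddings
print(best_path(frames, label_vecs, max_dur=6))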
Pages: 1254-1264
Page count: 11
Related Papers
50 items in total
  • [41] An End-to-End model for Vietnamese speech recognition
    Van Huy Nguyen
    2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 307 - 312
  • [42] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui
    Zhang, Long
    Song, Wenyu
    Meng, Jie
    Computer Engineering and Applications, 2024, 59 (02) : 22 - 33
  • [43] End-to-End Speech Recognition For Arabic Dialects
    Nasr, Seham
    Duwairi, Rehab
    Quwaider, Muhannad
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10617 - 10633
  • [44] End-to-End Speech Recognition and Disfluency Removal
    Lou, Paria Jamshid
    Johnson, Mark
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2051 - 2061
  • [45] Performance Monitoring for End-to-End Speech Recognition
    Li, Ruizhi
    Sell, Gregory
    Hermansky, Hynek
    INTERSPEECH 2019, 2019, : 2245 - 2249
  • [46] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [47] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [48] An Overview of End-to-End Automatic Speech Recognition
    Wang, Dong
    Wang, Xiaodong
    Lv, Shaohe
    SYMMETRY-BASEL, 2019, 11 (08):
  • [49] End-to-End Speech Recognition in Agglutinative Languages
    Mamyrbayev, Orken
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Turdalykyzy, Tolganay
    Gusmanova, Farida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 391 - 401
  • [50] End-to-end Korean Digits Speech Recognition
    Roh, Jong-hyuk
    Cho, Kwantae
    Kim, Youngsam
    Cho, Sangrae
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1137 - 1139