End-to-End Neural Segmental Models for Speech Recognition

Cited by: 11

Authors:

Tang, Hao ^{[1]}

Lu, Liang ^{[1]}

Kong, Lingpeng ^{[2]}

Gimpel, Kevin ^{[1]}

Livescu, Karen ^{[1]}

Dyer, Chris ^{[2,3]}

Smith, Noah A. ^{[4]}

Renals, Steve ^{[5]}

Affiliations:

[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA

[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[3] Google DeepMind, London N1C 4AG, England

[4] Univ Washington, Seattle, WA 98195 USA

[5] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2017 / Vol. 11 / No. 08

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Connectionist temporal classification; end-to-end training; multitask training; segmental models; LANGUAGE;

DOI:

10.1109/JSTSP.2017.2752462

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.

Pages: 1254 - 1264

Page count: 11

50 records in total

[21] END-TO-END AUDIOVISUAL SPEECH RECOGNITION
Petridis, Stavros
Stafylakis, Themos
Ma, Pingchuan
Cai, Feipeng
Tzimiropoulos, Georgios
Pantic, Maja
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6548 - 6552
[22] END-TO-END ANCHORED SPEECH RECOGNITION
Wang, Yiming
Fan, Xing
Chen, I-Fan
Liu, Yuzong
Chen, Tongfei
Hoffmeister, Bjorn
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7090 - 7094
[23] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Zhang, Ying
Pezeshki, Mohammad
Brakel, Philemon
Zhang, Saizheng
Laurent, Cesar
Bengio, Yoshua
Courville, Aaron
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
[24] Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming
Ochiai, Tsubasa
Watanabe, Shinji
Hori, Takaaki
Hershey, John R.
Xiao, Xiong
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2017, 11 (08) : 1274 - 1288
[25] Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Parcollet, Titouan
Zhang, Ying
Morchid, Mohamed
Trabelsi, Chiheb
Linares, Georges
De Mori, Renato
Bengio, Yoshua
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 22 - 26
[26] Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
Moritz, Niko
Hori, Takaaki
Le Roux, Jonathan
INTERSPEECH 2019, 2019, : 76 - 80
[27] A Neural Time Alignment Module for End-to-End Automatic Speech Recognition
Jiang, Dongcheng
Zhang, Chao
Woodland, Philip C.
INTERSPEECH 2023, 2023, : 1374 - 1378
[28] END-TO-END SPEECH EMOTION RECOGNITION USING DEEP NEURAL NETWORKS
Tzirakis, Panagiotis
Zhang, Jiehao
Schuller, Bjoern W.
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5089 - 5093
[29] A COMPARISON OF END-TO-END MODELS FOR LONG-FORM SPEECH RECOGNITION
Chiu, Chung-Cheng
Han, Wei
Zhang, Yu
Pang, Ruoming
Kishchenko, Sergey
Nguyen, Patrick
Narayanan, Arun
Liao, Hank
Zhang, Shuyuan
Kannan, Anjuli
Prabhavalkar, Rohit
Chen, Zhifeng
Sainath, Tara
Wu, Yonghui
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 889 - 896
[30] Confidence-based Ensembles of End-to-End Speech Recognition Models
Gitman, Igor
Lavrukhin, Vitaly
Laptev, Aleksandr
Ginsburg, Boris
INTERSPEECH 2023, 2023, : 1414 - 1418