End-to-End Neural Segmental Models for Speech Recognition

Cited by: 11
Authors
Tang, Hao [1 ]
Lu, Liang [1 ]
Kong, Lingpeng [2 ]
Gimpel, Kevin [1 ]
Livescu, Karen [1 ]
Dyer, Chris [2 ,3 ]
Smith, Noah A. [4 ]
Renals, Steve [5 ]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Google DeepMind, London N1C 4AG, England
[4] Univ Washington, Seattle, WA 98195 USA
[5] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Connectionist temporal classification; end-to-end training; multitask training; segmental models; language;
DOI
10.1109/JSTSP.2017.2752462
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808; 0809;
Abstract
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.
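The following is a minimal sketch, in generic notation rather than the paper's own, of the segment-level scoring and one marginal log-loss that the abstract alludes to; the paper's actual weight functions (frame-level neural classifiers, segmental recurrent neural networks) and the full set of losses it compares are defined in the article itself.

\[
  w(x, \pi) = \sum_{k=1}^{K} w\bigl(x, (s_k, t_k, y_k)\bigr),
  \qquad
  \pi = \bigl((s_1, t_1, y_1), \dots, (s_K, t_K, y_K)\bigr)
\]
\[
  \mathcal{L}_{\mathrm{marginal}}(x, y)
  = -\log \frac{\sum_{\pi \in \Pi(y)} \exp w(x, \pi)}
               {\sum_{\pi'} \exp w(x, \pi')}
\]

Here each segment (s_k, t_k, y_k) spans acoustic frames s_k through t_k with label y_k, so a path weight is a sum of whole-segment scores rather than per-frame scores, and \Pi(y) denotes the set of segmentations whose label sequence equals y.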
Pages: 1254-1264
Number of pages: 11