End-to-End Neural Segmental Models for Speech Recognition

Cited by: 11
Authors
Tang, Hao [1]
Lu, Liang [1]
Kong, Lingpeng [2]
Gimpel, Kevin [1]
Livescu, Karen [1]
Dyer, Chris [2, 3]
Smith, Noah A. [4]
Renals, Steve [5]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Google DeepMind, London N1C 4AG, England
[4] Univ Washington, Seattle, WA 98195 USA
[5] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Connectionist temporal classification; end-to-end training; multitask training; segmental models; LANGUAGE;
DOI
10.1109/JSTSP.2017.2752462
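Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification code
0808; 0809;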
Abstract
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses.
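The central idea in the abstract, that a hypothesis is scored segment by segment rather than frame by frame, can be illustrated with the standard semi-Markov dynamic program. The sketch below is not the authors' implementation. The weight function `frame_sum_weight` (summed frame-classifier log-probabilities plus a constant per-segment penalty) and the names `log_partition`, `viterbi`, and `max_dur` are hypothetical choices made for illustration. `max_dur` caps segment length, which is one simple way to shrink the search space; `log_partition` computes the normalizer a log-loss objective would need, and `viterbi` returns the best-scoring segmentation.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: weight(s, t, y) scores the segment covering
# frames s..t-1 with label y (higher is better).
SegmentWeight = Callable[[int, int, int], float]
Segment = Tuple[int, int, int]  # (start frame, end frame exclusive, label)


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def frame_sum_weight(frame_logprobs: List[List[float]],
                     seg_penalty: float = -1.0) -> SegmentWeight:
    """Hypothetical weight: sum a frame classifier's log-probabilities for
    label y over the segment, plus a constant per-segment penalty."""
    def weight(s: int, t: int, y: int) -> float:
        return sum(frame_logprobs[i][y] for i in range(s, t)) + seg_penalty
    return weight


def log_partition(num_frames: int, num_labels: int, max_dur: int,
                  weight: SegmentWeight) -> float:
    """Forward algorithm: log of the summed exp-scores of every segmentation
    and labeling. alpha[t] covers frames 0..t-1."""
    if max_dur < 1 or num_frames < 1:
        raise ValueError("need max_dur >= 1 and num_frames >= 1")
    alpha = [-math.inf] * (num_frames + 1)
    alpha[0] = 0.0
    for t in range(1, num_frames + 1):
        # Only segments of length <= max_dur may end at frame t.
        alpha[t] = logsumexp([alpha[s] + weight(s, t, y)
                              for s in range(max(0, t - max_dur), t)
                              for y in range(num_labels)])
    return alpha[num_frames]


def viterbi(num_frames: int, num_labels: int, max_dur: int,
            weight: SegmentWeight) -> Tuple[float, List[Segment]]:
    """Highest-scoring segmentation within the same search space."""
    if max_dur < 1 or num_frames < 1:
        raise ValueError("need max_dur >= 1 and num_frames >= 1")
    best = [-math.inf] * (num_frames + 1)
    back: List[Optional[Tuple[int, int]]] = [None] * (num_frames + 1)
    best[0] = 0.0
    for t in range(1, num_frames + 1):
        for s in range(max(0, t - max_dur), t):
            for y in range(num_labels):
                score = best[s] + weight(s, t, y)
                if score > best[t]:
                    best[t], back[t] = score, (s, y)
    # Follow back-pointers from the last frame to recover the segments.
    segments: List[Segment] = []
    t = num_frames
    while t > 0:
        s, y = back[t]
        segments.append((s, t, y))
        t = s
    segments.reverse()
    return best[num_frames], segments


if __name__ == "__main__":
    # Toy frame posteriors for 3 frames and 2 labels.
    frames = [[math.log(0.9), math.log(0.1)],
              [math.log(0.8), math.log(0.2)],
              [math.log(0.2), math.log(0.8)]]
    w = frame_sum_weight(frames)
    print(viterbi(3, 2, max_dur=2, weight=w)[1])  # [(0, 2, 0), (2, 3, 1)]
    print(log_partition(3, 2, max_dur=2, weight=w))
```

Setting `max_dur` to the number of frames recovers the unrestricted search space; smaller values drop long segments, so each end frame sums over at most `max_dur * num_labels` terms instead of up to `num_frames * num_labels`.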
Pages: 1254-1264
Number of pages: 11
Related papers
  • [1] Segmental Recurrent Neural Networks for End-to-end Speech Recognition
    Lu, Liang
    Kong, Lingpeng
    Dyer, Chris
    Smith, Noah A.
    Renals, Steve
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, pp. 385-389
  • [2] End-To-End deep neural models for Automatic Speech Recognition for Polish Language
    Pondel-Sycz, Karolina
    Pietrzak, Agnieszka Paula
    Szymla, Julia
    INTERNATIONAL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2024, 70(2), pp. 315-321
  • [3] Insights on Neural Representations for End-to-End Speech Recognition
    Ollerenshaw, Anna
    Jalal, Asif
    Hain, Thomas
    INTERSPEECH 2021, 2021, pp. 4079-4083
  • [4] EXPLORING NEURAL TRANSDUCERS FOR END-TO-END SPEECH RECOGNITION
    Battenberg, Eric
    Chen, Jitong
    Child, Rewon
    Coates, Adam
    Gaur, Yashesh
    Li, Yi
    Liu, Hairong
    Satheesh, Sanjeev
    Sriram, Anuroop
    Zhu, Zhenyao
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, pp. 206-213
  • [5] Combination of end-to-end and hybrid models for speech recognition
    Wong, Jeremy H. M.
    Gaur, Yashesh
    Zhao, Rui
    Lu, Liang
    Sun, Eric
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2020, 2020, pp. 1783-1787
  • [6] AN INVESTIGATION OF END-TO-END MODELS FOR ROBUST SPEECH RECOGNITION
    Prasad, Archiki
    Jyothi, Preethi
    Velmurugan, Rajbabu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, pp. 6893-6897
  • [7] End-to-End Speech Emotion Recognition Based on Neural Network
    Zhu, Bing
    Zhou, Wenkai
    Wang, Yutian
    Wang, Hui
    Cai, Juan Juan
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT 2017), 2017, pp. 1634-1638
  • [8] Towards End-to-End Speech Recognition with Recurrent Neural Networks
    Graves, Alex
    Jaitly, Navdeep
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, pp. 1764-1772
  • [9] ESPRESSO: A FAST END-TO-END NEURAL SPEECH RECOGNITION TOOLKIT
    Wang, Yiming
    Chen, Tongfei
    Xu, Hainan
    Ding, Shuoyang
    Lv, Hang
    Shao, Yiwen
    Peng, Nanyun
    Xie, Lei
    Watanabe, Shinji
    Khudanpur, Sanjeev
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, pp. 136-143
  • [10] Improving End-to-End Models for Children's Speech Recognition
    Patel, Tanvina
    Scharenborg, Odette
    APPLIED SCIENCES-BASEL, 2024, 14(6)