Towards end-to-end speech recognition with transfer learning

被引:24
|
作者
Qin, Chu-Xiong [1 ,2 ]
Qu, Dan [1 ]
Zhang, Lian-Hai [1 ]
机构
[1] Natl Digital Switching Syst Engn & Technol R&D Ct, Zhengzhou, Henan, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian, Shaanxi, Peoples R China
基金
中国国家自然科学基金;
关键词
Speech recognition; End-to-end; Transfer learning;
D O I
10.1186/s13636-018-0141-9
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
A transfer learning-based end-to-end speech recognition approach is presented in two levels in our framework. Firstly, a feature extraction approach combining multilingual deep neural network (DNN) training with matrix factorization algorithm is introduced to extract high-level features. Secondly, the advantage of connectionist temporal classification (CTC) is transferred to the target attention-based model through a joint CTC-attention model composed of shallow recurrent neural networks (RNNs) on top of the proposed features. The experimental results show that the proposed transfer learning approach achieved the best performance among all end-to-end methods and could be comparable to the state-of-the-art speech recognition system for TIMIT when further jointly decoded with a RNN language model.
引用
收藏
页数:9
相关论文
共 50 条
  • [1] Towards end-to-end speech recognition with transfer learning
    Chu-Xiong Qin
    Dan Qu
    Lian-Hai Zhang
    EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [2] Investigation of Transfer Learning for End-to-End Russian Speech Recognition
    Kipyatkova, Irina
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 349 - 357
  • [3] TOWARDS END-TO-END UNSUPERVISED SPEECH RECOGNITION
    Liu, Alexander H.
    Hsu, Wei-Ning
    Auli, Michael
    Baevski, Alexei
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 221 - 228
  • [4] Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
    Joshi, Vikas
    Zhao, Rui
    Mehta, Rupesh R.
    Kumar, Kshitiz
    Li, Jinyu
    INTERSPEECH 2020, 2020, : 2152 - 2156
  • [5] IMPROVING UNSUPERVISED STYLE TRANSFER IN END-TO-END SPEECH SYNTHESIS WITH END-TO-END SPEECH RECOGNITION
    Liu, Da-Rong
    Yang, Chi-Yu
    Wu, Szu-Lin
    Lee, Hung-Yi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 640 - 647
  • [6] Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
    Wang, Changhan
    Pino, Juan
    Gu, Jiatao
    INTERSPEECH 2020, 2020, : 4731 - 4735
  • [7] INCREMENTAL LEARNING FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
    Fu, Li
    Li, Xiaoxiao
    Zi, Libo
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 320 - 327
  • [8] IMPROVING END-TO-END SPEECH RECOGNITION WITH POLICY LEARNING
    Zhou, Yingbo
    Xiong, Caiming
    Socher, Richard
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5819 - 5823
  • [9] LEARNING NOISE INVARIANT FEATURES THROUGH TRANSFER LEARNING FOR ROBUST END-TO-END SPEECH RECOGNITION
    Zhang, Shucong
    Do, Cong-Thanh
    Doddipatla, Rama
    Renals, Steve
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7024 - 7028
  • [10] TOWARDS LANGUAGE-UNIVERSAL END-TO-END SPEECH RECOGNITION
    Kim, Suyoun
    Seltzer, Michael L.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4914 - 4918