DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Times Cited: 0
Authors
Baccouche, Moez [1 ]
Besset, Benoit [1 ]
Collen, Patrice [1 ]
Le Blouch, Olivier [1 ]
Affiliations
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context
DOI
Not available
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to dramatically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and achieves better results on the TIMIT dataset, among the best of the state of the art (20.20% PER). We also show that our approach can assimilate data of differing natures, ranging from wideband to narrowband signals.
Pages: 5
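The core idea in the abstract, cutting a phoneme's temporal context into blocks, learning each block with its own small deep model, and merging the block outputs into phoneme posteriors, can be illustrated with a minimal sketch. The block count, frame counts, layer widths, and merger design below are illustrative assumptions (written in PyTorch for convenience), not the authors' exact configuration.

```python
# Minimal sketch of the split-temporal-context idea (assumed dimensions, not the
# paper's architecture): one small deep model per context block, plus a merger
# network that combines the block embeddings into phone scores.
import torch
import torch.nn as nn

N_FEATS = 40          # assumed per-frame feature size (e.g. filterbank energies)
FRAMES_PER_BLOCK = 5  # assumed frames per context block
N_BLOCKS = 3          # e.g. left / centre / right blocks of the context window
N_PHONES = 39         # folded TIMIT phone set

class BlockNet(nn.Module):
    """Small deep model for one temporal block of the context window."""
    def __init__(self, hidden=256, out=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FRAMES_PER_BLOCK * N_FEATS, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out), nn.ReLU(),
        )
    def forward(self, x):              # x: (batch, FRAMES_PER_BLOCK * N_FEATS)
        return self.net(x)

class SplitContextModel(nn.Module):
    """One BlockNet per block, followed by a merger producing phone scores."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(BlockNet() for _ in range(N_BLOCKS))
        self.merger = nn.Sequential(
            nn.Linear(N_BLOCKS * 64, 256), nn.ReLU(),
            nn.Linear(256, N_PHONES),
        )
    def forward(self, window):         # window: (batch, N_BLOCKS, FRAMES_PER_BLOCK, N_FEATS)
        parts = [blk(window[:, i].flatten(1)) for i, blk in enumerate(self.blocks)]
        return self.merger(torch.cat(parts, dim=1))  # unnormalised phone scores

# Example: a batch of 8 context windows of 3 blocks x 5 frames x 40 features.
model = SplitContextModel()
scores = model(torch.randn(8, N_BLOCKS, FRAMES_PER_BLOCK, N_FEATS))
print(scores.shape)  # torch.Size([8, 39])
```

Because each per-block model only sees FRAMES_PER_BLOCK frames, its input layer is several times smaller than that of a single network fed the full context window, which is where the parameter reduction claimed in the abstract comes from.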
Related Papers
50 records in total
  • [1] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [2] DISTRIBUTED DEEP LEARNING STRATEGIES FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Kingsbury, Brian
    Saon, George
    Kung, David
    Picheny, Michael
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5706 - 5710
  • [3] A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Saon, George
    Kayi, Abdullah
    Buyuktosunoglu, Alper
    Kingsbury, Brian
    Kung, David
    Picheny, Michael
    INTERSPEECH 2019, 2019, : 2628 - 2632
  • [4] Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System
    Shahamiri, Seyed Reza
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2021, 29 : 852 - 861
  • [5] Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
    Mukhamadiyev, Abdinabi
    Khujayarov, Ilyos
    Djuraev, Oybek
    Cho, Jinsoo
    SENSORS, 2022, 22 (10)
  • [6] Improving Deep Learning based Automatic Speech Recognition for Gujarati
    Raval, Deepang
    Pathak, Vyom
    Patel, Muktan
    Bhatt, Brijesh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [7] Automatic Speech Recognition: A survey of deep learning techniques and approaches
    Ahlawat, Harsh
    Aggarwal, Naveen
    Gupta, Deepti
    INTERNATIONAL JOURNAL OF COGNITIVE COMPUTING IN ENGINEERING, 2025, 6 : 201 - 237
  • [8] Study of Deep Learning and CMU Sphinx in Automatic Speech Recognition
    Dhankar, Abhishek
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 2296 - 2301
  • [9] Evaluating deep learning architectures for Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    NEURAL NETWORKS, 2017, 92 : 60 - 68
  • [10] Automatic context window composition for distant speech recognition
    Ravanelli, Mirco
    Omologo, Maurizio
    SPEECH COMMUNICATION, 2018, 101 : 34 - 44