DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Times cited: 0
Authors
Baccouche, Moez [1]
Besset, Benoit [1]
Collen, Patrice [1]
Le Blouch, Olivier [1]
Affiliation
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper follows the recent advances in speech recognition which recommend replacing the standard hybrid GMM/HMM approach by deep neural architectures. These models were shown to drastically improve recognition performances, due to their ability to capture the underlying structure of data. However, they remain particularly complex since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure, and obtains better results on the TIMIT dataset, among the best of state-of-the-art (with a 20.20% PER). We also show that our approach is able to assimilate data of different nature, ranging from wide to narrow bandwidth signals.
Pages: 5
Related papers
50 in total
  • [41] Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition
    Ragni, Anton
    Gales, Mark J. F.
    Rose, Oliver
    Knill, Katherine M.
    Kastanos, Alexandros
    Li, Qiujia
    Ness, Preben M.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1319 - 1329
  • [42] Introducing Temporal Asymmetries in Feature Extraction for Automatic Speech Recognition
    Sivaram, G. S. V. S.
    Hermansky, Hynek
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 890 - 893
  • [43] DISCRIMINATIVE PIECEWISE LINEAR TRANSFORMATION BASED ON DEEP LEARNING FOR NOISE ROBUST AUTOMATIC SPEECH RECOGNITION
    Kashiwagi, Yosuke
    Saito, Daisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 350 - 355
  • [44] Novel Automatic Bank Check Recognition Based on Deep Learning
    Lamssaoui, Siham
    Benaboud, Hafssa
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON NETWORKING, INFORMATION SYSTEMS & SECURITY (NISS19), 2019,
  • [45] Customized deep learning based Turkish automatic speech recognition system supported by language model
    Gormez, Yasin
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [46] EFFICIENT DEEP LEARNING FOR PATHOLOGICAL SPEECH RECOGNITION
    Pham, Tuan D.
    2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 103 - 104
  • [47] Persian speech recognition using deep learning
    Veisi, Hadi
    Haji Mani, Armita
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (04) : 893 - 905
  • [48] Arabic Speech Recognition with Deep Learning: A Review
    Algihab, Wajdan
    Alawwad, Noura
    Aldawish, Anfal
    AlHumoud, Sarah
    SOCIAL COMPUTING AND SOCIAL MEDIA: DESIGN, HUMAN BEHAVIOR AND ANALYTICS, SCSM 2019, PT I, 2019, 11578 : 15 - 31
  • [49] Speech Emotion Recognition Using Deep Learning
    Alagusundari, N.
    Anuradha, R.
    ARTIFICIAL INTELLIGENCE: THEORY AND APPLICATIONS, VOL 1, AITA 2023, 2024, 843 : 313 - 325
  • [50] Speech Emotion Recognition Using Deep Learning
    Ahmed, Waqar
    Riaz, Sana
    Iftikhar, Khunsa
    Konur, Savas
    ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 191 - 197