DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Cited: 0
Authors
Baccouche, Moez [1 ]
Besset, Benoit [1 ]
Collen, Patrice [1 ]
Le Blouch, Olivier [1 ]
Affiliations
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models were shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and obtains better results on the TIMIT dataset, among the best state-of-the-art results (20.20% PER). We also show that our approach is able to assimilate data of different natures, ranging from wide- to narrow-bandwidth signals.
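The parameter saving the abstract claims can be illustrated with a back-of-the-envelope count: one large fully connected network over the whole temporal context versus several smaller per-block networks whose outputs a merger network combines. This is only a sketch of the general idea; the layer sizes, block split, and merger design below are hypothetical and not taken from the paper.

```python
# Illustrative comparison of trainable-weight counts for (a) a single deep
# model over the full temporal context vs. (b) separate, smaller deep models
# over context blocks plus a merger network. All sizes are hypothetical.

def dnn_params(layer_sizes):
    """Weights + biases of a fully connected feed-forward network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

FRAMES, FEATS, TARGETS = 31, 40, 40   # context width, features per frame, outputs

# (a) Single model: the whole 31-frame context feeds one large network.
full = dnn_params([FRAMES * FEATS, 2048, 2048, 2048, TARGETS])

# (b) Split temporal context: 5 blocks of 7 frames (hypothetical split),
# each learned by its own smaller deep model, then merged.
BLOCKS, BLOCK_FRAMES = 5, 7
block = dnn_params([BLOCK_FRAMES * FEATS, 512, 512, TARGETS])
merger = dnn_params([BLOCKS * TARGETS, 512, TARGETS])
split = BLOCKS * block + merger

print(f"full-context model : {full:,} parameters")
print(f"split-context model: {split:,} parameters")
```

With these (made-up) sizes the split variant needs roughly a fifth of the parameters, because no single network has to connect the full context to a wide hidden layer.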
Pages: 5
Related Papers
50 records
  • [31] Deep and Wide: Multiple Layers in Automatic Speech Recognition
    Morgan, Nelson
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): 7 - 13
  • [32] Speech Emotion Recognition Using Deep Learning Techniques: A Review
    Khalil, Ruhul Amin
    Jones, Edward
    Babar, Mohammad Inayatullah
    Jan, Tariqullah
    Zafar, Mohammad Haseeb
    Alhussain, Thamer
    IEEE ACCESS, 2019, 7 : 117327 - 117345
  • [33] Deep Learning Enabled Semantic Communications With Speech Recognition and Synthesis
    Weng, Zhenzi
    Qin, Zhijin
    Tao, Xiaoming
    Pan, Chengkang
    Liu, Guangyi
    Li, Geoffrey Ye
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (09) : 6227 - 6240
  • [34] Audio-visual speech recognition using deep learning
    Noda, Kuniaki
    Yamaguchi, Yuki
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    Ogata, Tetsuya
    APPLIED INTELLIGENCE, 2015, 42 (04) : 722 - 737
  • [35] An Acoustic Model For English Speech Recognition Based On Deep Learning
    Ling, Zhang
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 610 - 614
  • [36] Deep learning: from speech recognition to language and multimodal processing
    Deng, Li
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2016, 5
  • [37] A deep interpretable representation learning method for speech emotion recognition
    Jing, Erkang
    Liu, Yezheng
    Chai, Yidong
    Sun, Jianshan
    Samtani, Sagar
    Jiang, Yuanchun
    Qian, Yang
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (06)
  • [38] Automatic speech recognition: a survey
    Malik, Mishaim
    Malik, Muhammad Kamran
    Mehmood, Khawar
    Makhdoom, Imran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 9411 - 9457
  • [39] Recognition of Arabic Accents From English Spoken Speech Using Deep Learning Approach
    Habbash, Mansoor
    Mnasri, Sami
    Alghamdi, Mansoor
    Alrashidi, Malek
    Tarawneh, Ahmad S.
    Gumair, Abdullah
    Hassanat, Ahmad B.
    IEEE ACCESS, 2024, 12 : 37219 - 37230
  • [40] Automatic speech recognition systems
    Catariov, A
    Information Technologies 2004, 2004, 5822 : 83 - 93