DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Cited: 0
Authors
Baccouche, Moez [1 ]
Besset, Benoit [1 ]
Collen, Patrice [1 ]
Le Blouch, Olivier [1 ]
Affiliations
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models were shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and obtains better results on the TIMIT dataset, among the best state-of-the-art results (20.20% PER). We also show that our approach is able to assimilate data of different natures, ranging from wide- to narrow-bandwidth signals.
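The parameter saving the abstract claims can be illustrated with a back-of-the-envelope count: one large fully connected network over the whole temporal context versus several smaller per-block networks whose outputs a merger network combines. This is only a sketch of the general idea; the layer sizes, block split, and merger design below are hypothetical and not taken from the paper.

```python
# Illustrative comparison of trainable-weight counts for (a) a single deep
# model over the full temporal context vs. (b) separate, smaller deep models
# over context blocks plus a merger network. All sizes are hypothetical.

def dnn_params(layer_sizes):
    """Weights + biases of a fully connected feed-forward network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

FRAMES, FEATS, TARGETS = 31, 40, 40   # context width, features per frame, outputs

# (a) Single model: the whole 31-frame context feeds one large network.
full = dnn_params([FRAMES * FEATS, 2048, 2048, 2048, TARGETS])

# (b) Split temporal context: 5 blocks of 7 frames (hypothetical split),
# each learned by its own smaller deep model, then merged.
BLOCKS, BLOCK_FRAMES = 5, 7
block = dnn_params([BLOCK_FRAMES * FEATS, 512, 512, TARGETS])
merger = dnn_params([BLOCKS * TARGETS, 512, TARGETS])
split = BLOCKS * block + merger

print(f"full-context model : {full:,} parameters")
print(f"split-context model: {split:,} parameters")
```

With these (made-up) sizes the split variant needs roughly a fifth of the parameters, because no single network has to connect the full context to a wide hidden layer.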
Pages: 5
Related Papers
50 records
  • [31] Deep and Wide: Multiple Layers in Automatic Speech Recognition
    Morgan, Nelson
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): 7 - 13
  • [32] Speech Emotion Recognition Using Deep Learning Techniques: A Review
    Khalil, Ruhul Amin
    Jones, Edward
    Babar, Mohammad Inayatullah
    Jan, Tariqullah
    Zafar, Mohammad Haseeb
    Alhussain, Thamer
    IEEE ACCESS, 2019, 7 : 117327 - 117345
  • [33] Deep Learning Enabled Semantic Communications With Speech Recognition and Synthesis
    Weng, Zhenzi
    Qin, Zhijin
    Tao, Xiaoming
    Pan, Chengkang
    Liu, Guangyi
    Li, Geoffrey Ye
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (09) : 6227 - 6240
  • [34] Audio-visual speech recognition using deep learning
    Noda, Kuniaki
    Yamaguchi, Yuki
    Nakadai, Kazuhiro
    Okuno, Hiroshi G.
    Ogata, Tetsuya
    APPLIED INTELLIGENCE, 2015, 42 (04) : 722 - 737
  • [35] An Acoustic Model For English Speech Recognition Based On Deep Learning
    Ling, Zhang
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 610 - 614
  • [36] Deep learning: from speech recognition to language and multimodal processing
    Deng, Li
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2016, 5
  • [37] A deep interpretable representation learning method for speech emotion recognition
    Jing, Erkang
    Liu, Yezheng
    Chai, Yidong
    Sun, Jianshan
    Samtani, Sagar
    Jiang, Yuanchun
    Qian, Yang
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (06)
  • [38] Automatic speech recognition: a survey
    Malik, Mishaim
    Malik, Muhammad Kamran
    Mehmood, Khawar
    Makhdoom, Imran
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 9411 - 9457
  • [39] Recognition of Arabic Accents From English Spoken Speech Using Deep Learning Approach
    Habbash, Mansoor
    Mnasri, Sami
    Alghamdi, Mansoor
    Alrashidi, Malek
    Tarawneh, Ahmad S.
    Gumair, Abdullah
    Hassanat, Ahmad B.
    IEEE ACCESS, 2024, 12 : 37219 - 37230
  • [40] Automatic speech recognition systems
    Catariov, A
    Information Technologies 2004, 2004, 5822 : 83 - 93