DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Cited: 0
Authors
Baccouche, Moez [1 ]
Besset, Benoit [1 ]
Collen, Patrice [1 ]
Le Blouch, Olivier [1 ]
Affiliations
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
This paper follows recent advances in speech recognition which recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to drastically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared with the classical deep learning procedure, and obtains better results on the TIMIT dataset, among the best state-of-the-art results (20.20% PER). We also show that our approach can assimilate data of different natures, ranging from wideband to narrowband signals.
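To make the core idea concrete, here is a minimal NumPy sketch of splitting a phoneme's temporal context into blocks, each handled by its own small network, with a merger network producing phoneme posteriors. The window size, feature dimension, block boundaries, layer sizes, and the `mlp` helper are all illustrative assumptions, not the paper's actual configuration; the point is only that several small per-block models plus a merger can replace one large model over the whole context.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an 11-frame context window of 39-dim acoustic
# feature vectors around a phoneme, split into three temporal blocks.
N_FRAMES, FEAT_DIM, N_PHONES = 11, 39, 40
context = rng.standard_normal((N_FRAMES, FEAT_DIM))
blocks = [context[:4], context[4:7], context[7:]]  # left / center / right

def mlp(x, hidden, out, seed):
    """One small network with random (untrained) weights, for illustration."""
    r = np.random.default_rng(seed)
    w1 = r.standard_normal((x.size, hidden)) * 0.1
    w2 = r.standard_normal((hidden, out)) * 0.1
    h = np.tanh(x.ravel() @ w1)
    return h @ w2

# Each block is processed by a separate model; a merger combines their outputs.
block_outputs = [mlp(b, hidden=64, out=32, seed=i) for i, b in enumerate(blocks)]
merged = np.concatenate(block_outputs)           # 96-dim merged representation
logits = mlp(merged, hidden=64, out=N_PHONES, seed=99)
posteriors = np.exp(logits - logits.max())
posteriors /= posteriors.sum()                   # softmax over phoneme classes

print(posteriors.shape)
```

Because each block model sees only a few frames, its input layer (and hence its weight count) is much smaller than that of a single model spanning the full context, which is the parameter saving the abstract refers to.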
Pages: 5
Related Papers
50 records
  • [21] Emotion Recognition from Human Speech Using Temporal Information and Deep Learning
    Kim, John W.
    Saurous, Rif A.
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 937 - 940
  • [22] HindiSpeech-Net: a deep learning based robust automatic speech recognition system for Hindi language
    Sharma, Usha
    Om, Hari
    Mishra, A. N.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (11) : 16173 - 16193
  • [23] On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3618 - 3622
  • [25] On Comparison of Deep Learning Architectures for Distant Speech Recognition
    Sustika, Rika
    Yuliani, Asri R.
    Zaenudin, Efendi
    Pardede, Hilman F.
    2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 17 - 21
  • [26] Lightweight Deep Learning Framework for Speech Emotion Recognition
    Akinpelu, Samson
    Viriri, Serestina
    Adegun, Adekanmi
    IEEE ACCESS, 2023, 11 : 77086 - 77098
  • [27] Deep Learning of Speech Features for Improved Phonetic Recognition
    Lee, Jaehyung
    Lee, Soo-Young
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1256 - 1259
  • [28] A temporal auditory model with adaptation for automatic speech recognition
    Haque, Serajul
    Togneri, Roberto
    Zaknich, Anthony
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1141 - +
  • [29] SAYS WHO? DEEP LEARNING MODELS FOR JOINT SPEECH RECOGNITION, SEGMENTATION AND DIARIZATION
    Sarkar, Amitrajit
    Dasgupta, Surajit
    Naskar, Sudip Kumar
    Bandyopadhyay, Sivaji
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5229 - 5233
  • [30] Deep-Learning-Based BCI for Automatic Imagined Speech Recognition Using SPWVD
    Kamble, Ashwin
    Ghare, Pradnya H.
    Kumar, Vinay
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72