DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Baccouche, Moez [1]

Besset, Benoit [1]

Collen, Patrice [1]

Le Blouch, Olivier [1]

Affiliations:

[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Speech recognition; neural networks; deep learning; split temporal context

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper follows the recent advances in speech recognition which recommend replacing the standard hybrid GMM/HMM approach by deep neural architectures. These models were shown to drastically improve recognition performances, due to their ability to capture the underlying structure of data. However, they remain particularly complex since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure, and obtains better results on the TIMIT dataset, among the best of state-of-the-art (with a 20.20% PER). We also show that our approach is able to assimilate data of different nature, ranging from wide to narrow bandwidth signals.


Pages: 5
