DEEP LEARNING OF SPLIT TEMPORAL CONTEXT FOR AUTOMATIC SPEECH RECOGNITION

Times Cited: 0
Authors
Baccouche, Moez [1 ]
Besset, Benoit [1 ]
Collen, Patrice [1 ]
Le Blouch, Olivier [1 ]
Affiliations
[1] France Telecom, Orange Labs, F-35510 Cesson Sevigne, France
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Speech recognition; neural networks; deep learning; split temporal context
DOI
Not available
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
This paper follows recent advances in speech recognition that recommend replacing the standard hybrid GMM/HMM approach with deep neural architectures. These models have been shown to dramatically improve recognition performance, owing to their ability to capture the underlying structure of the data. However, they remain particularly complex, since the entire temporal context of a given phoneme is learned with a single model, which must therefore have a very large number of trainable weights. This work proposes an alternative that splits the temporal context into blocks, each learned with a separate deep model. We demonstrate that this approach significantly reduces the number of parameters compared to the classical deep learning procedure and achieves better results on the TIMIT dataset, among the best of the state of the art (20.20% PER). We also show that our approach can assimilate data of differing natures, ranging from wideband to narrowband signals.
Pages: 5
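The core idea in the abstract, cutting a phoneme's temporal context into blocks, learning each block with its own small deep model, and merging the block outputs into phoneme posteriors, can be illustrated with a minimal sketch. The block count, frame counts, layer widths, and merger design below are illustrative assumptions (written in PyTorch for convenience), not the authors' exact configuration.

```python
# Minimal sketch of the split-temporal-context idea (assumed dimensions, not the
# paper's architecture): one small deep model per context block, plus a merger
# network that combines the block embeddings into phone scores.
import torch
import torch.nn as nn

N_FEATS = 40          # assumed per-frame feature size (e.g. filterbank energies)
FRAMES_PER_BLOCK = 5  # assumed frames per context block
N_BLOCKS = 3          # e.g. left / centre / right blocks of the context window
N_PHONES = 39         # folded TIMIT phone set

class BlockNet(nn.Module):
    """Small deep model for one temporal block of the context window."""
    def __init__(self, hidden=256, out=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FRAMES_PER_BLOCK * N_FEATS, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out), nn.ReLU(),
        )
    def forward(self, x):              # x: (batch, FRAMES_PER_BLOCK * N_FEATS)
        return self.net(x)

class SplitContextModel(nn.Module):
    """One BlockNet per block, followed by a merger producing phone scores."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(BlockNet() for _ in range(N_BLOCKS))
        self.merger = nn.Sequential(
            nn.Linear(N_BLOCKS * 64, 256), nn.ReLU(),
            nn.Linear(256, N_PHONES),
        )
    def forward(self, window):         # window: (batch, N_BLOCKS, FRAMES_PER_BLOCK, N_FEATS)
        parts = [blk(window[:, i].flatten(1)) for i, blk in enumerate(self.blocks)]
        return self.merger(torch.cat(parts, dim=1))  # unnormalised phone scores

# Example: a batch of 8 context windows of 3 blocks x 5 frames x 40 features.
model = SplitContextModel()
scores = model(torch.randn(8, N_BLOCKS, FRAMES_PER_BLOCK, N_FEATS))
print(scores.shape)  # torch.Size([8, 39])
```

Because each per-block model only sees FRAMES_PER_BLOCK frames, its input layer is several times smaller than that of a single network fed the full context window, which is where the parameter reduction claimed in the abstract comes from.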
Related Papers
50 records in total
  • [1] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [2] DISTRIBUTED DEEP LEARNING STRATEGIES FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Kingsbury, Brian
    Saon, George
    Kung, David
    Picheny, Michael
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5706 - 5710
  • [3] A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Saon, George
    Kayi, Abdullah
    Buyuktosunoglu, Alper
    Kingsbury, Brian
    Kung, David
    Picheny, Michael
    INTERSPEECH 2019, 2019, : 2628 - 2632
  • [4] Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System
    Shahamiri, Seyed Reza
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2021, 29 : 852 - 861
  • [5] Automatic Speech Recognition Method Based on Deep Learning Approaches for Uzbek Language
    Mukhamadiyev, Abdinabi
    Khujayarov, Ilyos
    Djuraev, Oybek
    Cho, Jinsoo
    SENSORS, 2022, 22 (10)
  • [6] Improving Deep Learning based Automatic Speech Recognition for Gujarati
    Raval, Deepang
    Pathak, Vyom
    Patel, Muktan
    Bhatt, Brijesh
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [7] Automatic Speech Recognition: A survey of deep learning techniques and approaches
    Ahlawat, Harsh
    Aggarwal, Naveen
    Gupta, Deepti
    INTERNATIONAL JOURNAL OF COGNITIVE COMPUTING IN ENGINEERING, 2025, 6 : 201 - 237
  • [8] Study of Deep Learning and CMU Sphinx in Automatic Speech Recognition
    Dhankar, Abhishek
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 2296 - 2301
  • [9] Evaluating deep learning architectures for Speech Emotion Recognition
    Fayek, Haytham M.
    Lech, Margaret
    Cavedon, Lawrence
    NEURAL NETWORKS, 2017, 92 : 60 - 68
  • [10] Automatic context window composition for distant speech recognition
    Ravanelli, Mirco
    Omologo, Maurizio
    SPEECH COMMUNICATION, 2018, 101 : 34 - 44