Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition

Cited by: 3

Authors:

Kipyatkova, Irina [1,2]

Affiliations:

[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia

[2] St Petersburg State Univ Aerosp Instrumentat SUAI, St Petersburg, Russia

Source:

SPEECH AND COMPUTER, SPECOM 2017 | 2017 / Vol. 10458

Funding:

Russian Foundation for Basic Research;

Keywords:

Time delay neural networks; Acoustic models; Automatic speech recognition; Russian speech; NEURAL-NETWORKS;

DOI:

10.1007/978-3-319-66429-3_35

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we study the application of time delay neural networks (TDNNs) to acoustic modeling for large vocabulary continuous Russian speech recognition. We created TDNNs with various numbers of hidden layers and of units per hidden layer, using the p-norm nonlinearity. The acoustic models were trained on our own Russian speech corpus of phonetically balanced phrases, which contains more than 30 h of speech. The TDNN-based acoustic models were tested on a very large vocabulary continuous Russian speech recognition task. The experiments showed that the TDNN models outperformed the baseline deep neural network models in terms of word error rate.
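
As an illustration of two terms used in the abstract, the following minimal Python sketch shows a p-norm nonlinearity and a word error rate (WER) computation. It is not the authors' implementation: the function names, the group size, and the value p = 2 are assumptions made for this example, since the abstract does not state them. The p-norm unit maps each group of inputs to (sum of |x_i|^p)^(1/p), and WER is the word-level edit distance between reference and hypothesis divided by the number of reference words.

```python
from typing import List, Sequence


def pnorm(inputs: Sequence[float], group_size: int, p: float = 2.0) -> List[float]:
    """p-norm nonlinearity: each output is the p-norm of one group of inputs.

    group_size and p are illustrative choices, not values from the paper.
    """
    if len(inputs) % group_size != 0:
        raise ValueError("input length must be a multiple of group_size")
    outputs = []
    for start in range(0, len(inputs), group_size):
        group = inputs[start:start + group_size]
        outputs.append(sum(abs(x) ** p for x in group) ** (1.0 / p))
    return outputs


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)
```

For example, `word_error_rate("a b c d".split(), "a x c".split())` counts one substitution and one deletion against four reference words.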


Pages: 362-369

Page count: 8

