Experimenting with Hybrid TDNN/HMM Acoustic Models for Russian Speech Recognition

Cited by: 3
Authors
Kipyatkova, Irina [1 ,2 ]
Affiliations
[1] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg, Russia
[2] St Petersburg State Univ Aerosp Instrumentat SUAI, St Petersburg, Russia
Source
SPEECH AND COMPUTER, SPECOM 2017 | 2017 / Vol. 10458
Funding
Russian Foundation for Basic Research;
Keywords
Time delay neural networks; Acoustic models; Automatic speech recognition; Russian speech; NEURAL-NETWORKS;
DOI
10.1007/978-3-319-66429-3_35
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
In this paper, we study the application of time delay neural networks (TDNNs) to acoustic modeling for large vocabulary continuous Russian speech recognition. We built TDNNs with p-norm nonlinearity, varying the number of hidden layers and the number of units per hidden layer. The acoustic models were trained on our own Russian speech corpus of phonetically balanced phrases, which contains more than 30 h of speech. The TDNN-based acoustic models were tested on a very large vocabulary continuous Russian speech recognition task. The experiments showed that the TDNN models outperformed the baseline deep neural network models in terms of word error rate.
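The abstract names two technical ingredients: the p-norm nonlinearity used in the hidden layers, and the word error rate used for evaluation. The following is a minimal Python sketch of both, not the paper's implementation. The function names `pnorm` and `word_error_rate`, the group size, and the default `p = 2` are assumptions chosen for illustration, since the abstract does not state these settings. The p-norm unit is taken in its usual form, where each output is the p-norm of a group of consecutive inputs, and the word error rate is taken as the word-level edit distance divided by the number of reference words.

```python
def pnorm(inputs, group_size, p=2.0):
    """Reduce each consecutive group of `group_size` inputs to its p-norm.

    Assumption: group_size and p are illustrative values, not settings
    taken from the paper.
    """
    if group_size <= 0 or len(inputs) % group_size != 0:
        raise ValueError("input length must be a positive multiple of group_size")
    outputs = []
    for start in range(0, len(inputs), group_size):
        group = inputs[start:start + group_size]
        # y = (sum_i |x_i|^p)^(1/p) over the inputs in this group
        outputs.append(sum(abs(x) ** p for x in group) ** (1.0 / p))
    return outputs


def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length.

    The minimal number of word edits is computed with the standard
    Levenshtein dynamic program, keeping one row at a time.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(pnorm([3.0, 4.0, 0.0, -2.0], group_size=2))
    print(word_error_rate("a b c d", "a x c"))
```

With a group size of 2, the inputs `[3.0, 4.0, 0.0, -2.0]` reduce to two outputs, so a p-norm layer shrinks its input dimension by the group size. In the second call, one substitution and one deletion against a four-word reference give a word error rate of 0.5.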
Pages: 362-369
Number of pages: 8
Related papers
28 in total
  • [1] [Anonymous], 1996, P5084095 STAND, P230
  • [2] Context adaptive neural network for rapid adaptation of deep CNN based acoustic models
    Delcroix, Marc
    Kinoshita, Keisuke
    Ogawa, Atsunori
    Yoshioka, Takuya
    Tran, Dung
    Nakatani, Tomohiro
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1573 - 1577
  • [3] Deep learning: from speech recognition to language and multimodal processing
    Deng, Li
    [J]. APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2016, 5
  • [4] Gapochkin, 2014, SCI TIME, V1, P29
  • [5] Geiger JT, 2014, INTERSPEECH, P631
  • [6] Deep Neural Networks for Acoustic Modeling in Speech Recognition
    Hinton, Geoffrey
    Deng, Li
    Yu, Dong
    Dahl, George E.
    Mohamed, Abdel-rahman
    Jaitly, Navdeep
    Senior, Andrew
    Vanhoucke, Vincent
    Nguyen, Patrick
    Sainath, Tara N.
    Kingsbury, Brian
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
  • [7] Jokisch O., 2009, P SPECOM 2009
  • [8] Information enquiry kiosk with multimodal user interface
    Karpov A.A.
    Ronzhin A.L.
    [J]. Pattern Recognition and Image Analysis, 2009, 19 (03) : 546 - 558
  • [9] Large vocabulary Russian speech recognition using syntactico-statistical language modeling
    Karpov, Alexey
    Markov, Konstantin
    Kipyatkova, Irina
    Vazhenina, Dania
    Ronzhin, Andrey
    [J]. SPEECH COMMUNICATION, 2014, 56 : 213 - 228
  • [10] Kipyatkova Irina, 2013, Speech and Computer. 15th International Conference, SPECOM 2013, P219, DOI 10.1007/978-3-319-01931-4_29