Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10
Authors
Prudnikov, Alexey [1, 2]
Medennikov, Ivan [2, 3]
Mendelev, Valentin [1]
Korenevsky, Maxim [1, 2]
Khokhlov, Yuri [3]
Affiliations
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Source
SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The aim of the paper is to investigate ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than the results reported for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector-based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
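The abstract reports its gains as relative word error rate (WER) reductions. As a minimal sketch of how such a figure is usually computed, the Python below uses the standard definition (baseline WER minus system WER, divided by baseline WER). The function names and the 40.0% baseline WER are hypothetical and chosen only for illustration; the record does not state the paper's absolute WER values.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def wer_after_reduction(baseline_wer: float, relative_reduction_pct: float) -> float:
    """Absolute WER implied by a given relative reduction over a baseline."""
    return baseline_wer * (1.0 - relative_reduction_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical baseline WER (percent); NOT taken from the paper.
    baseline = 40.0
    # Relative reductions quoted in the abstract.
    reported = {
        "i-vector based DNN adaptation": 8.6,
        "speaker-dependent bottleneck features": 11.9,
    }
    for method, rel in reported.items():
        implied = wer_after_reduction(baseline, rel)
        print(f"{method}: {rel}% relative -> {implied:.2f}% WER "
              f"(from a hypothetical {baseline:.1f}% baseline)")
```

A relative reduction depends on the baseline: the same 8.6% relative gain corresponds to a larger absolute drop when the baseline WER is high than when it is low.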
Pages: 234-242
Number of pages: 9