Improving Acoustic Models for Russian Spontaneous Speech Recognition

Times cited: 10
Authors
Prudnikov, Alexey [1, 2]
Medennikov, Ivan [2, 3]
Mendelev, Valentin [1]
Korenevsky, Maxim [1, 2]
Khokhlov, Yuri [3]
Affiliations
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Source
SPEECH AND COMPUTER (SPECOM 2015), 2015, Vol. 9319
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates ways to improve acoustic models for Russian spontaneous speech recognition. Applying the main steps of the Kaldi Switchboard recipe to a Russian dataset yielded low accuracy compared with the results reported for English spontaneous telephone speech. Two methods proved especially useful for Russian spontaneous speech: i-vector-based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
Pages: 234-242
Number of pages: 9
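The abstract reports its gains as relative word error rate (WER) reductions, i.e. 100 × (WER_baseline − WER_new) / WER_baseline, rather than as absolute percentage-point drops. The Python sketch below illustrates that arithmetic. The 30.0% baseline WER is a hypothetical value chosen only for illustration; it is not the paper's baseline, and the resulting WERs are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def wer_after_reduction(baseline_wer: float, rel_reduction_pct: float) -> float:
    """WER implied by a baseline WER and a relative reduction given in percent."""
    return baseline_wer * (1.0 - rel_reduction_pct / 100.0)


# HYPOTHETICAL baseline WER in percent (illustration only, not from the paper).
BASELINE_WER = 30.0

# Relative reductions quoted in the abstract.
REL_REDUCTIONS = [
    ("i-vector-based DNN adaptation", 8.6),
    ("speaker-dependent bottleneck features", 11.9),
]

if __name__ == "__main__":
    for name, rel in REL_REDUCTIONS:
        new_wer = wer_after_reduction(BASELINE_WER, rel)
        absolute_drop = BASELINE_WER - new_wer
        print(f"{name}: {new_wer:.2f}% WER ({absolute_drop:.2f} points absolute)")
```

With this hypothetical 30.0% baseline, the two relative reductions correspond to WERs of about 27.42% and 26.43%, i.e. absolute drops of roughly 2.6 and 3.6 points. A relative reduction of 8.6% is therefore not the same as an 8.6-point drop.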