Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10
|
Authors
Prudnikov, Alexey [1 ,2 ]
Medennikov, Ivan [2 ,3 ]
Mendelev, Valentin [1 ]
Korenevsky, Maxim [1 ,2 ]
Khokhlov, Yuri [3 ]
Affiliations
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Source
SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The aim of the paper is to investigate ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
Pages: 234 - 242
Page count: 9
Related papers
50 in total
  • [31] Language Independent and Unsupervised Acoustic Models for Speech Recognition and Keyword Spotting
    Knill, Kate M.
    Gales, Mark J. F.
    Ragni, Anton
    Rath, Shakti P.
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 16 - 20
  • [32] HYBRID ACOUSTIC MODELS FOR DISTANT AND MULTICHANNEL LARGE VOCABULARY SPEECH RECOGNITION
    Swietojanski, Pawel
    Ghoshal, Arnab
    Renals, Steve
    2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2013, : 285 - 290
  • [33] A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition
    Xue, Shaofei
    Yan, Zhijie
    Yu, Tao
    Liu, Zhang
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [34] ADAPTIVE BEAMFORMING AND ADAPTIVE TRAINING OF DNN ACOUSTIC MODELS FOR ENHANCED MULTICHANNEL NOISY SPEECH RECOGNITION
    Prudnikov, Alexey
    Korenevsky, Maxim
    Aleinik, Sergei
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 401 - 408
  • [35] Spontaneous Thai Speech Recognition
    Woszczyna, Monika
    Charoenpornsawat, Paisarn
    Schultz, Tanja
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1882 - 1885
  • [36] Large Vocabulary Continuous Speech Recognition With Reservoir-Based Acoustic Models
    Triefenbach, Fabian
    Demuynck, Kris
    Martens, Jean-Pierre
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 311 - 315
  • [37] Conversion from Phoneme Based to Grapheme Based Acoustic Models for Speech Recognition
    Zgank, Andrej
    Kacic, Zdravko
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1587 - 1590
  • [38] A COMPARISON BETWEEN DEEP NEURAL NETS AND KERNEL ACOUSTIC MODELS FOR SPEECH RECOGNITION
    Lu, Zhiyun
    Guo, Dong
    Garakani, Alireza Bagheri
    Liu, Kuan
    May, Avner
    Bellet, Aurelien
    Fan, Linxi
    Collins, Michael
    Kingsbury, Brian
    Picheny, Michael
    Sha, Fei
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5070 - 5074
  • [39] Cost-Efficient Development of Acoustic Models for Speech Recognition of Related Languages
    Nouza, Jan
    Cerva, Petr
    Kucharova, Michaela
    RADIOENGINEERING, 2013, 22 (03) : 866 - 873
  • [40] PRIVACY ATTACKS FOR AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS IN A FEDERATED LEARNING FRAMEWORK
    Tomashenko, Natalia
    Mdhaffar, Salima
    Tommasi, Marc
    Esteve, Yannick
    Bonastre, Jean-Francois
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6972 - 6976