Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10

Authors:

Prudnikov, Alexey ^{[1,2]}

Medennikov, Ivan ^{[2,3]}

Mendelev, Valentin ^{[1]}

Korenevsky, Maxim ^{[1,2]}

Khokhlov, Yuri ^{[3]}

Affiliations:

[1] Speech Technol Ctr Ltd, St Petersburg, Russia

[2] ITMO Univ, St Petersburg, Russia

[3] STC Innovat Ltd, St Petersburg, Russia

Source:

SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319

Keywords:

Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;

DOI:

10.1007/978-3-319-23132-7_29

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The aim of the paper is to investigate ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than the results reported for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector-based deep neural network adaptation and speaker-dependent bottleneck features, which provide relative word error rate reductions of 8.6% and 11.9% over the baseline system, respectively.
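
For readers unfamiliar with the metric, relative word error rate reduction is conventionally computed against the baseline as sketched below. The symbols WER_base (baseline system) and WER_new (improved system) are notation introduced here for illustration and do not appear in the abstract, which reports only the relative figures and not the underlying WER values.

```latex
\[
\mathrm{WERR}_{\mathrm{rel}}
  = \frac{\mathrm{WER}_{\mathrm{base}} - \mathrm{WER}_{\mathrm{new}}}
         {\mathrm{WER}_{\mathrm{base}}} \times 100\%
\]
```

Under this definition, a relative reduction of 11.9% corresponds to WER_new = (1 - 0.119) × WER_base = 0.881 × WER_base, and a relative reduction of 8.6% to WER_new = 0.914 × WER_base.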


Pages: 234-242

Page count: 9
