Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10

Authors:

Prudnikov, Alexey [1,2]

Medennikov, Ivan [2,3]

Mendelev, Valentin [1]

Korenevsky, Maxim [1,2]

Khokhlov, Yuri [3]

Affiliations:

[1] Speech Technol Ctr Ltd, St Petersburg, Russia

[2] ITMO Univ, St Petersburg, Russia

[3] STC Innovat Ltd, St Petersburg, Russia

Source:

SPEECH AND COMPUTER (SPECOM 2015) | 2015 / Vol. 9319

Keywords:

Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;

DOI:

10.1007/978-3-319-23132-7_29

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The aim of the paper is to investigate ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained low accuracy compared with the results for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.

Pages: 234-242

Page count: 9
