BUT OpenSAT 2017 speech recognition system

Cited by: 1
Authors
Karafiat, Martin [1]
Baskar, Murali Karthick
Szoke, Igor
Malenovsky, Vladimir
Vesely, Karel
Grezl, Frantisek
Burget, Lukas
Cernocky, Jan Honza
Affiliations
[1] Brno Univ Technol, Speech FIT, Brno, Czech Republic
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
speech recognition; multilingual training; BLSTM; data augmentation; robustness
DOI
10.21437/Interspeech.2018-2457
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the BUT Automatic Speech Recognition (ASR) systems for two domains of the OpenSAT evaluations: Low Resourced Languages and Public Safety Communications. The first was challenging because of the lack of training data, so multilingual approaches to BLSTM training were employed together with the recently published Residual Memory Networks, which require less training data. Combining both approaches led to superior performance. The second domain was challenging because of the extreme recording conditions: a specific channel, speakers under stress, and high levels of noise. Data augmentation was very important for reaching reasonably good performance.
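The abstract names data augmentation as important for the noisy Public Safety Communications domain but does not say which recipe was used. As an illustration only, the sketch below shows one common form of augmentation: adding noise to clean speech at a chosen signal-to-noise ratio. The function names `power` and `mix_at_snr`, the sample values, and the choice of additive noise are all assumptions made here for illustration, not details taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mix has the requested SNR in dB.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Repeat the noise until it covers the speech, then cut to length.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]

    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")

    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)

    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.05 * i) for i in range(1000)]   # stand-in for speech
    babble = [rng.gauss(0.0, 1.0) for _ in range(300)]  # stand-in for noise
    noisy = mix_at_snr(clean, babble, snr_db=10.0)
    print(len(noisy))
```

In a training setup, a function of this kind would typically be applied to each clean utterance with several noise types and SNR values, so that the acoustic model sees conditions closer to the target recordings.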
Pages: 2638-2642
Page count: 5