Multilingually trained bottleneck features in spoken language recognition

Cited by: 38
Authors
Fer, Radek [1, 2]
Matejka, Pavel [1, 2]
Grezl, Frantisek [1, 2]
Plchot, Oldrich [1, 2]
Vesely, Karel [1, 2]
Cernocky, Jan Honza [1, 2]
Affiliations
[1] Brno Univ Technol, Speech FIT, Bozetechova 2, Brno 61266, Czech Republic
[2] Brno Univ Technol, Ctr Excellence IT4I, Bozetechova 2, Brno 61266, Czech Republic
Keywords
Multilingual training; Bottleneck features; Spoken language recognition
DOI
10.1016/j.csl.2017.06.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multilingual training of neural networks has proven to be a simple yet effective way to deal with multilingual training corpora. It allows several resources to be used to jointly train a language-independent representation of features, which can be encoded into a low-dimensional feature set by embedding a narrow bottleneck layer in the network. In this paper, we analyze such features on the task of spoken language recognition (SLR), focusing on practical aspects of training bottleneck networks and analyzing their integration in SLR. By comparing properties of monolingual and multilingual features, we show the suitability of multilingual training for SLR. The state-of-the-art performance of these features is demonstrated on the NIST LRE09 database. (C) 2017 Elsevier Ltd. All rights reserved.
Pages: 252-267
Page count: 16