Cross-language Transfer Speech Recognition using Deep Learning

Cited by: 0
Authors
Zhao, Yue [1]
Xu, Yan M. [3]
Sun, Mei J. [2]
Xu, Xiao N. [3]
Wang, Hui [3]
Yang, Guo S. [3]
Ji, Qiang [4]
Affiliations
[1] Minzu Univ China, Dept Automat, Beijing 100081, Peoples R China
[2] China Ship Res & Dev Acad, Beijing 100192, Peoples R China
[3] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
[4] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
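Source
11th IEEE International Conference on Control and Automation (ICCA) | 2014
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract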
Cross-language transfer speech recognition aims to adapt phoneme models trained on a source language so that they can recognize a target language that lacks labeled data and other linguistic resources. In this paper, a sparse auto-encoder, a deep learning method, is introduced to derive speech features shared between the source and target languages using semi-supervised learning. It extracts a shared representation of phonemes across the two languages so that target phones can be mapped to the appropriate phones of the source language. The experimental results show that this method performs better on cross-language phone recognition than a method based on a multilayer perceptron.
Pages: 1422-1426
Number of pages: 5