Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages

Cited by: 3
Authors
Abraham, Basil [1]
Umesh, S. [1]
Joy, Neethu Mariam [1]
Affiliations
[1] Indian Inst Technol, Madras, Tamil Nadu, India
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speech recognition; low-resource; cross-lingual; data pooling; CNN; DNN; automatic speech recognition
DOI
10.21437/Interspeech.2016-963
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose two techniques to improve the acoustic model of a low-resource language: (i) pooling data from closely related languages using a phoneme mapping algorithm to build acoustic models such as the subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), deep neural network (DNN) and convolutional neural network (CNN), and then adapting the aforementioned models towards the low-resource language using its own data; (ii) borrowing parameters from models built on high-resource languages, namely the subspace model parameters from SGMM/Phone-CAT or the hidden layers from DNN/CNN, and then estimating the language-specific parameters using the low-resource language data. The experiments were performed on four Indian languages, namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% were obtained over the corresponding monolingual models in each case.
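As a rough illustration of the data-pooling idea in technique (i), the sketch below relabels source-language utterances with target-language phone labels and adds them to the target training set. It is only a minimal sketch under stated assumptions: the abstract does not describe the phoneme mapping algorithm, so the lookup table `PHONE_MAP`, the phone labels, and the helper names `map_utterance` and `pool_data` are all hypothetical, and dropping utterances with unmapped phones is one possible policy rather than the authors' method.

```python
from collections import Counter

# Hypothetical mapping from source-language phone labels to target-language
# phone labels. The paper's actual mapping algorithm is not given in the
# abstract; a fixed lookup table is assumed here purely for illustration.
PHONE_MAP = {
    "hin_a": "asm_a",
    "hin_k": "asm_k",
    "hin_t": "asm_t",
}


def map_utterance(phones, phone_map):
    """Relabel one utterance; return None if any phone has no mapping."""
    mapped = []
    for phone in phones:
        if phone not in phone_map:
            return None
        mapped.append(phone_map[phone])
    return mapped


def pool_data(target_utts, source_utts, phone_map):
    """Combine target utterances with relabelled source utterances."""
    pooled = [list(utt) for utt in target_utts]
    for utt in source_utts:
        mapped = map_utterance(utt, phone_map)
        if mapped is not None:  # assumed policy: skip unmappable utterances
            pooled.append(mapped)
    return pooled


# Toy phone-level transcripts (invented labels, not real corpus data).
target_utts = [["asm_k", "asm_a"], ["asm_t", "asm_a"]]
source_utts = [["hin_k", "hin_a", "hin_t"], ["hin_a", "hin_x"]]

pooled = pool_data(target_utts, source_utts, PHONE_MAP)
phone_counts = Counter(phone for utt in pooled for phone in utt)
```

In this toy setup the pooled set gains one relabelled utterance, so each target phone is observed more often than in the target data alone. A model trained on such a pooled set would then be adapted using only the low-resource language data, as the abstract describes.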
Pages: 3037-3041
Number of pages: 5