Joint Estimation of Articulatory Features and Acoustic models for Low-Resource Languages

Cited by: 2
Authors
Abraham, Basil [1 ]
Umesh, S. [1 ]
Joy, Neethu Mariam [1 ]
Affiliations
[1] Indian Inst Technol Madras, Madras, Tamil Nadu, India
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
speech recognition; articulatory features; low resource languages; deep neural networks (DNN);
DOI
10.21437/Interspeech.2017-1028
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Using articulatory features for speech recognition improves performance for low-resource languages. One way to obtain articulatory features is with an articulatory classifier (pseudo articulatory features). The usefulness of these features depends on the efficacy of the classifier, but training a robust classifier for a low-resource language is constrained by the limited amount of training data. This can be overcome by training the articulatory classifier on a high-resource language; the classifier can then be used to generate articulatory features for the low-resource language. However, this technique fails when the high- and low-resource languages are mismatched in their environmental conditions. In this paper, we address both of these problems by jointly estimating the articulatory features and the low-resource acoustic model. Experiments were performed on two low-resource Indian languages, Hindi and Tamil, with English as the high-resource language. Relative improvements of 23% and 10% were obtained for Hindi and Tamil, respectively.
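The pseudo-articulatory (tandem) pipeline the abstract describes can be sketched as follows. This is a toy illustration, not the authors' implementation: the "classifier" below is a random linear-softmax layer standing in for a DNN articulatory classifier trained on the high-resource language, and all dimensions, names, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def articulatory_posteriors(acoustic_feats, weights, bias):
    """Pseudo articulatory features: per-frame class posteriors from a
    (stand-in) articulatory classifier."""
    return softmax(acoustic_feats @ weights + bias)

# 10 frames of 13-dim MFCC-like acoustic features; 5 articulatory
# classes (toy values; real feature sets are larger).
frames = rng.standard_normal((10, 13))
W = rng.standard_normal((13, 5))   # stand-in for trained classifier weights
b = np.zeros(5)

post = articulatory_posteriors(frames, W, b)

# Tandem input to the low-resource acoustic model: acoustic features
# concatenated with the articulatory posteriors.
tandem = np.concatenate([frames, post], axis=1)
print(tandem.shape)  # (10, 18)
```

Joint estimation, as the paper proposes, would additionally backpropagate the low-resource acoustic-model loss into the classifier producing `post`, rather than keeping it fixed as it is here.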
Pages: 2153-2157
Page count: 5