Towards Improving the Performance of Language Identification System for Indian Languages

Cited by: 0
Authors
Anto, Abitha [1]
Sreekumar, K. T. [2]
Kumar, Santhosh C. [2]
Raj, Reghu P. C. [1]
Affiliations
[1] Govt Engn Coll, Dept Comp Sci & Engn, Palakkad, Kerala, India
[2] Amrita Vishwa Vidyapeetham, Dept Elect & Commun Engn, Machine Intelligence Res Lab, Coimbatore, Tamil Nadu, India
Source
2014 FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND COMMUNICATIONS (ICCSC) | 2014
Keywords
Phonotactic features; Phone Recognition followed by Language Modeling (PRLM); Language Model; n-gram;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
This paper presents the details of a phonotactic language identification (LID) system developed for five Indian languages: English (Indian), Hindi, Malayalam, Tamil and Kannada. Since there are no publicly available speech databases for English, Malayalam and Kannada, we built a database for each target language by downloading audio from YouTube videos and manually removing the non-speech segments. The system was tested on a test set consisting of 40 utterances with durations of 30, 10 and 3 s in each of the five target languages. Following the NIST benchmarking sessions, performance was evaluated separately for the 30 s, 10 s and 3 s segments. The baseline system, tested with a 3-gram language model, gave an overall equal error rate (EER) of 10.41 %, 19.56 % and 31.45 % for the 30, 10 and 3 s segments, respectively. With a 4-gram language model, the EER was 9.81 %, 19.38 % and 32.77 %, respectively, an improvement for the 30 s and 10 s segments. With n-gram smoothing, the EER improved further to 9.02 %, 18.70 % and 29.24 % for 3-gram language models, and was 8.88 %, 16.46 % and 32.03 % for 4-gram language models, for the 30, 10 and 3 s test segments, respectively. The study shows that the use of 4-gram language models can help enhance the performance of LID systems for Indian languages.
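The language-modeling stage of a PRLM system can be illustrated with a minimal sketch. It assumes the phone recognizer has already turned each utterance into a sequence of phone labels, and it uses add-k smoothing purely as a stand-in, since the abstract does not say which smoothing method was applied. The class `AddKNgramLM`, the function `identify`, the language names `lang_a` and `lang_b`, and the toy phone strings are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(phones, n):
    """Return the n-grams of a phone sequence padded with start/end markers."""
    padded = ["<s>"] * (n - 1) + list(phones) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class AddKNgramLM:
    """n-gram model over phone labels with add-k smoothing (an assumed choice)."""

    def __init__(self, n=3, k=0.5):
        self.n = n
        self.k = k
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"</s>"}  # symbols the model can predict

    def train(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for gram in ngrams(seq, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_likelihood(self, phones):
        """Sum of smoothed log-probabilities of each phone given its context."""
        v = len(self.vocab)
        total = 0.0
        for gram in ngrams(phones, self.n):
            num = self.ngram_counts[gram] + self.k
            den = self.context_counts[gram[:-1]] + self.k * v
            total += math.log(num / den)
        return total


def identify(models, phones):
    """Pick the language whose model gives the highest log-likelihood."""
    return max(models, key=lambda lang: models[lang].log_likelihood(phones))


# Toy phone sequences standing in for phone-recognizer output (hypothetical).
training = {
    "lang_a": [list("ababab"), list("baba")],
    "lang_b": [list("aabbaa"), list("bbaabb")],
}

models = {}
for lang, seqs in training.items():
    lm = AddKNgramLM(n=3, k=0.5)
    lm.train(seqs)
    models[lang] = lm

guess_a = identify(models, list("abab"))
guess_b = identify(models, list("aabb"))
```

Changing `n=3` to `n=4` gives the 4-gram variant. Smoothing matters more as `n` grows, because unseen n-grams would otherwise receive zero probability and make the log-likelihood undefined.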
Pages: 42-46
Number of pages: 5
Related papers
10 in total
[1]   Language Identification: A Tutorial [J].
Ambikairajah, Eliathamby ;
Li, Haizhou ;
Wang, Liang ;
Yin, Bo ;
Sethu, Vidhyasaharan .
IEEE CIRCUITS AND SYSTEMS MAGAZINE, 2011, 11 (02) :82-108
[2]  
[Anonymous], P ICSLP 98
[3]  
[Anonymous], 2002, INTERSPEECH
[4]   A vector space modeling approach to spoken language identification [J].
Li, Haizhou ;
Ma, Bin ;
Lee, Chin-Hui .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01) :271-284
[5]   Spoken Language Recognition: From Fundamentals to Practice [J].
Li, Haizhou ;
Ma, Bin ;
Lee, Kong Aik .
PROCEEDINGS OF THE IEEE, 2013, 101 (05) :1136-1159
[6]  
Ma B., 2006, COMPUTATIONAL LINGUI, V11, P159
[7]  
Matejka P., 2005, INTERSPEECH, P2237
[8]  
Suo HB, 2007, ICNC 2007: THIRD INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 1, PROCEEDINGS, P678
[9]  
Tong R, 2006, INT CONF ACOUST SPEE, P205
[10]   Comparison of four approaches to automatic language identification of telephone speech [J].
Zissman, MA .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (01) :31-44