Identification of Language using Mel-Frequency Cepstral Coefficients (MFCC)

Cited by: 46
Authors
Koolagudi, Shashidhar G. [1 ]
Rastogi, Deepika [1 ]
Rao, K. Sreenivasa [2 ]
Affiliations
[1] Graph Era Univ, Sch Comp, Dehra Dun 248002, Uttarakhand, India
[2] Indian Inst Technol, Kharagpur 721302, W Bengal, India
Source
INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING | 2012 / Vol. 38
Keywords
Gaussian Mixture Model; Language identification; Mel-frequency Cepstral Coefficient; Spectral features;
DOI
10.1016/j.proeng.2012.06.392
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
This paper addresses the task of identifying a language from a speech signal, using Mel-frequency cepstral coefficients (MFCCs) as features. Language identification models are developed for fifteen Indian languages, namely Assamese, Bangla, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Oriya, Punjabi, Rajasthani, Tamil, Telugu and Urdu, using these spectral features. Identification of these languages is carried out using a Gaussian mixture model (GMM). A semi-natural read database is used to obtain the language-specific information. MFCCs are obtained by applying a linear cosine transform to the log power spectrum on a nonlinear mel-frequency scale. The paper shows that the language identification system performs better when trained and tested with twenty-nine MFCC features than with six, eight, thirteen, nineteen or twenty-one. Among the feature counts tested, then, using more features gave better results. The average language recognition rate over the fifteen Indian languages is around 88%. (C) 2012 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of Noorul Islam Centre for Higher Education
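The abstract's description of MFCC extraction (a cosine transform of the log power spectrum on a mel-frequency scale) can be illustrated with a minimal sketch in pure Python. This is not the authors' implementation: the function names, the Hamming window, the 2595·log10(1 + f/700) mel formula, the 32-filter bank, and the frame length and sample rate in the usage note are all assumptions chosen for illustration, since the record does not state them.

```python
import cmath
import math


def hz_to_mel(f):
    # A common mel-scale formula (assumed; the record does not specify one).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=32, n_coeffs=13):
    """Return n_coeffs MFCCs for a single frame (n_coeffs <= n_filters)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log of the energy in each mel band; the small constant avoids log(0).
    log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                  for row in bank]
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energy))
            for c in range(n_coeffs)]
```

As a usage example, `mfcc([math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)], 8000, n_coeffs=29)` returns a 29-dimensional vector for a synthetic 440 Hz frame. In a system like the one described, such vectors would be computed for every frame of every utterance and used to train one GMM per language; that modelling step is not shown here.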
Pages: 3391-3398
Number of pages: 8