A hierarchical language identification system for Indian languages

Cited by: 41

Authors:

Jothilakshmi, S. [1]

Ramalingam, V. [1]

Palanivel, S. [1]

Affiliation:

[1] Annamalai Univ, Dept Comp Sci & Engn, Annamalainagar 608002, Tamil Nadu, India

Source:

DIGITAL SIGNAL PROCESSING | 2012 / Vol. 22 / Issue 03

Keywords:

Language identification; Mel frequency cepstral coefficients; Shifted delta cepstral coefficients; Hidden Markov model; Gaussian mixture model; Neural networks; Indian languages; SPOKEN; RECOGNITION; SPEECH;

DOI:

10.1016/j.dsp.2011.11.008

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Automatic spoken Language IDentification (LID) is the task of identifying the language from a short duration of speech signal uttered by an unknown speaker. In this work, an attempt has been made to develop a two-level language identification system for Indian languages using acoustic features. In the first level, the system identifies the family of the spoken language, and then it is fed to the second level which aims at identifying the particular language in the corresponding family. The performance of the system is analyzed for various acoustic features and different classifiers. The suitable acoustic feature and the pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov model (HMM), Gaussian mixture model (GMM) and artificial neural networks (ANN). We studied the discriminative power of the system for three feature sets: mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients, and shifted delta cepstral (SDC) coefficients. Then the LID performance as a function of the different training and testing set sizes has been studied. To carry out the experiments, a new database has been created for 9 Indian languages. It is shown that the GMM-based LID system using MFCC with delta and acceleration coefficients performs well, with 80.56% accuracy. The performance of the GMM-based LID system with SDC is also considerable. (C) 2012 Elsevier Inc. All rights reserved.
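Illustrative note: the two-level decision described in the abstract (pick the language family first, then pick a language within that family) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: each model is a single diagonal-covariance Gaussian standing in for a full GMM, the 2-D "frames" are toy values rather than MFCC or SDC vectors, and the names family_a, lang_a1, make_frames, train and identify are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (a 1-component stand-in for a GMM)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-3)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(model, frames):
    """Average per-frame log-likelihood of an utterance under one model."""
    mean, var = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def train(data, family_of):
    """Build level-1 (family) and level-2 (language) models from labelled frames."""
    family_frames = defaultdict(list)
    for lang, frames in data.items():
        family_frames[family_of[lang]].extend(frames)
    family_models = {fam: fit_gaussian(fr) for fam, fr in family_frames.items()}
    language_models = {lang: fit_gaussian(fr) for lang, fr in data.items()}
    return family_models, language_models


def identify(frames, family_models, language_models, family_of):
    """Level 1: best-scoring family. Level 2: best language inside that family."""
    family = max(family_models,
                 key=lambda fam: avg_log_likelihood(family_models[fam], frames))
    candidates = [lang for lang in language_models if family_of[lang] == family]
    language = max(candidates,
                   key=lambda lang: avg_log_likelihood(language_models[lang], frames))
    return family, language


def make_frames(cx, cy):
    """Hypothetical toy data: a 3x3 grid of 2-D feature vectors around (cx, cy)."""
    return [[cx + dx, cy + dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]


# Hypothetical family grouping and training data (not from the paper's database).
family_of = {"lang_a1": "family_a", "lang_a2": "family_a",
             "lang_b1": "family_b", "lang_b2": "family_b"}
data = {"lang_a1": make_frames(0, 0), "lang_a2": make_frames(2, 0),
        "lang_b1": make_frames(10, 10), "lang_b2": make_frames(12, 10)}

family_models, language_models = train(data, family_of)
utterance = [[2.0, 0.1], [2.2, -0.1], [2.1, 0.2]]
result = identify(utterance, family_models, language_models, family_of)
print(result)
```

In this sketch the second level only compares languages belonging to the family chosen at the first level, so a first-level error cannot be recovered at the second level; whether the published system handles that case differently is not stated in the abstract.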

Pages: 544-553

Page count: 10

References (24 in total; entries 1 to 10 listed):

[1] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES
DAVIS, SB
MERMELSTEIN, P
[J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04): 357 - 366
[2] Gleason T.P., 2004, P OD SPEAK LANG REC, P297
[3] Jayaram A.K.V.S., 2003, P IEEE INT C AC SPEE, P32
[4] A vector space modeling approach to spoken language identification
Li, Haizhou
Ma, Bin
Lee, Chin-Hui
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (01): 271 - 284
[5] Lin C.-K., 2006, Proc. IEEE International Symposium on Power Line Communications and Its Applications (ISPLC), P196
[6] Reviewing automatic language identification
Muthusamy, Yeshwant K.
Barnard, Etienne
Cole, Ronald A.
[J]. IEEE SIGNAL PROCESSING MAGAZINE, 1994, 11 (04): 33 - 41
[7] Language identification using acoustic log-likelihoods of syllable-like units
Nagarajan, T.
Murthy, H. A.
[J]. SPEECH COMMUNICATION, 2006, 48 (08): 913 - 926
[8] NAGARAJAN T, 2004, THESIS INDIAN I TECH
[9] Nagarajan T., 2003, WORKSH SPOK LANG PRO, P101
[10] Analysis and Selection of Prosodic Features for Language Identification
Ng, Raymond W. M.
Lee, Tan
Leung, Cheung-Chi
Ma, Bin
Li, Haizhou
[J]. 2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009: 123 - 128
