Significance of GMM-UBM based Modelling for Indian Language Identification

Cited by: 11
Authors
Kumar, Ravi, V [1]
Vydana, Hari Krishna [1]
Vuppala, Anil Kumar [1]
Affiliation
[1] IIIT Hyderabad, Speech & Vis Lab, Hyderabad 500032, Andhra Pradesh, India
Source
ELEVENTH INTERNATIONAL CONFERENCE ON COMMUNICATION NETWORKS, ICCN 2015, INDIA / ELEVENTH INTERNATIONAL CONFERENCE ON DATA MINING AND WAREHOUSING, ICDMW 2015, INDIA / ELEVENTH INTERNATIONAL CONFERENCE ON IMAGE AND SIGNAL PROCESSING, ICISP 2015 | 2015 / Vol. 54
Keywords
Language identification; Phonotactics; Spectral features; Gaussian mixture modelling (GMM); Universal background model (UBM)
DOI
10.1016/j.procs.2015.06.027
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most of the Indian languages originated from Devanagari, the script of the Sanskrit language. In spite of the similarity in their phoneme sets, every language has its own influence on the phonotactic constraints of speech in that language. A modelling technique capable of capturing the slightest variations imparted by a language is a prerequisite for developing a language identification (LID) system. Using a Gaussian mixture modelling technique with a large number of mixture components demands a large amount of training data for each language class, which is hard to collect and handle. In this work, the phonotactic variations imparted by the different languages are modelled using Gaussian mixture modelling with a universal background model (GMM-UBM). In GMM-UBM based modelling, a certain amount of data from all the language classes is pooled to develop a universal background model (UBM), and this model is then adapted to each class. Spectral features (mel-frequency cepstral coefficients, MFCCs) are employed to represent the language-specific phonotactic information of speech in the different languages. In the present study, LID systems are developed using speech samples from IITKGP-MLILSC. The performance of the proposed GMM-UBM based LID system is compared with that of a conventional GMM based LID system, and an average improvement of 7-8% is observed from using UBM-based modelling to develop the LID system. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
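The GMM-UBM pipeline summarised in the abstract (pool data from all languages, train one UBM, adapt it to each language, then score) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the number of mixtures or the adaptation details, so the sketch assumes diagonal covariances, plain EM for the UBM, mean-only MAP adaptation with a relevance factor of 16 (the classic GMM-UBM recipe of Reynolds et al. for speaker verification), and scoring by average frame log-likelihood. All function names (`train_ubm`, `map_adapt_means`, `identify`) and the synthetic "MFCC" vectors are hypothetical stand-ins; real use would extract MFCC frames from IITKGP-MLILSC recordings.

```python
import numpy as np

# Hypothetical GMM-UBM LID sketch: diagonal covariances, mean-only MAP adaptation.

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k -> (T, K)."""
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, K)
    log_det = np.sum(np.log(2 * np.pi * variances), axis=1)      # (K,)
    return -0.5 * (mahal + log_det[None, :])

def posteriors(X, weights, means, variances):
    """Responsibilities gamma (T, K) and per-frame log-likelihood log p(x_t) (T,)."""
    log_p = log_gauss_diag(X, means, variances) + np.log(weights)[None, :]
    log_px = np.logaddexp.reduce(log_p, axis=1)                  # log sum_k w_k N(x_t | k)
    return np.exp(log_p - log_px[:, None]), log_px

def train_ubm(X, n_components, n_iter=50, seed=0, var_floor=1e-3):
    """EM for a diagonal-covariance GMM on frames pooled from all language classes."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma, _ = posteriors(X, weights, means, variances)     # E-step
        n_k = gamma.sum(axis=0) + 1e-10                          # soft counts per component
        weights = n_k / n_k.sum()                                # M-step
        means = (gamma.T @ X) / n_k[:, None]
        variances = np.maximum((gamma.T @ X ** 2) / n_k[:, None] - means ** 2, var_floor)
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    """MAP-adapt only the UBM means to one language's frames; weights/variances are kept."""
    weights, means, variances = ubm
    gamma, _ = posteriors(X, weights, means, variances)
    n_k = gamma.sum(axis=0)
    e_k = (gamma.T @ X) / np.maximum(n_k, 1e-10)[:, None]       # per-component data mean
    alpha = (n_k / (n_k + relevance))[:, None]                   # weight of data vs. UBM prior
    return weights, alpha * e_k + (1.0 - alpha) * means, variances

def identify(X, models):
    """Pick the language whose adapted GMM gives the highest average frame log-likelihood."""
    scores = {lang: posteriors(X, *m)[1].mean() for lang, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy usage: random vectors stand in for MFCC frames of three hypothetical languages.
rng = np.random.default_rng(1)
centers = {"lang_a": 0.0, "lang_b": 2.0, "lang_c": -2.0}
train = {lang: rng.normal(c, 1.0, size=(1000, 4)) for lang, c in centers.items()}
ubm = train_ubm(np.vstack(list(train.values())), n_components=8)
models = {lang: map_adapt_means(X, ubm) for lang, X in train.items()}
best, scores = identify(rng.normal(2.0, 1.0, size=(500, 4)), models)
print(best, scores)
```

The relevance factor is the design lever that matches the abstract's motivation: a component that sees many frames of a language moves close to that language's data, while a component that sees few frames stays near the UBM. Each language model therefore needs far less data than a large GMM trained from scratch.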
Pages: 231-236
Number of pages: 6