Identification of Indian languages using multi-level spectral and prosodic features

Cited by: 53
Authors
Ramu Reddy V. [1]
Maity S. [2]
Sreenivasa Rao K. [2]
Affiliations
[1] TCS Innovation Labs
[2] School of Information Technology, Indian Institute of Technology Kharagpur
Source
Sreenivasa Rao, K. (ksrao@iitkgp.ac.in) | 2013 / Kluwer Academic Publishers / Vol. 16
Keywords
Gaussian mixture models; Intonation; Language identification; Language recognition; Prosodic features; Rhythm; Stress
DOI
10.1007/s10772-013-9198-0
Abstract
In this paper, spectral and prosodic features extracted at different levels are explored for analyzing the language-specific information present in speech. Spectral features extracted from frames of 20 ms (block processing), individual pitch cycles (pitch-synchronous analysis) and glottal closure regions are used for discriminating between languages. Prosodic features extracted at the syllable, tri-syllable and multi-word (phrase) levels are proposed in addition to spectral features for capturing language-specific information. Language-specific prosody is represented by intonation, rhythm and stress features at the syllable and tri-syllable (word) levels, whereas temporal variations in fundamental frequency (F0 contour), durations of syllables and temporal variations in intensity (energy contour) represent prosody at the multi-word (phrase) level. The language-specific information in the proposed features is analyzed using the Indian language speech database IITKGP-MLILSC. Gaussian mixture models are used to capture the language-specific information from the proposed features. The evaluation results indicate that language identification performance improves when the features are combined. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database. © 2013 Springer Science+Business Media New York.
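The identification step the abstract describes (one Gaussian mixture model per language, with the decision going to the best-scoring model) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the diagonal covariances, the function names, and the toy two-dimensional models are all assumptions, and the abstract does not state the feature dimensions, mixture sizes, or training procedure.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of a sequence of feature vectors under one GMM.

    gmm is a list of (weight, mean, var) components (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over mixture components for numerical stability.
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total

def identify_language(frames, models):
    """Return the language whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda lang: gmm_log_likelihood(frames, models[lang]))

# Toy, made-up 2-D models; a real system would train them on the
# spectral or prosodic feature vectors of each language.
toy_models = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "lang_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
toy_frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
print(identify_language(toy_frames, toy_models))
```

Summing per-frame log-likelihoods treats the frames as independent given the language, which is the usual simplification when scoring an utterance against a GMM.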
Pages: 489-511
Number of pages: 22