Improving Indian Spoken-Language Identification by Feature Selection in Duration Mismatch Framework

Cited by: 0
Authors
Bakshi A. [1]
Kopparapu S.K. [2]
Affiliations
[1] UMIT, SNDT University, Mumbai
[2] TCS Research, TATA Consultancy Services, Yantra Park, Thane
Keywords
Classifier fusion; Feature selection; Indian language; Spoken language identification
DOI
10.1007/s42979-021-00750-1
Abstract
This paper presents a novel duration-normalized feature selection technique and a two-step modified hierarchical classifier to improve the accuracy of spoken language identification (SLID) for Indian languages under duration-mismatched conditions. The feature selection averages random-forest-based importance vectors of openSMILE features computed from utterances of different durations. Although it improves the SLID system's accuracy for mismatched training and testing durations, performance is significantly reduced for short-duration utterances. To reduce false language-family estimates, the classifier is built as a cascade of inter-family and intra-family classifiers with an additional class. The All India Radio data set, with nine Indian languages and different utterance durations, was used as speech material. Experimental results showed that 150 optimal features with the proposed modified hierarchical classifier gave the highest accuracy: 96.9% for 30 s utterances and 84.4% for 0.2 s utterances when the training and test durations were the same. When trained on 30 s utterances, the system achieved accuracies of 98.3% and 61.9% for 15 s and 0.2 s test durations, respectively. Comparative analysis showed a significant improvement in accuracy over several SLID systems in the literature. © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
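The feature selection step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestClassifier as the importance estimator and uses synthetic data in place of openSMILE features; the function name select_features, the duration labels, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_features(datasets, k=150, seed=0):
    """Average random-forest importances across durations, keep the top k.

    datasets: dict mapping a duration label to (X, y), where every X has
    the same feature columns (hypothetical layout, not the paper's).
    """
    importances = []
    for _duration, (X, y) in datasets.items():
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        rf.fit(X, y)
        # One importance vector per utterance duration.
        importances.append(rf.feature_importances_)
    # Averaging gives each duration equal weight in the final ranking.
    mean_importance = np.mean(importances, axis=0)
    top_k = np.argsort(mean_importance)[::-1][:k]
    return np.sort(top_k), mean_importance


# Synthetic stand-in for features extracted at two utterance durations.
rng = np.random.default_rng(0)
n_features = 40


def make_split(n_samples, noise):
    y = rng.integers(0, 3, size=n_samples)
    X = rng.normal(scale=noise, size=(n_samples, n_features))
    X[:, :5] += y[:, None]  # only the first five columns carry class information
    return X, y


datasets = {"30s": make_split(300, 1.0), "0.2s": make_split(300, 2.0)}
selected, mean_importance = select_features(datasets, k=5)
print(selected)
```

The selected column indices would then be applied to both training and test feature matrices, so that a model trained on one duration and tested on another sees the same reduced feature set.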