Indian language identification using time-frequency image textural descriptors and GWO-based feature selection

Cited by: 12

Authors:

Chowdhury, Amit A. [1]

Borkar, Vaibhav S. [1]

Birajdar, Gajanan K. [1]

Affiliation:

[1] Ramrao Adik Inst Technol, Dept Elect Engn, Navi Mumbai 400706, Maharashtra, India

Source:

JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE | 2020 / Vol. 32 / No. 01

Keywords:

Indian language identification; spectrogram; CLBP; LBPHF; grey wolf optimisation; feature selection; neural network classifier; WOLF OPTIMIZATION;

DOI:

10.1080/0952813X.2019.1631392

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The ability to categorise and recognise a spoken language is essential in a multilingual society like India. Language identification (LID) is the process of identifying the language spoken by an unknown speaker from a given speech sample. This article presents textural descriptors extracted from the spectrogram image, combined with evolutionary feature selection, for Indian language identification. Language-specific long-term cues and prosodic information present in various frequency zones of the spectrogram image can be efficiently modelled using textural descriptors. First, an input audio sample is converted into a spectrogram, a visual representation that characterises the frequency content of a signal with respect to time. Then, completed local binary pattern (CLBP), local binary pattern histogram Fourier (LBPHF) and discrete wavelet transform based texture descriptors are used to extract features from the spectrogram image. Next, grey wolf optimiser (GWO) feature selection removes irrelevant and redundant features so that only an optimal subset is retained. GWO-based feature selection helps construct the classification model from this subset and so optimises classifier performance. Finally, using an artificial neural network classifier on the Indic-TTS database, an accuracy of 96.9659% was obtained.
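The first two stages described in the abstract (audio to spectrogram, spectrogram to texture features) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the basic 8-neighbour local binary pattern rather than CLBP or LBPHF, a synthetic tone in place of an Indic-TTS recording, and assumed frame settings. The function names `spectrogram` and `lbp_histogram` and the parameters `frame_len` and `hop` are hypothetical and chosen only for this example.

```python
import cmath
import math


def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram via a naive DFT over Hann-windowed frames.

    Returns a list of rows, one per frequency bin, each with one value per frame.
    frame_len and hop are assumed values, not settings from the paper.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column.append(abs(acc))
        columns.append(column)
    # Transpose so rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*columns)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.

    This is plain LBP, used here as a simpler stand-in for CLBP/LBPHF.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    # Border pixels are skipped because they lack a full neighbourhood.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # A neighbour at least as large as the centre sets its bit.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]


# Synthetic stand-in for a speech sample: a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)          # time-frequency image
features = lbp_histogram(spec)    # 256-dimensional texture feature vector
```

In a full pipeline of the kind the abstract outlines, a feature vector such as `features` would then pass through a feature-selection step (GWO in the paper) before being fed to a classifier; those stages are omitted here.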


Pages: 111-132

Page count: 22
