Indian language identification using time-frequency image textural descriptors and GWO-based feature selection

Cited by: 12
Authors
Chowdhury, Amit A. [1 ]
Borkar, Vaibhav S. [1 ]
Birajdar, Gajanan K. [1 ]
Affiliations
[1] Ramrao Adik Inst Technol, Dept Elect Engn, Navi Mumbai 400706, Maharashtra, India
Keywords
Indian language identification; spectrogram; CLBP; LBPHF; grey wolf optimisation; feature selection; neural network classifier; WOLF OPTIMIZATION
DOI
10.1080/0952813X.2019.1631392
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ability to categorise and recognise a spoken language is essential in a multilingual society like India. Language identification (LID) is the process of identifying the language spoken by an unknown speaker from a given speech sample. This article presents textural descriptors extracted from the spectrogram image, combined with evolutionary feature selection, for Indian language identification. Language-specific long-term cues and prosodic information present in various frequency zones of the spectrogram image can be efficiently modelled using textural descriptors. First, an input audio sample is converted into a spectrogram, a visual representation that characterises the frequency content of a signal with respect to time. Then, completed local binary pattern (CLBP), local binary pattern histogram Fourier (LBPHF) and discrete wavelet transform based texture descriptors are used to extract features from the spectrogram image. Next, grey wolf optimiser (GWO) feature selection removes irrelevant and redundant features so that only an optimal subset of features is retained. GWO-based feature selection helps to construct the classification model with optimal features, which optimises the performance of the classifier. Finally, using an artificial neural network classifier and the Indic-TTS database, an accuracy of 96.9659% was obtained.
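The first two stages described in the abstract (audio to spectrogram, then texture features from the spectrogram image) can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: it uses the plain 3x3 local binary pattern as a simpler stand-in for CLBP and LBPHF, the function names `spectrogram` and `lbp_histogram` are hypothetical, and the frame length, hop size and synthetic test tone are arbitrary choices.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log1p(mag).T                     # rows = frequency, columns = time

def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes.

    This is the plain LBP operator, used here only as a stand-in for the
    CLBP and LBPHF descriptors named in the abstract.
    """
    h, w = image.shape
    centre = image[1:-1, 1:-1]
    # Eight neighbours, visited clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy: h - 1 + dy, 1 + dx: w - 1 + dx]
        # Set this bit wherever the neighbour is at least as large as the centre.
        codes += (neighbour >= centre) * (1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Synthetic one-second test tone at 8 kHz (a placeholder for a speech sample).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

spec = spectrogram(signal)
features = lbp_histogram(spec)
```

The resulting 256-dimensional vector is the kind of feature set that a selection step such as GWO would then prune before classification; that step is not shown here.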
Pages: 111-132
Number of pages: 22
Related papers
52 in total
  • [1] ADDADECKER M, 2009, SPOKEN LANGUAGE PROC, P279
  • [2] Ahonen T, 2009, LECT NOTES COMPUT SC, V5575, P61, DOI 10.1007/978-3-642-02230-2_7
  • [3] [Anonymous], 2018, Ethnologue
  • [4] BAKSHI A, 2018, SADHANA, V43, P43
  • [5] Balleda J., 2000, 6 INT C SPOK LANG PR, P1033
  • [6] Semantic speech recognition in the Basque context Part II: language identification for under-resourced languages
    Barroso, Nora
    Lopez de Ipina, Karmele
    Hernandez, Carmen
    Ezeiza, Aitzol
    Grana, Manuel
    [J]. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (01) : 41 - 47
  • [7] Speech and music classification using spectrogram based statistical descriptors and extreme learning machine
    Birajdar, Gajanan K.
    Patil, Mukesh D.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (11) : 15141 - 15168
  • [8] CHEL H, 2011, 1 INT C COMP SCI ENG, V204, P1
  • [9] Rheological wall slip velocity prediction model based on artificial neural network
    Chin, Ren Jie
    Lai, Sai Hin
    Ibrahim, Shaliza
    Jaafar, Wan Zurina Wan
    Elshafie, Ahmed
    [J]. JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2019, 31 (04) : 659 - 676
  • [10] Dash M., 1997, Intelligent Data Analysis, V1