Optimal prosodic feature extraction and classification in parametric excitation source information for Indian language identification using neural network based Q-learning algorithm

Cited by: 8
Authors
Das, Himanish Shekhar [1]
Roy, Pinki [1]
Affiliations
[1] Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar 788010, Assam, India
Keywords
Automatic language identification (LID); Prosodic feature; Iterative adaptive inverse filtering (IAIF); Short time Fourier transform (STFT); Neural network based Q-learning (NNQL); FRONT-END; SPOKEN; RECOGNITION; PERFORMANCE
DOI
10.1007/s10772-018-09582-6
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Automatic language identification (LID) systems are widely used in real-world multilingual speech applications. Speech formation relies on the vocal tract area, and the associated excitation source information can be explored for the LID task. In this paper, the LID system uses sub-segmental, segmental and supra-segmental features from the linear prediction residual of the speech signal, which represent the excitation source information of speech in various native languages. The glottal flow derivative of the speech signal is obtained through the iterative adaptive inverse filtering (IAIF) method. The prosodic features of the speech signal are extracted using the short-time Fourier transform (STFT) because of its capability to process non-stationary signals. Finally, a deep neural network based Q-learning (DNNQL) algorithm is employed to identify the class label for a specific language. The proposed approach is validated experimentally on a recorded Indian language database, where the proposed LID system achieves 97.3% accuracy and performs well compared with other machine learning based approaches.
Pages: 67-77
Number of pages: 11