AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATION

Cited by: 0
Authors
Mandava, Tirusha [1]
Vuddagiri, Ravi Kumar [1]
Vydana, Hari Krishna [1]
Vuppala, Anil Kumar [1]
Affiliations
[1] International Institute of Information Technology, Speech Processing Lab, Hyderabad, India
Source
2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019
Keywords
Equal error rate; Joint acoustic model; Language identification system; Multi-head; Self-attention mechanism; RECOGNITION; CHALLENGE; FEATURES;
DOI
10.1109/asru46091.2019.9003784
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, phonetic features derived from the joint acoustic model (JAM) of a multilingual end-to-end automatic speech recognition system are proposed for Indian language identification (LID). These features utilize contextual information learned by the JAM through a long short-term memory-connectionist temporal classification (LSTM-CTC) framework and are therefore referred to as CTC features. A multi-head self-attention network is trained on these features; it aggregates the frame-level features by selecting prominent frames through a parametrized attention layer. The proposed features have been tested on the IIITH-ILSC database, which consists of 22 official Indian languages and Indian English. Experimental results demonstrate that the CTC features outperform i-vector and phonetic temporal neural LID systems, achieving an equal error rate of 8.70%. Fusing the shifted delta cepstral and CTC feature-based LID systems at the model level and at the feature level further improves performance.
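The aggregation step described in the abstract can be illustrated with a minimal sketch of multi-head attention pooling over frame-level features. This record does not specify the paper's actual architecture, so the scoring form (a tanh projection followed by a per-head scoring vector), the function name `multi_head_attention_pooling`, the parameters `W` and `v`, and all dimensions are assumptions chosen for illustration. Random values stand in for real CTC features and trained weights.

```python
import numpy as np

def multi_head_attention_pooling(frames, W, v):
    """Pool frame-level features into one utterance-level vector.

    frames: (T, D) frame-level features (stand-in for CTC features).
    W:      (H, D, A) per-head projection weights (hypothetical parameters).
    v:      (H, A) per-head scoring vectors (hypothetical parameters).
    Returns the concatenated pooled vector (H*D,) and the weights (H, T).
    """
    # Project every frame for every head: (H, T, A)
    hidden = np.tanh(np.einsum("td,hda->hta", frames, W))
    # One scalar score per head and frame: (H, T)
    scores = np.einsum("hta,ha->ht", hidden, v)
    # Numerically stable softmax over the time axis
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted average of the frames per head: (H, D)
    pooled = weights @ frames
    return pooled.reshape(-1), weights

# Toy example with random data (assumed sizes, not taken from the paper)
rng = np.random.default_rng(0)
T, D, H, A = 50, 8, 4, 16
frames = rng.standard_normal((T, D))
W = rng.standard_normal((H, D, A)) * 0.1
v = rng.standard_normal((H, A))
utt_vec, weights = multi_head_attention_pooling(frames, W, v)
```

Each head produces its own distribution over frames, so frames with higher scores contribute more to the pooled vector; in a trained system, a language classifier would typically operate on the resulting utterance-level vector.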
Pages: 389-396
Number of pages: 8