AN INVESTIGATION OF LSTM-CTC BASED JOINT ACOUSTIC MODEL FOR INDIAN LANGUAGE IDENTIFICATION

Cited by: 0

Authors:

Mandava, Tirusha [1]

Vuddagiri, Ravi Kumar [1]

Vydana, Hari Krishna [1]

Vuppala, Anil Kumar [1]

Affiliations:

[1] Int Inst Informat Technol, Speech Proc Lab, Hyderabad, India

Source:

2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019

Keywords:

Equal error rate; Joint acoustic model; Language identification system; Multi-head; Self-attention mechanism; RECOGNITION; CHALLENGE; FEATURES

DOI:

10.1109/asru46091.2019.9003784

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, phonetic features derived from the joint acoustic model (JAM) of a multilingual end-to-end automatic speech recognition system are proposed for Indian language identification (LID). These features utilize contextual information learned by the JAM through a long short-term memory-connectionist temporal classification (LSTM-CTC) framework and are therefore referred to as CTC features. A multi-head self-attention network is trained on these features; it aggregates the frame-level features by selecting prominent frames through a parametrized attention layer. The proposed features have been tested on the IIITH-ILSC database, which consists of 22 official Indian languages and Indian English. Experimental results demonstrate that CTC features outperform i-vector and phonetic temporal neural LID systems, achieving an equal error rate of 8.70%. Fusing the shifted delta cepstral and CTC feature-based LID systems at the model level and at the feature level further improves performance.
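The aggregation step described in the abstract, where a parametrized attention layer weights frame-level features and pools them into an utterance-level vector, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-layer scoring function (tanh followed by a linear projection), the function name `multi_head_attention_pool`, the parameter names `w1` and `w2`, and all dimensions are assumptions chosen for illustration, and random numbers stand in for real CTC features.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention_pool(frames, w1, w2):
    """Pool frame-level features into one utterance-level vector.

    frames: (T, D) frame-level features (stand-ins for CTC features).
    w1:     (D, Da) hidden projection (hypothetical parameter).
    w2:     (Da, H) one scoring column per attention head (hypothetical).
    Returns the concatenated per-head weighted means, shape (H * D,),
    and the attention weights, shape (T, H).
    """
    scores = np.tanh(frames @ w1) @ w2   # (T, H): one score per frame and head
    weights = softmax(scores, axis=0)    # normalize over time for each head
    pooled = weights.T @ frames          # (H, D): weighted mean per head
    return pooled.reshape(-1), weights


# Toy example with random data; the sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
T, D, Da, heads = 50, 8, 16, 4
frames = rng.standard_normal((T, D))
w1 = rng.standard_normal((D, Da)) * 0.1
w2 = rng.standard_normal((Da, heads)) * 0.1

utt, weights = multi_head_attention_pool(frames, w1, w2)
print(utt.shape, weights.shape)
```

Each head produces its own distribution over frames, so different heads can emphasize different parts of the utterance. In a trained system `w1` and `w2` would be learned jointly with the language classifier rather than drawn at random.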

Pages: 389-396

Page count: 8
