An End-to-end Approach to Language Identification in Short Utterances using Convolutional Neural Networks

Cited by: 0

Authors:

Lozano-Diez, Alicia [1]

Zazo-Candil, Ruben [1]

Gonzalez-Dominguez, Javier [1]

Toledano, Doroteo T. [1]

Gonzalez-Rodriguez, Joaquin [1]

Affiliation:

[1] Univ Autonoma Madrid, ATVS Biometr Recognit Grp, Madrid, Spain

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In this work, we propose an end-to-end approach to the language identification (LID) problem based on Convolutional Deep Neural Networks (CDNNs). The use of CDNNs is mainly motivated by the ability they have shown in modeling speech signals and by their relatively low cost, in terms of number of free parameters, compared with other deep architectures. We evaluate different configurations on a subset of 8 languages from the NIST Language Recognition Evaluation 2009 Voice of America (VOA) dataset, for the task of short test durations (segments of up to 3 seconds of speech). The proposed CDNN-based systems achieve performance comparable to that of our baseline i-vector system while drastically reducing the number of parameters to tune (at least 100 times fewer parameters). We then combine these CDNN-based systems and the i-vector baseline with a simple fusion at score level. This combination outperforms our best standalone system, with a relative improvement of up to 11% in terms of equal error rate (EER).
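The abstract reports its fusion result as a relative EER improvement but does not state the fusion rule or the scoring details. The sketch below is only an illustration under explicit assumptions: an equal-weight linear combination stands in for the "simple fusion at score level", the EER is approximated by a threshold sweep, and all scores are synthetic. The function names (`equal_error_rate`, `fuse`, `relative_improvement`) and the `weight` parameter are hypothetical and not taken from the paper.

```python
import random


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted-sum score fusion (assumed rule; the abstract does not specify one)."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


def relative_improvement(eer_baseline, eer_new):
    """Relative EER reduction in percent, e.g. going from 10.0 to 8.9 is about 11%."""
    return 100.0 * (eer_baseline - eer_new) / eer_baseline


# Synthetic demo: two hypothetical systems scoring the same trials with
# independent noise. These numbers are made up and are not the paper's results.
rng = random.Random(0)
n_trials = 500
tgt_a = [rng.gauss(1.0, 1.0) for _ in range(n_trials)]
tgt_b = [rng.gauss(1.0, 1.0) for _ in range(n_trials)]
non_a = [rng.gauss(-1.0, 1.0) for _ in range(n_trials)]
non_b = [rng.gauss(-1.0, 1.0) for _ in range(n_trials)]

eer_a = equal_error_rate(tgt_a, non_a)
eer_fused = equal_error_rate(fuse(tgt_a, tgt_b), fuse(non_a, non_b))
print(f"EER system A: {100 * eer_a:.2f}%")
print(f"EER fused:    {100 * eer_fused:.2f}%")
print(f"Relative improvement: {relative_improvement(eer_a, eer_fused):.1f}%")
```

In practice the fusion weights are usually trained on held-out data and the scores are calibrated before they are combined; the fixed `weight=0.5` here only keeps the sketch short.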


Pages: 403-407

Number of pages: 5

