An End-to-end Approach to Language Identification in Short Utterances using Convolutional Neural Networks

Authors
Lozano-Diez, Alicia [1]
Zazo-Candil, Ruben [1]
Gonzalez-Dominguez, Javier [1]
Toledano, Doroteo T. [1]
Gonzalez-Rodriguez, Joaquin [1]
Affiliation
[1] Univ Autonoma Madrid, ATVS Biometr Recognit Grp, Madrid, Spain
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, we propose an end-to-end approach to the language identification (LID) problem based on Convolutional Deep Neural Networks (CDNNs). The use of CDNNs is motivated mainly by their demonstrated ability to model speech signals and by their relatively low cost, in terms of number of free parameters, compared with other deep architectures. We evaluate different configurations on a subset of 8 languages from the NIST Language Recognition Evaluation 2009 Voice of America (VOA) dataset, focusing on short test durations (segments of up to 3 seconds of speech). The proposed CDNN-based systems achieve performance comparable to that of our baseline i-vector system while drastically reducing the number of parameters to tune (at least 100 times fewer). We then combine these CDNN-based systems and the i-vector baseline with a simple score-level fusion. This combination outperforms our best standalone system, with a relative improvement of up to 11% in terms of equal error rate (EER).
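The abstract mentions a simple score-level fusion evaluated by EER but does not specify the fusion rule. The following is a minimal sketch, assuming a weighted sum of per-trial scores and a threshold sweep to approximate the EER. The function names `fuse_scores` and `equal_error_rate`, the `weight_a` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_scores(scores_a: Sequence[float], scores_b: Sequence[float],
                weight_a: float = 0.5) -> List[float]:
    """Weighted-sum fusion of two systems' scores for the same trials.

    Assumption: a plain weighted average; the paper's actual rule may differ.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(scores_a, scores_b)]


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER; labels are 1 for target trials, 0 for non-target.

    Uses every observed score as a threshold (accept if score >= threshold)
    and returns the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        misses = sum(1 for s, y in zip(scores, labels)
                     if y == 1 and s < threshold)
        false_alarms = sum(1 for s, y in zip(scores, labels)
                           if y == 0 and s >= threshold)
        p_miss = misses / n_target
        p_fa = false_alarms / n_nontarget
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2.0
    return best_eer


# Toy example with made-up scores for four trials (not data from the paper).
cdnn_scores = [0.9, 0.2, 0.6, 0.4]
ivector_scores = [0.7, 0.4, 0.3, 0.5]
trial_labels = [1, 0, 1, 0]
fused = fuse_scores(cdnn_scores, ivector_scores)
print(equal_error_rate(fused, trial_labels))
```

With EERs measured this way, a relative improvement is computed as (EER_baseline - EER_fused) / EER_baseline, which is the sense in which the abstract reports up to 11%.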
Pages: 403-407
Page count: 5