Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages

Cited by: 1
Authors
Madhavaraj, A. [1]
Ramakrishnan, A. G. [1]
Affiliations
[1] Indian Inst Sci, MILE Lab, Elect Engn, Bangalore 560012, Karnataka, India
Keywords
Multi-task learning; data-pooling; deep neural networks; phone mapping; alignments; senone posteriors; cross-lingual training; multilingual training; parameter sharing; speech recognition; Gujarati; Tamil; Telugu
DOI
10.1109/ncc.2019.8732237
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
We present two approaches to improve the performance of automatic speech recognition (ASR) systems for Gujarati, Tamil and Telugu. In the first approach, data-pooling with phone mapping (DP-PM), a deep neural network (DNN) is trained to predict the senones of the target language; the feature vectors and alignments from the other (source) languages are then used to map the source-language phones to target-language phones. The lexicons of the source languages are modified using this phone mapping, and an ASR system for the target language is trained on both the target data and the modified source data. The DP-PM approach gives relative improvements in word error rate (WER) of 5.1% for Gujarati, 3.1% for Tamil and 3.4% for Telugu over the corresponding baselines. In the second approach, multi-task DNN (MT-DNN) modeling, feature vectors from all three languages are used to train a DNN with three output layers, each predicting the senones of one language. The objective functions of the output layers are modified so that, during training, a feature vector updates only those DNN layers responsible for predicting the senones of the language it belongs to. The MT-DNN approach achieves relative WER improvements of 5.7%, 3.3% and 5.2% for Gujarati, Tamil and Telugu, respectively.
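The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`map_source_phones`, `multitask_loss`), the data layout, and the mapping rule (pick the target phone with the highest accumulated posterior over the frames aligned to each source phone) are all assumptions made for illustration. The abstract does not specify how the mapping is derived from the posteriors and alignments.

```python
import math
from collections import defaultdict


def map_source_phones(frame_posteriors, source_alignment):
    """Map each source phone to one target phone (assumed rule: highest total posterior).

    frame_posteriors: one dict per source-language frame, {target_phone: posterior},
        assumed to come from the target-language DNN (senone posteriors summed per phone).
    source_alignment: the source phone label aligned to each frame.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for posteriors, src_phone in zip(frame_posteriors, source_alignment):
        for tgt_phone, p in posteriors.items():
            totals[src_phone][tgt_phone] += p
    return {src: max(acc, key=acc.get) for src, acc in totals.items()}


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multitask_loss(head_logits, frame_lang, senone_id):
    """Cross-entropy computed only on the output layer of the frame's own language.

    head_logits: {language: list of senone logits}, one entry per output layer.
    The other output layers do not enter the loss, so this frame gives them no gradient.
    """
    return -log_softmax(head_logits[frame_lang])[senone_id]


# Hypothetical toy data: three source frames aligned to source phones "A" and "I".
posteriors = [{"a": 0.7, "i": 0.3}, {"a": 0.6, "i": 0.4}, {"a": 0.2, "i": 0.8}]
alignment = ["A", "A", "I"]
mapping = map_source_phones(posteriors, alignment)

# Hypothetical logits for three language-specific output layers on one Gujarati frame.
heads = {"gujarati": [0.0, 0.0], "tamil": [5.0, 1.0, 2.0], "telugu": [1.0, 3.0]}
loss = multitask_loss(heads, "gujarati", 0)
```

In this sketch the loss for a Gujarati frame depends only on the Gujarati output layer, so only that layer (and any shared hidden layers feeding it in a real network) would receive an update from the frame.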
Pages: 5
Related papers
50 in total
  • [31] Attribute Knowledge Integration for Speech Recognition Based on Multi-task Learning Neural Networks
    Zheng, Hao
    Yang, Zhanlei
    Qiao, Liwei
    Li, Jianping
    Liu, Wenju
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 543 - 547
  • [32] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 704 - 707
  • [33] Multi-task Learning with Auxiliary Cross-attention Transformer for Low-Resource Multi-dialect Speech Recognition
    Dan, Zhengjia
    Zhao, Yue
    Bi, Xiaojun
    Wu, Licheng
    Ji, Qiang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 107 - 118
  • [34] Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems
    Ning, Yishuang
    Jia, Jia
    Wu, Zhiyong
    Li, Runnan
    An, Yongsheng
    Wang, Yanfeng
    Meng, Helen
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 161 - 167
  • [35] MULTI-TASK JOINT-LEARNING OF DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Qian, Yanmin
    Yin, Maofan
    You, Yongbin
    Yu, Kai
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 310 - 316
  • [36] SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES
    Zhang, Heran
    Mimura, Masato
    Kawahara, Tatsuya
    Ishizuka, Kenkichi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7707 - 7711
  • [37] Safe Screening for Multi-Task Feature Learning with Multiple Data Matrices
    Wang, Jie
    Ye, Jieping
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 1747 - 1756
  • [38] Multi-task Learning Based on Multiple Data Sources for Cancer Detection
    Hong, Siyi
    2021 3RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING, BIG DATA AND BUSINESS INTELLIGENCE (MLBDBI 2021), 2021, : 486 - 491
  • [39] Design of multi-feature class models for Speech Recognition Security Systems with under-resourced languages
    Barroso, N.
    de Ipina, K. Lopez
    Hernandez, C.
    Ezeiza, A.
    2011 IEEE INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2011,
  • [40] MULTI-LINGUAL SPEECH RECOGNITION WITH LOW-RANK MULTI-TASK DEEP NEURAL NETWORKS
    Mohan, Aanchan
    Rose, Richard
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4994 - 4998