Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages

Cited by: 1
Authors
Madhavaraj, A. [1]
Ramakrishnan, A. G. [1]
Affiliations
[1] Indian Inst Sci, MILE Lab, Elect Engn, Bangalore 560012, Karnataka, India
Keywords
Multi-task learning; data-pooling; deep neural networks; phone mapping; alignments; senone posteriors; cross-lingual training; multilingual training; parameter sharing; speech recognition; Gujarati; Tamil; Telugu
DOI
10.1109/ncc.2019.8732237
Chinese Library Classification
TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0809
Abstract
We present two approaches to improve the performance of automatic speech recognition (ASR) systems for Gujarati, Tamil and Telugu. In the first approach, data-pooling with phone mapping (DP-PM), a deep neural network (DNN) is trained to predict the senones of the target language; the feature vectors and their alignments from the other (source) languages are then used to map the phones of each source language to those of the target language. The lexicons of the source languages are modified using this phone mapping, and an ASR system for the target language is trained on both the target data and the modified source data. The DP-PM approach gives relative improvements in word error rate (WER) of 5.1% for Gujarati, 3.1% for Tamil and 3.4% for Telugu over the corresponding baseline figures. In the second approach, multi-task DNN (MT-DNN) modeling, we use feature vectors from all the languages to train a DNN with three output layers, each predicting the senones of one language. The objective functions of the output layers are modified so that, during training, a feature vector updates only those DNN layers responsible for predicting the senones of the language it belongs to. The MT-DNN approach achieves relative improvements in WER of 5.7%, 3.3% and 5.2% for Gujarati, Tamil and Telugu, respectively.
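The language-dependent objective of the MT-DNN approach can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single shared ReLU layer, the layer sizes, the function name `mt_dnn_step`, and the choice to let every frame update the shared layer are all assumptions made for illustration. It shows only the idea that a frame contributes loss and gradient to the output layer of its own language and to no other output layer.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mt_dnn_step(x, senone, lang, W_shared, heads):
    """One forward/backward pass of a toy multi-task DNN (hypothetical helper).

    x        : (N, D) feature vectors pooled from all languages
    senone   : (N,)   target senone id, indexed within the frame's own language
    lang     : (N,)   language index of each frame (0, 1, 2, ...)
    W_shared : (D, H) shared hidden layer (assumed: one ReLU layer, no bias)
    heads    : list of (H, K_l) output layers, one per language

    Returns the mean cross-entropy loss, the gradient for W_shared,
    and a list of gradients for the output layers.
    """
    n = x.shape[0]
    h_pre = x @ W_shared
    h = np.maximum(h_pre, 0.0)                 # shared representation

    loss = 0.0
    g_heads = [np.zeros_like(W) for W in heads]
    g_h = np.zeros_like(h)

    for l, W in enumerate(heads):
        idx = np.flatnonzero(lang == l)        # frames belonging to language l
        if idx.size == 0:
            continue                           # this output layer gets no update
        rows = np.arange(idx.size)
        p = softmax(h[idx] @ W)
        loss += -np.log(p[rows, senone[idx]]).sum()

        d = p.copy()
        d[rows, senone[idx]] -= 1.0            # dLoss/dlogits for cross-entropy
        g_heads[l] = h[idx].T @ d / n          # only language l's frames contribute
        g_h[idx] = d @ W.T / n                 # back-propagate into the shared layer

    g_shared = x.T @ (g_h * (h_pre > 0))       # ReLU derivative
    return loss / n, g_shared, g_heads

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))
    W_shared = rng.normal(size=(4, 5))
    heads = [rng.normal(size=(5, k)) for k in (3, 2, 4)]  # toy senone counts
    lang = np.array([0, 0, 1, 1, 2, 2, 0, 1])
    senone = np.array([0, 2, 1, 0, 3, 1, 1, 1])
    loss, g_shared, g_heads = mt_dnn_step(x, senone, lang, W_shared, heads)
    print("mean loss:", loss)
    print("head gradient shapes:", [g.shape for g in g_heads])
```

If a mini-batch contains frames of only one language, the gradients returned for the other two output layers are exactly zero, which is the masking behaviour the abstract describes for the language-specific layers.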
Pages: 5