Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages

Times cited: 1
Authors
Madhavaraj, A. [1]
Ramakrishnan, A. G. [1]
Affiliation
[1] Indian Inst Sci, MILE Lab, Elect Engn, Bangalore 560012, Karnataka, India
Keywords
Multi-task learning; data-pooling; deep neural networks; phone mapping; alignments; senone posteriors; cross-lingual training; multilingual training; parameter sharing; speech recognition; Gujarati; Tamil; Telugu
DOI
10.1109/ncc.2019.8732237
Chinese Library Classification number
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
We present two approaches to improving the performance of automatic speech recognition (ASR) systems for Gujarati, Tamil and Telugu. In the first approach, data-pooling with phone mapping (DP-PM), a deep neural network (DNN) is trained to predict the senones of the target language; the feature vectors of the other (source) languages, together with their alignments, are then used to map each source-language phone to a target-language phone. The lexicons of the source languages are modified using this phone mapping, and an ASR system for the target language is trained on both the target data and the modified source data. DP-PM gives relative improvements in word error rate (WER) of 5.1% for Gujarati, 3.1% for Tamil and 3.4% for Telugu over the corresponding baselines. In the second approach, multi-task DNN (MT-DNN) modeling, feature vectors from all the languages are used to train a DNN with three output layers, each predicting the senones of one language. The objective functions of the output layers are modified so that, during training, a feature vector from a given language updates only the DNN layers responsible for predicting that language's senones. MT-DNN achieves relative WER improvements of 5.7%, 3.3% and 5.2% for Gujarati, Tamil and Telugu, respectively.
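The abstract describes both approaches only at a high level, so the following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. For DP-PM it assumes that each source phone is mapped to the target phone whose senones receive the most pooled posterior mass from the target-language DNN over all source frames aligned to that source phone. For MT-DNN it models the modified objective as a per-language softmax cross-entropy in which each frame is scored only by the output layer (head) of its own language, so the other heads receive no gradient from it. The function names (`map_source_phones`, `remap_lexicon`, `multitask_loss`), the `hidden` vectors standing in for the shared-layer output, and the toy data layout are all hypothetical.

```python
import math
from collections import defaultdict


# --- DP-PM (assumed mapping rule): pool target-DNN senone posteriors per source phone ---
def map_source_phones(frames, senone_to_phone):
    # frames: (source_phone, posteriors) pairs; source_phone comes from the
    # source-language alignment, posteriors from the target-language DNN.
    # senone_to_phone[i] is the target phone that target senone i belongs to.
    mass = defaultdict(lambda: defaultdict(float))
    for src_phone, posteriors in frames:
        for senone, p in enumerate(posteriors):
            # Accumulate each senone's posterior under its parent target phone.
            mass[src_phone][senone_to_phone[senone]] += p
    # Each source phone maps to the target phone with the largest pooled mass.
    return {src: max(scores, key=scores.get) for src, scores in mass.items()}


def remap_lexicon(lexicon, phone_map):
    # Rewrite every source-language pronunciation in target-language phones.
    return {word: [phone_map[p] for p in pron] for word, pron in lexicon.items()}


# --- MT-DNN (assumed objective): one softmax head per language ---
def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multitask_loss(batch, heads):
    # batch: non-empty list of (language, hidden, senone_target), where hidden is
    # the (hypothetical) output of the shared hidden layers for that frame.
    # heads: {language: weight matrix, one row per senone of that language}.
    total = 0.0
    for lang, hidden, target in batch:
        # Only the frame's own language head enters its loss term, so the other
        # heads get zero gradient from this frame; shared layers still do.
        weights = heads[lang]
        logits = [sum(w * x for w, x in zip(row, hidden)) for row in weights]
        total -= log_softmax(logits)[target]
    return total / len(batch)
```

In a real system the shared hidden layers would be part of the DNN and trained with an automatic-differentiation framework; here the shared-layer outputs are passed in directly to keep the example self-contained. Because each frame's loss depends only on its own language's head, changing another language's head leaves that frame's loss unchanged.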
Pages: 5