Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited: 5
Authors
Van Hai Do [1 ,4 ]
Chen, Nancy E. [2 ]
Lim, Boon Pang [2 ]
Hasegawa-Johnson, Mark [1 ,3 ]
Affiliations
[1] Viettel Group, Hanoi, Vietnam
[2] A*STAR, Institute for Infocomm Research, Singapore
[3] University of Illinois, Urbana, IL, USA
[4] Advanced Digital Sciences Center, Singapore
Keywords
mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR
DOI
10.21437/Interspeech.2017-788
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages, whether because the language has few literate speakers or because it lacks a universally acknowledged orthography. One solution is to increase the amount of labeled data through mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to write down what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework in which a DNN acoustic model is trained simultaneously on a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that this multi-task framework improves over monolingual baselines and over previously proposed mismatched-transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted with mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
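The multi-task setup described in the abstract can be sketched as a network with a shared hidden layer feeding two softmax output heads, one per transcription type, with both tasks updating the shared representation. The paper's actual architecture, features, and hyperparameters are not given in this record, so everything below (layer sizes, learning rate, toy data, class counts) is a hypothetical illustration of the general technique, not the authors' system:

```python
import math
import random

random.seed(7)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class SharedMTLNet:
    """Shared ReLU hidden layer feeding two softmax heads: one for the
    scarce matched-transcription targets, one for the larger set of
    mismatched-transcription targets. Both tasks update the shared layer."""

    def __init__(self, n_in, n_hid, n_matched, n_mismatched, lr=0.1):
        g = lambda r, c: [[random.gauss(0.0, 0.1) for _ in range(c)]
                          for _ in range(r)]
        self.W = g(n_hid, n_in)                    # shared layer
        self.heads = {"matched": g(n_matched, n_hid),
                      "mismatched": g(n_mismatched, n_hid)}
        self.lr = lr

    def _hidden(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                for row in self.W]

    def loss(self, x, y, task):
        """Cross-entropy of the chosen head, with no parameter update."""
        h = self._hidden(x)
        H = self.heads[task]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in H])
        return -math.log(p[y] + 1e-12)

    def step(self, x, y, task):
        """One SGD step on the chosen task's cross-entropy."""
        h = self._hidden(x)
        H = self.heads[task]
        p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in H])
        err = [p[k] - (1.0 if k == y else 0.0) for k in range(len(H))]
        # Backprop into the shared layer (before the head is modified).
        for j, wrow in enumerate(self.W):
            if h[j] <= 0.0:
                continue                           # ReLU gate
            gh = sum(err[k] * H[k][j] for k in range(len(H)))
            for i in range(len(wrow)):
                wrow[i] -= self.lr * gh * x[i]
        # Update the task-specific head.
        for k, row in enumerate(H):
            for j in range(len(row)):
                row[j] -= self.lr * err[k] * h[j]

# Hypothetical toy "frames": 3-dim features, 2 matched phone classes,
# 3 mismatched (cross-language) transcript classes.
def frame(c):
    return [1.0 + random.gauss(0, 0.1) if i == c else random.gauss(0, 0.1)
            for i in range(3)]

net = SharedMTLNet(n_in=3, n_hid=8, n_matched=2, n_mismatched=3)
matched = [(frame(c), c) for c in (0, 1) for _ in range(5)]         # small set
mismatched = [(frame(c), c) for c in (0, 1, 2) for _ in range(20)]  # larger set

before = sum(net.loss(x, y, "matched") for x, y in matched) / len(matched)
for epoch in range(30):
    for x, y in mismatched:                        # plentiful task
        net.step(x, y, "mismatched")
    for x, y in matched:                           # scarce task
        net.step(x, y, "matched")
after = sum(net.loss(x, y, "matched") for x, y in matched) / len(matched)
print(f"matched-head cross-entropy: {before:.3f} -> {after:.3f}")
```

The key property illustrated is that gradient steps from the plentiful mismatched-transcription task shape the shared layer, which the scarce matched-transcription head then builds on, which is the usual motivation for multi-task training under data scarcity.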
Pages: 734-738
Number of pages: 5
Related Papers
50 records total (items [41]-[50] shown)
  • [41] Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems
    de Wet, Febe
    Kleynhans, Neil
    van Compernolle, Dirk
    Sahraeian, Reza
    SOUTH AFRICAN JOURNAL OF SCIENCE, 2017, 113 (1-2) : 25 - 33
  • [42] Language Modeling for Speech Analytics in Under-Resourced Languages
    Wills, Simone
    Uys, Pieter
    van Heerden, Charl
    Barnard, Etienne
    INTERSPEECH 2020, 2020, : 4941 - 4945
  • [43] Matrix Covariance Estimation Methods for robust Security Speech Recognition with under-resourced conditions
    Barroso, N.
    De Ipina, K. Lopez
    Hernandez, C.
    Ezeiza, A.
    2011 IEEE INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2011,
  • [44] Development of Under-Resourced Bahasa Indonesia Speech Corpus
    Cahyaningtyas, Elok
    Arifianto, Dhany
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1097 - 1101
  • [45] ASR for Under-Resourced Languages From Probabilistic Transcription
    Hasegawa-Johnson, Mark A.
    Jyothi, Preethi
    McCloy, Daniel
    Mirbagheri, Majid
    di Liberto, Giovanni M.
    Das, Amit
    Ekin, Bradley
    Liu, Chunxi
    Manohar, Vimal
    Tang, Hao
    Lalor, Edmund C.
    Chen, Nancy F.
    Hager, Paul
    Kekona, Tyler
    Sloan, Rose
    Lee, Adrian K. C.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (01) : 50 - 63
  • [46] Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages
    Do, Van Hai
    Chen, Nancy F.
    Lim, Boon Pang
    Hasegawa-Johnson, Mark
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3863 - 3867
  • [47] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    Journal of Signal Processing Systems, 2021, 93 : 299 - 308
  • [48] Attention-based LSTM with Multi-task Learning for Distant Speech Recognition
    Zhang, Yu
    Zhang, Pengyuan
    Yan, Yonghong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3857 - 3861
  • [49] Adversarial Multi-task Learning of Deep Neural Networks for Robust Speech Recognition
    Shinohara, Yusuke
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2369 - 2372
  • [50] Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition
    Shao, Qijie
    Guo, Pengcheng
    Yan, Jinghao
    Hu, Pengfei
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 459 - 470