Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5
Authors
Do, Van Hai [1, 4]
Chen, Nancy F. [2]
Lim, Boon Pang [2]
Hasegawa-Johnson, Mark [1, 3]
Affiliations
[1] Viettel Grp, Hanoi, Vietnam
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Univ Illinois, Urbana, IL USA
[4] Adv Digital Sci Ctr, Singapore, Singapore
Keywords
mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR;
DOI
10.21437/Interspeech.2017-788
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework in which the DNN acoustic model is trained simultaneously on both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
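The multi-task idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one shared hidden layer feeding two softmax output heads (one for matched labels, one for mismatched labels), hard frame-level labels for both tasks, and a hypothetical weighting parameter `weight` on the mismatched loss. The layer sizes and all function names are illustrative only.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the integer class labels.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def init_params(rng, n_in, n_hidden, n_matched, n_mismatched):
    # One shared layer plus one output layer per task (sizes are assumptions).
    return {
        "W_shared": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "W_matched": rng.normal(0.0, 0.1, (n_hidden, n_matched)),
        "W_mismatched": rng.normal(0.0, 0.1, (n_hidden, n_mismatched)),
    }

def multitask_loss(params, x_m, y_m, x_mm, y_mm, weight=0.5):
    # Both tasks pass through the same shared hidden layer.
    h_m = np.tanh(x_m @ params["W_shared"])
    h_mm = np.tanh(x_mm @ params["W_shared"])
    # Each task has its own softmax head and its own cross-entropy loss.
    loss_m = cross_entropy(softmax(h_m @ params["W_matched"]), y_m)
    loss_mm = cross_entropy(softmax(h_mm @ params["W_mismatched"]), y_mm)
    # The training objective is a weighted sum of the two task losses.
    return loss_m + weight * loss_mm, loss_m, loss_mm

# Usage with random stand-in data (not real acoustic features).
rng = np.random.default_rng(0)
params = init_params(rng, n_in=8, n_hidden=16, n_matched=5, n_mismatched=7)
x_m, y_m = rng.normal(size=(4, 8)), rng.integers(0, 5, size=4)
x_mm, y_mm = rng.normal(size=(10, 8)), rng.integers(0, 7, size=10)
total, loss_m, loss_mm = multitask_loss(params, x_m, y_m, x_mm, y_mm)
print(total, loss_m, loss_mm)
```

In a setup like this, gradients from both losses would update the shared layer, so the larger mismatched set can shape the hidden representation even though only the matched head is used for recognition. Whether the paper uses hard or probabilistic mismatched targets is not stated in the abstract; hard labels are used here only to keep the sketch short.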
Pages: 734-738
Number of pages: 5