Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5
Authors
Do, Van Hai [1,4]
Chen, Nancy F. [2]
Lim, Boon Pang [2]
Hasegawa-Johnson, Mark [1,3]
Affiliations
[1] Viettel Grp, Hanoi, Vietnam
[2] A*STAR, Inst Infocomm Res, Singapore, Singapore
[3] Univ Illinois, Urbana, IL, USA
[4] Adv Digital Sci Ctr, Singapore, Singapore
Keywords
mismatched transcription; probabilistic transcription; multi-task learning; low-resourced languages; FEATURES; IMPROVE; ASR
DOI
10.21437/Interspeech.2017-788
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework in which the DNN acoustic model is trained simultaneously on a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that the multi-task learning framework yields improvements over monolingual baselines and over previously proposed mismatched-transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted with mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
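The multi-task setup described in the abstract — a shared DNN trunk feeding two label-specific softmax heads, one trained on native (matched) transcriptions and one on mismatched transcriptions — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, the tanh nonlinearity, and the interpolation weight `alpha` are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): acoustic feature dim,
# shared hidden dim, native senone inventory, mismatched label inventory.
FEAT, HIDDEN, NATIVE_OUT, MISMATCH_OUT = 40, 64, 10, 8

# Shared hidden layer (the multi-task "trunk").
W_shared = rng.normal(0, 0.1, (FEAT, HIDDEN))
# Task-specific softmax heads.
W_native = rng.normal(0, 0.1, (HIDDEN, NATIVE_OUT))
W_mismatch = rng.normal(0, 0.1, (HIDDEN, MISMATCH_OUT))

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.tanh(x @ W_shared)          # shared representation for both tasks
    return softmax(h @ W_native), softmax(h @ W_mismatch)

def multitask_loss(x, y_native, y_mismatch, alpha=0.5):
    """Weighted sum of the two frame-level cross-entropies; alpha trades off
    matched vs. mismatched supervision (a free hyperparameter in this sketch)."""
    p_nat, p_mis = forward(x)
    idx = np.arange(len(x))
    ce_nat = -np.log(p_nat[idx, y_native]).mean()
    ce_mis = -np.log(p_mis[idx, y_mismatch]).mean()
    return alpha * ce_nat + (1 - alpha) * ce_mis

x = rng.normal(size=(4, FEAT))                      # a mini-batch of 4 frames
loss = multitask_loss(x, np.array([1, 2, 3, 0]), np.array([0, 1, 2, 3]))
```

In training, gradients from both heads flow into `W_shared`, so the larger mismatched-transcription set regularizes the shared representation even when native labels are scarce.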
Pages: 734-738
Page count: 5
Related papers
(50 records in total)
  • [1] Multi-task learning in under-resourced Dravidian languages
    Adeep Hande
    Siddhanth U. Hegde
    Bharathi Raja Chakravarthi
    Journal of Data, Information and Management, 2022, 4 (2): 137 - 165
  • [2] Speech recognition of under-resourced languages using mismatched transcriptions
    Do, Van Hai
    Chen, Nancy F.
    Lim, Boon Pang
    Hasegawa-Johnson, Mark
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 112 - 115
  • [3] Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks
    He, Di
    Lim, Boon Pang
    Yang, Xuesong
    Hasegawa-Johnson, Mark
    Chen, Deming
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2618 - 2622
  • [4] Modeling under-resourced languages for speech recognition
    Kurimo, Mikko
    Enarvi, Seppo
    Tilk, Ottokar
    Varjokallio, Matti
    Mansikkaniemi, Andre
    Alumae, Tanel
    LANGUAGE RESOURCES AND EVALUATION, 2017, 51 (04) : 961 - 987
  • [5] Modeling under-resourced languages for speech recognition
    Mikko Kurimo
    Seppo Enarvi
    Ottokar Tilk
    Matti Varjokallio
    André Mansikkaniemi
    Tanel Alumäe
    Language Resources and Evaluation, 2017, 51 : 961 - 987
  • [6] Automatic Speech Recognition for an Under-Resourced Language - Amharic
    Abate, Solomon Teferra
    Menzel, Wolfgang
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 973 - 976
  • [7] Automatic speech recognition for under-resourced languages: A survey
    Besacier, Laurent
    Barnard, Etienne
    Karpov, Alexey
    Schultz, Tanja
    SPEECH COMMUNICATION, 2014, 56 : 85 - 100
  • [8] Automatic Speech Recognition for an Under-Resourced Language - Amharic
    Abate, Solomon Teferra
    Menzel, Wolfgang
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1737 - 1740
  • [9] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021, : 4508 - 4512
  • [10] Speech Emotion Recognition using Decomposed Speech via Multi-task Learning
    Hsu, Jia-Hao
    Wu, Chung-Hsien
    Wei, Yu-Hung
    INTERSPEECH 2023, 2023, : 4553 - 4557