Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5
Authors
Do, Van Hai [1, 4]
Chen, Nancy E. [2]
Lim, Boon Pang [2]
Hasegawa-Johnson, Mark [1, 3]
Affiliations
[1] Viettel Grp, Hanoi, Vietnam
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Univ Illinois, Urbana, IL USA
[4] Adv Digital Sci Ctr, Singapore, Singapore
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR;
DOI
10.21437/Interspeech.2017-788
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or the lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
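The multi-task idea in the abstract can be illustrated with a minimal sketch: one shared hidden layer feeds two softmax heads, one trained on a small matched batch and one on a larger mismatched batch, and both losses update the shared weights. The abstract does not give the network depth, target units, or loss weighting, so everything here is an assumption for illustration: the class name `MultiTaskNet`, the single tanh layer, the weight `alpha`, and the plain gradient-descent update are hypothetical and are not the authors' configuration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability of the correct class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

class MultiTaskNet:
    """Hypothetical sketch: shared hidden layer with two task-specific heads."""

    def __init__(self, n_in, n_hidden, n_matched, n_mismatched, seed=0):
        rng = np.random.default_rng(seed)
        self.W_shared = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W_matched = rng.normal(0, 0.1, (n_hidden, n_matched))
        self.W_mismatched = rng.normal(0, 0.1, (n_hidden, n_mismatched))

    def forward(self, x, W_head):
        h = np.tanh(x @ self.W_shared)      # shared representation
        return h, softmax(h @ W_head)       # task-specific posteriors

    def loss(self, x_m, y_m, x_mm, y_mm, alpha=0.5):
        # Matched loss plus alpha-weighted mismatched loss (alpha is assumed).
        _, p_m = self.forward(x_m, self.W_matched)
        _, p_mm = self.forward(x_mm, self.W_mismatched)
        return cross_entropy(p_m, y_m) + alpha * cross_entropy(p_mm, y_mm)

    def train_step(self, x_m, y_m, x_mm, y_mm, alpha=0.5, lr=0.1):
        h_m, p_m = self.forward(x_m, self.W_matched)
        h_mm, p_mm = self.forward(x_mm, self.W_mismatched)

        # Gradient of each cross-entropy with respect to its head's logits.
        d_m = p_m.copy()
        d_m[np.arange(len(y_m)), y_m] -= 1.0
        d_m /= len(y_m)
        d_mm = p_mm.copy()
        d_mm[np.arange(len(y_mm)), y_mm] -= 1.0
        d_mm *= alpha / len(y_mm)

        # Each head sees only its own task; the shared layer sums both gradients.
        g_matched = h_m.T @ d_m
        g_mismatched = h_mm.T @ d_mm
        g_shared = (x_m.T @ ((d_m @ self.W_matched.T) * (1 - h_m ** 2))
                    + x_mm.T @ ((d_mm @ self.W_mismatched.T) * (1 - h_mm ** 2)))

        self.W_matched -= lr * g_matched
        self.W_mismatched -= lr * g_mismatched
        self.W_shared -= lr * g_shared
```

In this sketch the mismatched head acts only as an auxiliary task during training; at recognition time one would read posteriors from the matched head alone, for example `net.forward(features, net.W_matched)`.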
Pages: 734-738
Page count: 5