Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5

Authors:

Van Hai Do ^{[1,4]}

Chen, Nancy E. ^{[2]}

Lim, Boon Pang ^{[2]}

Hasegawa-Johnson, Mark ^{[1,3]}

Affiliations:

[1] Viettel Grp, Hanoi, Vietnam

[2] ASTAR, Inst Infocomm Res, Singapore, Singapore

[3] Univ Illinois, Urbana, IL USA

[4] Adv Digital Sci Ctr, Singapore, Singapore

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR;

DOI:

10.21437/Interspeech.2017-788

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
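
The multi-task idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single shared hidden layer, two softmax output heads (one for matched targets, one for mismatched targets), hard integer labels per frame, and a hypothetical `mismatched_weight` that balances the two cross-entropy terms. All class, function, and parameter names below are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Shared hidden layer feeding two task-specific softmax heads."""

    def __init__(self, n_in, n_hidden, n_matched, n_mismatched, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_hidden, n_in)
        self.matched_head = init_layer(rng, n_matched, n_hidden)
        self.mismatched_head = init_layer(rng, n_mismatched, n_hidden)

    def forward(self, x):
        # The hidden representation is shared by both tasks.
        hidden = [max(0.0, h) for h in linear(*self.shared, x)]
        p_matched = softmax(linear(*self.matched_head, hidden))
        p_mismatched = softmax(linear(*self.mismatched_head, hidden))
        return p_matched, p_mismatched


def multitask_loss(net, frame, matched_label=None, mismatched_label=None,
                   mismatched_weight=0.5):
    """Weighted sum of per-task cross-entropies for one frame.

    A label of None means the frame has no target for that task, so
    that task contributes nothing to the loss.
    """
    p_matched, p_mismatched = net.forward(frame)
    loss = 0.0
    if matched_label is not None:
        loss -= math.log(p_matched[matched_label])
    if mismatched_label is not None:
        loss -= mismatched_weight * math.log(p_mismatched[mismatched_label])
    return loss
```

In this sketch, a frame that only has a mismatched label still contributes a gradient signal to the shared layer through the second head, which is one plausible way the larger mismatched set could help the matched task. The sketch covers only the forward pass and the loss; training, probabilistic (soft) mismatched targets, and the GMM-based alignments mentioned in the abstract are outside its scope.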


Pages: 734-738

Page count: 5
