Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5

Authors:

Van Hai Do [1,4]

Chen, Nancy F. [2]

Lim, Boon Pang [2]

Hasegawa-Johnson, Mark [1,3]

Affiliations:

[1] Viettel Grp, Hanoi, Vietnam

[2] ASTAR, Inst Infocomm Res, Singapore, Singapore

[3] Univ Illinois, Urbana, IL USA

[4] Adv Digital Sci Ctr, Singapore, Singapore

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR;

DOI:

10.21437/Interspeech.2017-788

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
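The multi-task idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not give the network topology, label inventories, or loss weighting, so everything below is an assumption made for illustration. It assumes a single shared hidden layer feeding two softmax heads (one for matched labels, one for mismatched labels), soft target distributions for the mismatched head, and a hypothetical weight `alpha` on the mismatched-task loss. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, targets):
    """Mean cross-entropy; each target row is a (possibly soft) distribution."""
    return float(-(targets * np.log(probs + 1e-12)).sum(axis=1).mean())


def init_params(rng, n_in, n_hidden, n_matched, n_mismatched):
    """Shared hidden layer plus one output head per task (assumed layout)."""
    return {
        "W_shared": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_matched": rng.normal(0.0, 0.1, (n_hidden, n_matched)),
        "b_matched": np.zeros(n_matched),
        "W_mismatched": rng.normal(0.0, 0.1, (n_hidden, n_mismatched)),
        "b_mismatched": np.zeros(n_mismatched),
    }


def forward(params, x):
    """Return posteriors from both heads for a batch of feature frames."""
    h = np.tanh(x @ params["W_shared"] + params["b_shared"])
    p_matched = softmax(h @ params["W_matched"] + params["b_matched"])
    p_mismatched = softmax(h @ params["W_mismatched"] + params["b_mismatched"])
    return p_matched, p_mismatched


def multitask_loss(params, x_m, y_m, x_mm, y_mm, alpha=0.5):
    """Matched-task loss plus alpha times mismatched-task loss.

    x_m, y_m:   frames and targets from the small matched set.
    x_mm, y_mm: frames and targets from the larger mismatched set.
    alpha:      hypothetical task weight, not taken from the paper.
    """
    p_matched, _ = forward(params, x_m)
    _, p_mismatched = forward(params, x_mm)
    return cross_entropy(p_matched, y_m) + alpha * cross_entropy(p_mismatched, y_mm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng, n_in=40, n_hidden=64, n_matched=10, n_mismatched=6)
    # Small matched batch with one-hot targets; larger mismatched batch with soft targets.
    x_m = rng.normal(size=(8, 40))
    y_m = np.eye(10)[rng.integers(0, 10, size=8)]
    x_mm = rng.normal(size=(32, 40))
    y_mm = rng.dirichlet(np.ones(6), size=32)
    print("combined loss:", multitask_loss(params, x_m, y_m, x_mm, y_mm))
```

In this sketch both tasks update the shared layer through the summed loss, which is the general mechanism by which the larger mismatched set could help the matched task; only the matched head would be used at recognition time under these assumptions.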


Pages: 734-738

Page count: 5

50 items in total

[1] Multi-task learning in under-resourced Dravidian languages
Adeep Hande
Siddhanth U. Hegde
Bharathi Raja Chakravarthi
Journal of Data, Information and Management, 2022, 4 (2) : 137 - 165
[2] Speech recognition of under-resourced languages using mismatched transcriptions
Do, Van Hai
Chen, Nancy F.
Lim, Boon Pang
Hasegawa-Johnson, Mark
PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 112 - 115
[3] Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks
He, Di
Lim, Boon Pang
Yang, Xuesong
Hasegawa-Johnson, Mark
Chen, Deming
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2618 - 2622
[4] Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription
Van Hai Do
Chen, Nancy F.
Lim, Boon Pang
Hasegawa-Johnson, Mark A.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 501 - 514
[5] Automatic speech recognition for under-resourced languages: A survey
Besacier, Laurent
Barnard, Etienne
Karpov, Alexey
Schultz, Tanja
SPEECH COMMUNICATION, 2014, 56 : 85 - 100
[6] Speech Emotion Recognition with Multi-task Learning
Cai, Xingyu
Yuan, Jiahong
Zheng, Renjie
Huang, Liang
Church, Kenneth
INTERSPEECH 2021, 2021, : 4508 - 4512
[7] Speech Emotion Recognition using Decomposed Speech via Multi-task Learning
Hsu, Jia-Hao
Wu, Chung-Hsien
Wei, Yu-Hung
INTERSPEECH 2023, 2023, : 4553 - 4557
[8] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
Yue, Pengcheng
Qu, Leyuan
Zheng, Shukai
Li, Taihao
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1232 - 1237
[9] Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
Parry, Jack
DeMattos, Eric
Klementiev, Anita
Ind, Axel
Morse-Kopp, Daniela
Clarke, Georgia
Palaz, Dimitri
INTERSPEECH 2022, 2022, : 1158 - 1162
[10] Towards multi-task learning of speech and speaker recognition
Vaessen, Nik
van Leeuwen, David A.
INTERSPEECH 2023, 2023, : 4898 - 4902
