Multi-Task Learning using Mismatched Transcription for Under-Resourced Speech Recognition

Cited by: 5

Authors:

Van Hai Do ^{[1,4]}

Chen, Nancy E. ^{[2]}

Lim, Boon Pang ^{[2]}

Hasegawa-Johnson, Mark ^{[1,3]}

Affiliations:

[1] Viettel Grp, Hanoi, Vietnam

[2] ASTAR, Inst Infocomm Res, Singapore, Singapore

[3] Univ Illinois, Urbana, IL USA

[4] Adv Digital Sci Ctr, Singapore, Singapore

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

mismatched transcription; probabilistic transcription; multi-task learning; low resourced languages; FEATURES; IMPROVE; ASR;

DOI:

10.21437/Interspeech.2017-788

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers) to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
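The abstract does not spell out the network layout, so the following is only a minimal sketch of one common way to train a single acoustic model on two label sets at once: a shared hidden layer feeding two softmax heads, one for matched targets and one for mismatched targets, with a weighted sum of cross-entropy losses. The class name `MultiTaskAcousticModel`, the function `multitask_loss`, the parameter `mismatched_weight`, the layer sizes, and the use of hard (one-best) mismatched labels are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskAcousticModel:
    """Shared hidden layer with two task-specific softmax heads (assumed layout)."""

    def __init__(self, n_features, n_hidden,
                 n_matched_targets, n_mismatched_targets, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_features, rng)
        self.matched_head = init_layer(n_matched_targets, n_hidden, rng)
        self.mismatched_head = init_layer(n_mismatched_targets, n_hidden, rng)

    def forward(self, frame):
        # Both heads read the same hidden representation.
        hidden = [math.tanh(h) for h in linear(*self.shared, frame)]
        p_matched = softmax(linear(*self.matched_head, hidden))
        p_mismatched = softmax(linear(*self.mismatched_head, hidden))
        return p_matched, p_mismatched


def multitask_loss(model, frame, matched_label, mismatched_label,
                   mismatched_weight=0.5):
    """Weighted sum of per-task cross-entropies for one frame.

    A label of None means the frame has no annotation for that task,
    so that task contributes nothing to the loss.
    """
    p_matched, p_mismatched = model.forward(frame)
    loss = 0.0
    if matched_label is not None:
        loss += -math.log(p_matched[matched_label])
    if mismatched_label is not None:
        loss += -mismatched_weight * math.log(p_mismatched[mismatched_label])
    return loss


# Usage: a frame labelled only by mismatched transcribers still yields
# a loss that would update the shared layer during training.
model = MultiTaskAcousticModel(n_features=4, n_hidden=8,
                               n_matched_targets=3, n_mismatched_targets=5)
frame = [0.1, -0.2, 0.3, 0.0]
print(multitask_loss(model, frame, matched_label=None, mismatched_label=2))
```

In a setup like this, frames that carry only a mismatched label still shape the shared layer, which is the intuition behind training on a small matched set and a larger mismatched set together. Gradient computation and the GMM-based alignment step mentioned in the abstract are omitted from the sketch.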

Pages: 734-738

Number of pages: 5
