ASR for Under-Resourced Languages From Probabilistic Transcription

Cited by: 25
Authors
Hasegawa-Johnson, Mark A. [1 ]
Jyothi, Preethi [1 ]
McCloy, Daniel [2 ]
Mirbagheri, Majid [2 ]
di Liberto, Giovanni M. [3 ]
Das, Amit [1 ]
Ekin, Bradley [2 ]
Liu, Chunxi [4 ]
Manohar, Vimal [4 ]
Tang, Hao [5 ]
Lalor, Edmund C. [3 ]
Chen, Nancy F. [6 ]
Hager, Paul [7 ]
Kekona, Tyler [2 ]
Sloan, Rose [8 ]
Lee, Adrian K. C. [2 ]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Trinity Coll Dublin, Dublin 2, Ireland
[4] Johns Hopkins Univ, Baltimore, MD 21218 USA
[5] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[6] Inst Infocomm Res, Singapore 138632, Singapore
[7] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[8] Columbia Univ, New York, NY 10027 USA
Funding
US National Science Foundation
Keywords
Automatic speech recognition; EEG; mismatched crowdsourcing; under-resourced languages; SPEECH RECOGNITION; NEURAL-NETWORK; ERROR; MODEL;
DOI
10.1109/TASLP.2016.2621659
Chinese Library Classification: O42 (Acoustics)
Subject classification codes: 070206; 082403
Abstract
In many under-resourced languages it is possible to find text, and it is possible to find speech, but transcribed speech suitable for training automatic speech recognition (ASR) is unavailable. In the absence of native transcripts, this paper proposes the use of a probabilistic transcript: a probability mass function over possible phonetic transcripts of the waveform. Three sources of probabilistic transcripts are demonstrated. First, self-training is a well-established semi-supervised learning technique in which a cross-lingual ASR first labels unlabeled speech and is then adapted using the same labels. Second, mismatched crowdsourcing is a recent technique in which nonspeakers of the language are asked to write what they hear, and their nonsense transcripts are decoded using noisy-channel models of second-language speech perception. Third, EEG distribution coding is a new technique in which nonspeakers listen to speech in the language, and their electrocortical response signals are interpreted to indicate probabilities. ASR was trained in four languages without native transcripts. Adaptation using mismatched crowdsourcing significantly outperformed self-training, and both significantly outperformed a cross-lingual baseline. Both EEG distribution coding and text-derived phone language models were shown to improve the quality of probabilistic transcripts derived from mismatched crowdsourcing.
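To make the abstract's central objects concrete: the sketch below (not the paper's implementation; all inventories, probabilities, and function names are illustrative assumptions) represents a probabilistic transcript as a sequence of per-segment probability mass functions over phones, and shows the noisy-channel idea behind mismatched crowdsourcing — a nonspeaker's written annotation is decoded via Bayes' rule through a hypothetical misperception channel P(letter | phone).

```python
# Toy sketch of a probabilistic transcript and noisy-channel decoding.
# Phone inventory, prior, and channel values are made up for illustration.
phones = ["p", "b", "t", "d"]
prior = {"p": 0.25, "b": 0.25, "t": 0.25, "d": 0.25}

# Hypothetical misperception channel: probability that a nonspeaker
# writes a given letter when hearing the true phone (voicing confusions).
channel = {
    "p": {"p": 0.7, "b": 0.3},
    "b": {"p": 0.4, "b": 0.6},
    "t": {"t": 0.8, "d": 0.2},
    "d": {"t": 0.3, "d": 0.7},
}

def posterior(letter):
    """P(phone | letter) ∝ P(letter | phone) * P(phone)."""
    scores = {ph: channel[ph].get(letter, 0.0) * prior[ph] for ph in phones}
    z = sum(scores.values())
    return {ph: s / z for ph, s in scores.items()}

# A probabilistic transcript: one distribution per annotated segment.
annotation = ["p", "t"]                      # what the nonspeaker wrote
prob_transcript = [posterior(c) for c in annotation]

# A point estimate: the most probable phone in each segment.
best = [max(d, key=d.get) for d in prob_transcript]
print(best)  # → ['p', 't']
```

In the paper, the prior over phone sequences comes from sources such as a text-derived phone language model, and the distributions are carried through ASR training rather than collapsed to a single best path as in this toy point estimate.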
Pages: 50-63 (14 pages)