ASR for Under-Resourced Languages From Probabilistic Transcription

Cited by: 25
Authors
Hasegawa-Johnson, Mark A. [1]
Jyothi, Preethi [1]
McCloy, Daniel [2]
Mirbagheri, Majid [2]
di Liberto, Giovanni M. [3]
Das, Amit [1]
Ekin, Bradley [2]
Liu, Chunxi [4]
Manohar, Vimal [4]
Tang, Hao [5]
Lalor, Edmund C. [3]
Chen, Nancy F. [6]
Hager, Paul [7]
Kekona, Tyler [2]
Sloan, Rose [8]
Lee, Adrian K. C. [2]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Trinity Coll Dublin, Dublin 2, Ireland
[4] Johns Hopkins Univ, Baltimore, MD 21218 USA
[5] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[6] Inst Infocomm Res, Singapore 138632, Singapore
[7] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[8] Columbia Univ, New York, NY 10027 USA
Funding
U.S. National Science Foundation
Keywords
Automatic speech recognition; EEG; mismatched crowdsourcing; under-resourced languages; SPEECH RECOGNITION; NEURAL-NETWORK; ERROR; MODEL
DOI
10.1109/TASLP.2016.2621659
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In many under-resourced languages it is possible to find text, and it is possible to find speech, but transcribed speech suitable for training automatic speech recognition (ASR) is unavailable. In the absence of native transcripts, this paper proposes the use of a probabilistic transcript: a probability mass function over possible phonetic transcripts of the waveform. Three sources of probabilistic transcripts are demonstrated. First, self-training is a well-established semi-supervised learning technique, in which a cross-lingual ASR first labels unlabeled speech and is then adapted using those labels. Second, mismatched crowdsourcing is a recent technique in which nonspeakers of the language are asked to write what they hear, and their nonsense transcripts are decoded using noisy-channel models of second-language speech perception. Third, EEG distribution coding is a new technique in which nonspeakers listen to the language, and their electrocortical response signals are interpreted to indicate probabilities. ASR was trained in four languages without native transcripts. Adaptation using mismatched crowdsourcing significantly outperformed self-training, and both significantly outperformed a cross-lingual baseline. Both EEG distribution coding and text-derived phone language models were shown to improve the quality of probabilistic transcripts derived from mismatched crowdsourcing.
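The abstract's central object, the probabilistic transcript, can be made concrete with a small worked example. The Python sketch below is illustrative only and is not the authors' system: the misperception channel, the phone prior, and the function probabilistic_transcript are all invented for this note. It treats one nonspeaker annotation as the output of a noisy channel applied to an unknown phone sequence and computes the posterior probability mass function over candidate phone sequences, which is what the abstract calls a probabilistic transcript. A real decoder would typically use weighted finite-state transducers and permit insertions and deletions rather than the brute-force, equal-length enumeration used here.

from itertools import product

# Hypothetical misperception channel P(letter written | true phone),
# standing in for a noisy-channel model of second-language perception.
CHANNEL = {
    "p":  {"p": 0.7, "b": 0.3},
    "b":  {"b": 0.6, "p": 0.4},
    "aa": {"a": 0.8, "o": 0.2},
}

# Hypothetical phone prior, standing in for a text-derived phone
# language model in the target language.
PRIOR = {"p": 0.4, "b": 0.3, "aa": 0.3}

def probabilistic_transcript(annotation):
    """Posterior pmf over phone sequences given one nonspeaker annotation.

    P(phones | annotation) is proportional to P(annotation | phones) * P(phones).
    """
    posterior = {}
    # Brute-force enumeration of equal-length candidates; a real system
    # would compose WFSTs and allow insertions/deletions.
    for phones in product(PRIOR, repeat=len(annotation)):
        score = 1.0
        for phone, letter in zip(phones, annotation):
            score *= PRIOR[phone] * CHANNEL[phone].get(letter, 0.0)
        if score > 0.0:
            posterior[phones] = score
    total = sum(posterior.values())
    return {seq: score / total for seq, score in posterior.items()}

if __name__ == "__main__":
    # A nonspeaker wrote "b a"; the resulting probabilistic transcript
    # assigns probability to every phone sequence consistent with it.
    for phones, prob in sorted(probabilistic_transcript(["b", "a"]).items(),
                               key=lambda item: -item[1]):
        print(" ".join(phones), round(prob, 3))

On this toy input the annotation "b a" yields a pmf concentrated on the sequences "b aa" (0.6) and "p aa" (0.4). Training on such distributions, rather than on a single best guess, is what distinguishes adaptation from probabilistic transcripts from ordinary self-training.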
Pages: 50-63
Page count: 14