ASR for Under-Resourced Languages From Probabilistic Transcription

Cited by: 25
Authors
Hasegawa-Johnson, Mark A. [1 ]
Jyothi, Preethi [1 ]
McCloy, Daniel [2 ]
Mirbagheri, Majid [2 ]
di Liberto, Giovanni M. [3 ]
Das, Amit [1 ]
Ekin, Bradley [2 ]
Liu, Chunxi [4 ]
Manohar, Vimal [4 ]
Tang, Hao [5 ]
Lalor, Edmund C. [3 ]
Chen, Nancy F. [6 ]
Hager, Paul [7 ]
Kekona, Tyler [2 ]
Sloan, Rose [8 ]
Lee, Adrian K. C. [2 ]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Trinity Coll Dublin, Dublin 2, Ireland
[4] Johns Hopkins Univ, Baltimore, MD 21218 USA
[5] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[6] Inst Infocomm Res, Singapore 138632, Singapore
[7] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[8] Columbia Univ, New York, NY 10027 USA
Funding
U.S. National Science Foundation;
Keywords
Automatic speech recognition; EEG; mismatched crowdsourcing; under-resourced languages; SPEECH RECOGNITION; NEURAL-NETWORK; ERROR; MODEL;
DOI
10.1109/TASLP.2016.2621659
Chinese Library Classification
O42 [Acoustics];
Subject Classification Code
070206; 082403;
Abstract
In many under-resourced languages it is possible to find text, and it is possible to find speech, but transcribed speech suitable for training automatic speech recognition (ASR) is unavailable. In the absence of native transcripts, this paper proposes the use of a probabilistic transcript: a probability mass function over possible phonetic transcripts of the waveform. Three sources of probabilistic transcripts are demonstrated. First, self-training is a well-established semisupervised learning technique, in which a cross-lingual ASR first labels unlabeled speech and is then adapted using those labels. Second, mismatched crowdsourcing is a recent technique in which nonspeakers of the language are asked to write what they hear, and their nonsense transcripts are decoded using noisy channel models of second-language speech perception. Third, EEG distribution coding is a new technique in which nonspeakers of the language listen to the speech, and their electrocortical response signals are interpreted to indicate phone probabilities. ASR was trained in four languages without native transcripts. Adaptation using mismatched crowdsourcing significantly outperformed self-training, and both significantly outperformed a cross-lingual baseline. Both EEG distribution coding and text-derived phone language models were shown to improve the quality of probabilistic transcripts derived from mismatched crowdsourcing.
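Illustrative sketch (not the authors' implementation): the abstract describes a probabilistic transcript as a probability mass function over candidate phone sequences, which mismatched crowdsourcing refines by decoding nonspeakers' written transcripts through a noisy channel model. The minimal Python sketch below shows the Bayesian update that idea implies; the phone inventory, channel probabilities, and function names are hypothetical placeholders, and a real system would operate over transcript lattices rather than a small enumerated list.

```python
# Minimal illustrative sketch of a probabilistic transcript refined by a
# toy noisy-channel model of mismatched crowdsourcing. All phones,
# probabilities, and names here are hypothetical placeholders, not the
# paper's actual models or data.

# Candidate phone transcripts of one utterance with prior probabilities
# (e.g., hypotheses from a cross-lingual ASR); values sum to 1.
prior = {
    ("b", "a", "t"): 0.5,
    ("p", "a", "t"): 0.3,
    ("b", "a", "d"): 0.2,
}

# Toy channel model: P(letter written by a nonspeaker | true phone).
channel = {
    "b": {"b": 0.6, "p": 0.4},
    "p": {"p": 0.7, "b": 0.3},
    "a": {"a": 0.9, "u": 0.1},
    "t": {"t": 0.8, "d": 0.2},
    "d": {"d": 0.6, "t": 0.4},
}

def likelihood(phones, letters):
    """P(crowd transcript | phone sequence), assuming for simplicity a
    one-to-one alignment between phones and written letters."""
    if len(phones) != len(letters):
        return 0.0
    p = 1.0
    for ph, le in zip(phones, letters):
        p *= channel.get(ph, {}).get(le, 0.0)
    return p

def posterior(prior, letters):
    """Bayes rule: combine the prior pmf over phone sequences with the
    channel likelihood to get an updated probabilistic transcript."""
    unnorm = {seq: pr * likelihood(seq, letters) for seq, pr in prior.items()}
    z = sum(unnorm.values())
    return {seq: p / z for seq, p in unnorm.items()} if z > 0 else prior

# A nonspeaker of the language heard the utterance and wrote "bat".
crowd_letters = ("b", "a", "t")
post = posterior(prior, crowd_letters)
for seq, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(" ".join(seq), round(p, 3))
```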
Pages: 50-63
Page count: 14