ASR for Under-Resourced Languages From Probabilistic Transcription

Cited by: 25
Authors
Hasegawa-Johnson, Mark A. [1]
Jyothi, Preethi [1]
McCloy, Daniel [2]
Mirbagheri, Majid [2]
di Liberto, Giovanni M. [3]
Das, Amit [1]
Ekin, Bradley [2]
Liu, Chunxi [4]
Manohar, Vimal [4]
Tang, Hao [5]
Lalor, Edmund C. [3]
Chen, Nancy F. [6]
Hager, Paul [7]
Kekona, Tyler [2]
Sloan, Rose [8]
Lee, Adrian K. C. [2]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Trinity Coll Dublin, Dublin 2, Ireland
[4] Johns Hopkins Univ, Baltimore, MD 21218 USA
[5] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[6] Inst Infocomm Res, Singapore 138632, Singapore
[7] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[8] Columbia Univ, New York, NY 10027 USA
Funding
U.S. National Science Foundation;
Keywords
Automatic speech recognition; EEG; mismatched crowdsourcing; under-resourced languages; SPEECH RECOGNITION; NEURAL-NETWORK; ERROR; MODEL;
DOI
10.1109/TASLP.2016.2621659
CLC Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
In many under-resourced languages it is possible to find text, and it is possible to find speech, but transcribed speech suitable for training automatic speech recognition (ASR) is unavailable. In the absence of native transcripts, this paper proposes the use of a probabilistic transcript: a probability mass function over possible phonetic transcripts of the waveform. Three sources of probabilistic transcripts are demonstrated. First, self-training is a well-established semisupervised learning technique, in which a cross-lingual ASR first labels unlabeled speech and is then adapted using the same labels. Second, mismatched crowdsourcing is a recent technique in which nonspeakers of the language are asked to write what they hear, and their nonsense transcripts are decoded using noisy channel models of second-language speech perception. Third, EEG distribution coding is a new technique in which nonspeakers of the language listen to its speech, and their electrocortical response signals are interpreted to indicate probabilities. ASR was trained in four languages without native transcripts. Adaptation using mismatched crowdsourcing significantly outperformed self-training, and both significantly outperformed a cross-lingual baseline. Both EEG distribution coding and text-derived phone language models were shown to improve the quality of probabilistic transcripts derived from mismatched crowdsourcing.
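The abstract defines a probabilistic transcript as a probability mass function over possible phonetic transcripts of a waveform. As a minimal illustrative sketch (not the paper's actual data structure or code), such a PMF can be represented compactly in confusion-network form, with one phone distribution per time slot and an assumption of independence between slots; the phone inventory and probabilities below are invented for illustration.

```python
import math
from itertools import product

# Illustrative probabilistic transcript in confusion-network form:
# one dict of phone -> probability per time slot (assumed independent slots).
prob_transcript = [
    {"b": 0.6, "p": 0.4},           # slot 1: /b/ vs. /p/ confusion
    {"a": 1.0},                     # slot 2: unambiguous
    {"t": 0.5, "d": 0.3, "k": 0.2}, # slot 3: three competing phones
]

def best_path(pt):
    """Most probable phone sequence under the slot-independence assumption."""
    return [max(slot, key=slot.get) for slot in pt]

def sequence_prob(pt, phones):
    """Probability mass the transcript assigns to one phone sequence."""
    p = 1.0
    for slot, ph in zip(pt, phones):
        p *= slot.get(ph, 0.0)
    return p

def total_mass(pt):
    """Sanity check: the PMF over all phone sequences should sum to 1."""
    return sum(
        math.prod(slot[ph] for slot, ph in zip(pt, seq))
        for seq in product(*(slot.keys() for slot in pt))
    )

print(best_path(prob_transcript))             # ['b', 'a', 't']
print(sequence_prob(prob_transcript, "bat"))  # 0.6 * 1.0 * 0.5 = 0.3
print(round(total_mass(prob_transcript), 10))
```

In this compact form, training on a probabilistic transcript amounts to weighting each candidate phone sequence by its probability mass rather than committing to a single one-best label sequence.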
Pages: 50–63
Page count: 14
Related Papers
50 records in total
  • [1] ASR and translation for under-resourced languages
    Besacier, L.
    Le, V-B.
    Boitet, C.
    Berment, V.
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 6079 - 6082
  • [2] IMPROVING HMM/DNN IN ASR OF UNDER-RESOURCED LANGUAGES USING PROBABILISTIC SAMPLING
    Song, Meixu
    Zhang, Qingqing
    Pan, Jielin
    Yan, Yonghong
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015, : 20 - 24
  • [3] ASR for Documenting Acutely Under-Resourced Indigenous Languages
    Jimerson, Robbie
    Prud'hommeaux, Emily
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4161 - 4166
  • [4] ADAPTING ASR FOR UNDER-RESOURCED LANGUAGES USING MISMATCHED TRANSCRIPTIONS
    Liu, Chunxi
    Jyothi, Preethi
    Tang, Hao
    Manohar, Vimal
    Sloan, Rose
    Kekona, Tyler
    Hasegawa-Johnson, Mark
    Khudanpur, Sanjeev
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5840 - 5844
  • [5] YAST: A scalable ASR toolkit especially designed for under-resourced languages
    Ferreira, Emmanuel
    Nocera, Pascal
    Goudi, Maria
    Ngoc Diep Do Thi
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 141 - 144
  • [6] The limitations of data perturbation for ASR of learner data in under-resourced languages
    Badenhorst, Jaco
    de Wet, Febe
    2017 PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA AND ROBOTICS AND MECHATRONICS (PRASA-ROBMECH), 2017, : 44 - 49
  • [7] Eigentrigraphemes for under-resourced languages
    Ko, Tom
    Mak, Brian
    SPEECH COMMUNICATION, 2014, 56 : 132 - 141
  • [8] A smartphone-based ASR data collection tool for under-resourced languages
    de Vries, Nic J.
    Davel, Marelie H.
    Badenhorst, Jaco
    Basson, Willem D.
    de Wet, Febe
    Barnard, Etienne
    de Waal, Alta
    SPEECH COMMUNICATION, 2014, 56 : 119 - 131
  • [9] The LREMap for Under-Resourced Languages
    Del Gratta, Riccardo
    Frontini, Francesca
    Khan, Anas Fahad
    Mariani, Joseph
    Soria, Claudia
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [10] Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training
    Ngoc Thang Vu
    Kraus, Franziska
    Schultz, Tanja
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3152 - +