IMPROVING MULTIPLE-CROWD-SOURCED TRANSCRIPTIONS USING A SPEECH RECOGNISER

Authors
van Dalen, R. C. [1]
Knill, K. M. [1]
Tsiakoulis, P. [1]
Gales, M. J. F. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
Automatic speech recognition; crowd-sourcing; transcription combination
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21% relative.
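The two quantities the abstract relies on, word error rate against a gold-standard transcription and a relative improvement, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the assumption that transcriptions are already aligned word by word, and all example sentences and numbers are hypothetical. It also shows why plain majority voting cannot resolve a disagreement when only two transcriptions are available.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative improvement of an error rate, e.g. 0.21 for 21% relative."""
    return (baseline - improved) / baseline


def majority_vote(aligned):
    """Vote per position over pre-aligned transcriptions (hypothetical helper).

    Returns None at positions where no word has a strict majority.
    """
    result = []
    for words in zip(*aligned):
        top, count = Counter(words).most_common(1)[0]
        result.append(top if count > len(words) / 2 else None)
    return result


# Hypothetical example data, not taken from the paper.
two = [["the", "cat"], ["the", "hat"]]
three = two + [["the", "cat"]]
print(majority_vote(two))    # the disagreement on the second word is unresolved
print(majority_vote(three))  # a third transcription breaks the tie
print(word_error_rate("the cat sat", "the hat sat"))
```

With two transcriptions a disagreement always leaves a tie, which is the limitation the abstract says the paper addresses. The relative figure is computed against the baseline error rate, so a hypothetical drop from 20.0% to 15.8% would count as 21% relative.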
Pages: 4709-4713
Number of pages: 5