IMPROVING MULTIPLE-CROWD-SOURCED TRANSCRIPTIONS USING A SPEECH RECOGNISER

Cited by: 0

Authors:

van Dalen, R. C. [1]

Knill, K. M. [1]

Tsiakoulis, P. [1]

Gales, M. J. F. [1]

Affiliations:

[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015

Keywords:

Automatic speech recognition; crowd-sourcing; transcription combination

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21% relative.
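The two quantities the abstract relies on can be illustrated with a minimal sketch. It is not the paper's method: `majority_vote` assumes the crowd-sourced transcriptions have already been aligned word for word, `word_error_rate` is the standard word-level edit distance divided by the reference length, and every function name and example value is hypothetical. It shows why a plain vote is undecided when only two transcriptions disagree, and how a relative word error rate improvement is computed.

```python
from collections import Counter


def majority_vote(aligned_words):
    """Pick the most frequent word in one aligned slot, or None on a tie.

    Assumption: the transcriptions are already aligned word for word.
    """
    counts = Counter(aligned_words).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no strict winner, e.g. two transcriptions that disagree
    return counts[0][0]


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_improvement(baseline_wer, improved_wer):
    """Fraction by which the word error rate drops, relative to the baseline."""
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    print(majority_vote(["cat", "cat", "cap"]))  # three votes: a winner exists
    print(majority_vote(["cat", "cap"]))         # two votes that disagree: tie
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical word error rates, chosen only to illustrate the arithmetic.
    print(relative_improvement(0.200, 0.158))
```

With three votes a strict winner can exist, whereas two disagreeing votes always tie, which is the limitation of majority voting that the abstract describes.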


Pages: 4709-4713

Page count: 5
