IMPROVING MULTIPLE-CROWD-SOURCED TRANSCRIPTIONS USING A SPEECH RECOGNISER

被引：0

作者：

van Dalen, R. C. ^{[1
]}

Knill, K. M. ^{[1
]}

Tsiakoulis, P. ^{[1
]}

Gales, M. J. F. ^{[1
]}

机构：

[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England

来源：

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015年

关键词：

Automatic speech recognition; crowd-sourcing; transcription combination;

D O I：

暂无

中图分类号：

O42 [声学];

学科分类号：

070206 ; 082403 ;

摘要：

This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. These transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Often, multiple crowd-sourced transcriptions are combined to form one transcription of higher quality. However, the state of the art is to use essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. When only two crowd-sourced transcriptions are available, on a noisy data set this improves the word error rate to gold-standard transcriptions by 21% relative.

引用

页码：4709 / 4713

页数：5