The PASCAL CHiME speech separation and recognition challenge

Cited by: 133

Authors:

Barker, Jon [1]

Vincent, Emmanuel [2]

Ma, Ning [1]

Christensen, Heidi [1]

Green, Phil [1]

Affiliations:

[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England

[2] Ctr Rennes Bretagne Atlantique, INRIA, F-35042 Rennes, France

Source:

COMPUTER SPEECH AND LANGUAGE | 2013 / Volume 27 / Issue 03

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Speech recognition; Source separation; Noise robustness;

DOI:

10.1016/j.csl.2012.10.004

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Distant microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment. The challenge was designed to model the essential difficulties of the multisource environment problem while remaining on a scale that would make it accessible to a wide audience. Compared to previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were provided in a continuous audio background rather than as pre-segmented utterances, thus allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, provides an overview of the systems that were entered, and compares them against both a baseline recognition system and human performance. The paper discusses insights gained from the challenge and lessons learnt for the design of future such evaluations. (c) 2012 Elsevier Ltd. All rights reserved.

Citation

Pages: 621-633

Number of pages: 13
