An audio-visual corpus for speech perception and automatic speech recognition (L)

Cited by: 754

Authors:

Cooke, Martin

Barker, Jon

Cunningham, Stuart

Shao, Xu

Affiliations:

[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England

[2] Univ Sheffield, Dept Human Commun Sci, Sheffield S1 4DP, S Yorkshire, England

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2006 / Vol. 120 / No. 5

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

DOI:

10.1121/1.2229005

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as "place green at B 4 now." Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and low levels of stationary noise. The annotated corpus is available on the web for research use. (c) 2006 Acoustical Society of America.


Pages: 2421-2424

Page count: 4
