AV16.3: An audio-visual corpus for speaker localization and tracking

Cited by: 0
Authors
Lathoud, G [1 ]
Odobez, JM
Gatica-Perez, D
Affiliations
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Source
MACHINE LEARNING FOR MULTIMODAL INTERACTION | 2005 / Vol. 3361
DOI
Not available
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.
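The 3-D location annotation described in the abstract relies on calibrated cameras, i.e. on recovering a speaker's position in physical space from its image coordinates in several synchronized views. The paper does not spell out the algorithm here, but a standard way to do this is linear (DLT) triangulation from known camera projection matrices. The sketch below is an illustrative assumption, not the authors' published method; the `triangulate` function and the synthetic camera matrices are hypothetical.

```python
import numpy as np

def triangulate(projections, image_points):
    """Linear (DLT) triangulation of one 3-D point from N calibrated views.

    projections  : list of 3x4 camera projection matrices P_i
    image_points : list of (u, v) pixel coordinates of the same point
    Returns the 3-D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(projections, image_points):
        # Each view contributes two linear constraints on the homogeneous
        # point X:  u * (P[2] @ X) = P[0] @ X  and  v * (P[2] @ X) = P[1] @ X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Hypothetical example: two unit-focal cameras, the second shifted 1 m along x.
P1 = np.array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.]])
P2 = np.array([[1., 0., 0., -1.], [0., 1., 0., 0.], [0., 0., 1., 0.]])
X_true = np.array([0.5, 0.2, 4.0])
pts = []
for P in (P1, P2):
    x = P @ np.append(X_true, 1.0)
    pts.append((x[0] / x[2], x[1] / x[2]))
X_est = triangulate([P1, P2], pts)
```

With noise-free synthetic observations, `X_est` recovers `X_true` to machine precision; with real calibrated cameras the same least-squares system averages out pixel-level annotation noise across views.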
Pages: 182-195
Page count: 14