AV16.3: An audio-visual corpus for speaker localization and tracking

Cited by: 0
Authors
Lathoud, G [1]
Odobez, JM
Gatica-Perez, D
Affiliations
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.
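The 3-D annotation method mentioned in the abstract relies on calibrated cameras, which in general allows a physical-space location to be recovered from its image coordinates in two or more views. The following is a minimal sketch of that general technique (direct linear transform triangulation), not the authors' annotation software; the projection matrices, pixel coordinates, and camera geometry below are hypothetical placeholders, assuming calibration data in the usual 3x4 projection-matrix form.

```python
# Minimal sketch: triangulate a 3-D point from two calibrated cameras via the
# direct linear transform (DLT). Hypothetical example, not the AV16.3 tooling.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Return the 3-D point seen at pixel x1 in camera 1 and x2 in camera 2.

    P1, P2: 3x4 camera projection matrices (intrinsics @ [R | t]).
    x1, x2: (u, v) pixel coordinates of the same physical point.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous least-squares solution: right singular vector of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize to physical coordinates

# Hypothetical geometry: two cameras one metre apart, looking down the z-axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

true_point = np.array([0.3, 0.1, 2.5])
x1 = P1 @ np.append(true_point, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(true_point, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # ~ [0.3, 0.1, 2.5]
```

With more than two synchronized, calibrated views (AV16.3 uses three cameras), the same construction simply stacks two rows per view before the SVD, which makes the recovered location more robust to annotation noise in any single image plane.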
Pages: 182-195
Number of pages: 14
Related papers
50 items in total
  • [41] Developing an Audio-visual Corpus of Scottish Gaelic
    Clayton, Ian
    Patton, Colleen
    Carnie, Andrew
    Hammond, Michael
    Fisher, Muriel
    LANGUAGE DOCUMENTATION & CONSERVATION, 2018, 12 : 481 - 513
  • [42] A self-calibrating algorithm for speaker tracking based on audio-visual statistical models
    Beal, MJ
    Jojic, N
    Attias, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 1997 - 2000
  • [43] Speaker Localization among multi-faces in noisy environment by audio-visual Integration
    Kim, Hyun-Don
    Choi, Jong-Suk
    Kim, Munsang
    2006 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-10, 2006, : 1305 - 1310
  • [44] The Turkish Audio-Visual Bipolar Disorder Corpus
    Ciftci, Elvan
    Kaya, Heysem
    Gulec, Huseyin
    Salah, Albert Ali
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [45] AusTalk: an audio-visual corpus of Australian English
    Estival, Dominique
    Cassidy, Steve
    Cox, Felicity
    Burnham, Denis
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 3105 - 3109
  • [46] A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
    Tariquzzaman, Md.
    Kim, Jin Young
    Na, Seung You
    Kim, Hyoung-Gook
    Har, Dongsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10): : 2052 - 2055
  • [47] Audio-Visual Tracking of Concurrent Speakers
    Qian, Xinyuan
    Brutti, Alessio
    Lanz, Oswald
    Omologo, Maurizio
    Cavallaro, Andrea
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 942 - 954
  • [48] Egocentric Audio-Visual Object Localization
    Huang, Chao
    Tian, Yapeng
    Kumar, Anurag
    Xu, Chenliang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22910 - 22921
  • [49] Audio-visual tracking for natural interactivity
    Pingali, G
    Tunali, G
    Carlbom, I
    ACM MULTIMEDIA 99, PROCEEDINGS, 1999, : 373 - 382
  • [50] AVA ACTIVE SPEAKER: AN AUDIO-VISUAL DATASET FOR ACTIVE SPEAKER DETECTION
    Roth, Joseph
    Chaudhuri, Sourish
    Klejch, Ondrej
    Marvin, Radhika
    Gallagher, Andrew
    Kaver, Liat
    Ramaswamy, Sharadh
    Stopczynski, Arkadiusz
    Schmid, Cordelia
    Xi, Zhonghua
    Pantofaru, Caroline
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4492 - 4496