AV16.3: An audio-visual corpus for speaker localization and tracking

Cited by: 0
Authors
Lathoud, G [1]
Odobez, JM
Gatica-Perez, D
Affiliations
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.
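The 3-D annotation method mentioned in the abstract relies on calibrated cameras, which in general allows a physical-space location to be recovered from its image coordinates in two or more views. The following is a minimal sketch of that general technique (direct linear transform triangulation), not the authors' annotation software; the projection matrices, pixel coordinates, and camera geometry below are hypothetical placeholders, assuming calibration data in the usual 3x4 projection-matrix form.

```python
# Minimal sketch: triangulate a 3-D point from two calibrated cameras via the
# direct linear transform (DLT). Hypothetical example, not the AV16.3 tooling.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Return the 3-D point seen at pixel x1 in camera 1 and x2 in camera 2.

    P1, P2: 3x4 camera projection matrices (intrinsics @ [R | t]).
    x1, x2: (u, v) pixel coordinates of the same physical point.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous least-squares solution: right singular vector of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize to physical coordinates

# Hypothetical geometry: two cameras one metre apart, looking down the z-axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

true_point = np.array([0.3, 0.1, 2.5])
x1 = P1 @ np.append(true_point, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(true_point, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # ~ [0.3, 0.1, 2.5]
```

With more than two synchronized, calibrated views (AV16.3 uses three cameras), the same construction simply stacks two rows per view before the SVD, which makes the recovered location more robust to annotation noise in any single image plane.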
Pages: 182-195
Number of pages: 14
Related papers
50 items in total
  • [41] Developing an Audio-visual Corpus of Scottish Gaelic
    Clayton, Ian
    Patton, Colleen
    Carnie, Andrew
    Hammond, Michael
    Fisher, Muriel
    LANGUAGE DOCUMENTATION & CONSERVATION, 2018, 12 : 481 - 513
  • [42] A self-calibrating algorithm for speaker tracking based on audio-visual statistical models
    Beal, MJ
    Jojic, N
    Attias, H
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 1997 - 2000
  • [43] Speaker Localization among multi-faces in noisy environment by audio-visual Integration
    Kim, Hyun-Don
    Choi, Jong-Suk
    Kim, Munsang
    2006 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-10, 2006, : 1305 - 1310
  • [44] The Turkish Audio-Visual Bipolar Disorder Corpus
    Ciftci, Elvan
    Kaya, Heysem
    Gulec, Huseyin
    Salah, Albert Ali
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [45] AusTalk: an audio-visual corpus of Australian English
    Estival, Dominique
    Cassidy, Steve
    Cox, Felicity
    Burnham, Denis
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 3105 - 3109
  • [46] A Visual Signal Reliability for Robust Audio-Visual Speaker Identification
    Tariquzzaman, Md.
    Kim, Jin Young
    Na, Seung You
    Kim, Hyoung-Gook
    Har, Dongsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (10): : 2052 - 2055
  • [47] Audio-Visual Tracking of Concurrent Speakers
    Qian, Xinyuan
    Brutti, Alessio
    Lanz, Oswald
    Omologo, Maurizio
    Cavallaro, Andrea
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 942 - 954
  • [48] Egocentric Audio-Visual Object Localization
    Huang, Chao
    Tian, Yapeng
    Kumar, Anurag
    Xu, Chenliang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 22910 - 22921
  • [49] Audio-visual tracking for natural interactivity
    Pingali, G
    Tunali, G
    Carlbom, I
    ACM MULTIMEDIA 99, PROCEEDINGS, 1999, : 373 - 382
  • [50] AVA ACTIVE SPEAKER: AN AUDIO-VISUAL DATASET FOR ACTIVE SPEAKER DETECTION
    Roth, Joseph
    Chaudhuri, Sourish
    Klejch, Ondrej
    Marvin, Radhika
    Gallagher, Andrew
    Kaver, Liat
    Ramaswamy, Sharadh
    Stopczynski, Arkadiusz
    Schmid, Cordelia
    Xi, Zhonghua
    Pantofaru, Caroline
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4492 - 4496