AV16.3: An audio-visual corpus for speaker localization and tracking

被引:0
|
作者
Lathoud, G [1 ]
Odobez, JM
Gatica-Perez, D
机构
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground-truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, video and audio-visual speaker localization and tracking. The desired location annotation can be either 2-dimensional (image plane) or 3-dimensional (physical space). This paper motivates and describes a corpus of audio-visual data called "AV16.3", along with a method for 3-D location annotation based on calibrated cameras. "16.3" stands for 16 microphones and 3 cameras, recorded in a fully synchronized manner, in a meeting room. Part of this corpus has already been successfully used to report research results.
引用
收藏
页码:182 / 195
页数:14
相关论文
共 50 条
  • [31] A JOINT AUDIO-VISUAL APPROACH TO AUDIO LOCALIZATION
    Jensen, Jesper Rindom
    Christensen, Mads Graesboll
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 454 - 458
  • [32] A Bayesian approach to audio-visual speaker identification
    Nefian, AV
    Liang, LH
    Fu, TY
    Liu, XX
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 761 - 769
  • [33] Speaker independent audio-visual speech recognition
    Zhang, Y
    Levinson, S
    Huang, T
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1073 - 1076
  • [34] Multifactor fusion for audio-visual speaker recognition
    Chetty, Girija
    Tran, Dat
    LECTURE NOTES IN SIGNAL SCIENCE, INTERNET AND EDUCATION (SSIP'07/MIV'07/DIWEB'07), 2007, : 70 - +
  • [35] ENVIRONMENTALLY ROBUST AUDIO-VISUAL SPEAKER IDENTIFICATION
    Schoenherr, Lea
    Orth, Dennis
    Heckmann, Martin
    Kolossa, Dorothea
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 312 - 318
  • [36] Audio-visual biometric based speaker identification
    Kar, Biswajit
    Bhatia, Sandeep
    Dutta, P. K.
    ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL IV, PROCEEDINGS, 2007, : 94 - 98
  • [37] Audio-visual system for robust speaker recognition
    Chen, Q
    Yang, JG
    Gou, J
    MLMTA '05: Proceedings of the International Conference on Machine Learning Models Technologies and Applications, 2005, : 97 - 103
  • [38] Audio-Visual Feature Fusion for Speaker Identification
    Almaadeed, Noor
    Aggoun, Amar
    Amira, Abbes
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663 : 56 - 67
  • [39] Audio-visual speaker identification based on the use of dynamic audio and visual features
    Fox, N
    Reilly, RB
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 743 - 751
  • [40] Audio-Visual Bimodal Combination-Based Speaker Tracking Method for Mobile Robot
    Zhang, Hao-Yan
    Zhang, Long-Bo
    Shi, Qi-Feng
    Liu, Zhen-Tao
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2024, 28 (01) : 196 - 205