Multi-modal Fusion Framework with Particle Filter for Speaker Tracking

Times cited: 0
Authors
Saeed, Anwar [1]
Al-Hamadi, Ayoub [1]
Heuer, Michael [1]
Affiliations
[1] Otto von Guericke Univ, Inst Elect Signal Proc & Commun IESK, POB 4210, D-39106 Magdeburg, Germany
Source
INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING | 2012, Vol. 5, No. 4
Keywords
Speaker tracking; Human skin detection; Face detection; Particle filter; Time difference of arrival;
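DOI
Not available
Chinese Library Classification
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract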
In the domain of Human-Computer Interaction (HCI), the main task of the computer is to interpret the external stimuli provided by users. In multi-person scenarios, it is also important to localize and track the speaker. To address this, we introduce a framework in which multi-modal sensory data can be combined efficiently and meaningfully for speaker tracking. The framework fuses four different observation types taken from multi-modal sensors. The advantages of this fusion are that weak sensory data from either modality can be reinforced and that the effect of noise can be reduced. We combine the modalities by employing a particle filter, which offers satisfactory real-time performance. We demonstrate speaker localization results in two- and three-person scenarios.
Pages: 65-76
Page count: 12