Integrated audio-visual processing for object localization and tracking

Cited by: 1
Authors
Pingali, GS [1]
Affiliations
[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
Source
MULTIMEDIA COMPUTING AND NETWORKING 1998 | 1997 / Vol. 3310
Keywords
multimodal; people tracking; acoustic talker direction finding; video; audio; multimedia; real time;
DOI
10.1117/12.298421
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper presents a system that combines audio and visual cues for locating and tracking an object, typically a person, in real time. It is shown that combining a speech source localization algorithm with a video-based head tracking algorithm results in a more accurate and robust tracker than one that uses either the audio or the visual modality alone. Performance evaluation results are presented for a system that runs in real time on a general-purpose processor. The multimodal tracker has several applications, such as teleconferencing, multimedia kiosks, and interactive games.
Pages: 206-213
Page count: 8