Integrated audio-visual processing for object localization and tracking
Cited by: 1
Authors:
Pingali, GS [1]
Affiliations:
[1] AT&T Bell Labs, Lucent Technol, Murray Hill, NJ 07974 USA
Source:
MULTIMEDIA COMPUTING AND NETWORKING 1998 | 1997 / Vol. 3310
Keywords:
multimodal;
people tracking;
acoustic talker direction finding;
video;
audio;
multimedia;
real time;
DOI:
10.1117/12.298421
Chinese Library Classification:
TP3 [Computing technology, computer technology];
Subject classification code:
0812;
Abstract:
This paper presents a system that combines audio and visual cues to locate and track an object, typically a person, in real time. It shows that combining a speech source localization algorithm with a video-based head tracking algorithm yields a more accurate and robust tracker than either the audio or the visual modality alone. Performance evaluation results are presented for a system that runs in real time on a general-purpose processor. The multimodal tracker has several applications, such as teleconferencing, multimedia kiosks, and interactive games.