Indoor multi-person tracking is a widely explored research area, yet publicly available datasets are either oversimplified or provide only visual data. To fill this gap, this paper presents RAV4D, a novel multimodal dataset comprising data from radar, microphone arrays, and stereo cameras, annotated with 3D positions, Euler angles, and Doppler velocities. By integrating these data types, RAV4D aims to exploit the complementary strengths of the three modalities to improve tracking performance. Building the dataset required addressing two main challenges: sensor calibration and 3D annotation. A novel calibration target is designed to jointly calibrate the radar, stereo camera, and microphone array. In addition, a visually guided annotation framework is proposed to address the difficulty of annotating radar data. This framework uses head positions, heading orientations, and depth information from the stereo cameras and radar to establish accurate ground-truth trajectories for multimodal tracking. The dataset is publicly available at https://zenodo.org/records/10208199.
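To make the annotation types concrete, the sketch below shows one way a per-frame, per-person label combining a 3D position, Euler angles, and a Doppler velocity might be represented in Python. This is purely illustrative; the field names, units, and structure are assumptions for exposition, not the dataset's actual schema, which is documented in the Zenodo record.

```python
from dataclasses import dataclass

@dataclass
class TrackAnnotation:
    """Hypothetical per-frame annotation for one tracked person."""
    frame: int                                 # frame index in the recording
    person_id: int                             # identity of the tracked person
    position: tuple[float, float, float]       # 3D position (x, y, z), e.g. in metres
    euler_angles: tuple[float, float, float]   # heading orientation (yaw, pitch, roll)
    doppler_velocity: float                    # radial velocity from radar, e.g. in m/s

# Example: one annotation on frame 0 for person 1
ann = TrackAnnotation(
    frame=0,
    person_id=1,
    position=(1.2, 0.4, 1.7),
    euler_angles=(30.0, 0.0, 0.0),
    doppler_velocity=-0.35,
)
print(ann)
```

A sequence of such records over frames would form one ground-truth trajectory of the kind the visually guided annotation framework produces.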