MF-Net: A Multimodal Fusion Model for Fast Multi-Object Tracking

Cited by: 4

Authors:

Tian, Shirui ^{[1]}

Duan, Mingxing ^{[1,2]}

Deng, Jiayan ^{[3]}

Luo, Huizhang ^{[1]}

Hu, Yikun ^{[1]}

Affiliations:

[1] Hunan Univ, Sch Informat Sci & Engn, Changsha 410082, Peoples R China

[2] Hunan Univ, Shenzhen Inst, Shenzhen 518063, Peoples R China

[3] Hunan Modern Logist Coll, Sch Logist & Informat, Changsha 410000, Peoples R China

Source:

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY | 2024 / Vol. 73 / No. 08

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Point cloud compression; Mathematical models; Computational modeling; Hidden Markov models; Three-dimensional displays; Object detection; Task analysis; Multi-object tracking; multimodel fusion; object detection; trajectory matching; Gaussian function;

DOI:

10.1109/TVT.2024.3375457

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In multimodal multi-object tracking (MOT) based on point clouds and images, current research predominantly focuses on enhancing tracking accuracy and often neglects computational efficiency. Consequently, these models often struggle to track well in scenarios that demand high real-time performance. To address these challenges, this paper introduces a fast multi-object tracking model based on multimodal fusion (MF-Net). The model is divided into three primary modules: object detection, multimodal fusion, and trajectory matching. First, a 2D detector identifies objects in the image and computes their posterior estimates, and a 3D classification network extracts the foreground points of the objects from the point cloud. Next, a perspective projection module determines the transformation matrix and the minimum number of vertex pairs needed to map the coordinates of the foreground points onto a 2D plane. Building on this module, a Planar Gaussian Function (PGF) model is constructed that uses the foreground points to fit small and hard objects missed in the image, thus compensating for the limitations of 2D detectors and ensuring accuracy while reducing training time. Finally, trajectory matching is performed on the merged objects. The performance of MF-Net has been verified through extensive experiments on the publicly available KITTI and nuScenes datasets. Compared with existing competitive models, our algorithm demonstrates a substantial enhancement in both detection and tracking performance, achieving satisfactory accuracy while showcasing superior real-time efficiency.
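
The projection and fitting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x4 projection matrix `P`, the function names `project_points` and `fit_planar_gaussian`, and the use of a plain sample mean and covariance as the "planar Gaussian" are all assumptions made for illustration, since this record does not define the PGF model.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_points(points: Sequence[Point3D],
                   P: Sequence[Sequence[float]]) -> List[Point2D]:
    """Project 3D points to pixel coordinates with a 3x4 matrix P.

    P is a hypothetical camera projection matrix (an assumption here).
    Points with non-positive depth (behind the camera) are skipped.
    """
    pixels: List[Point2D] = []
    for x, y, z in points:
        h = (x, y, z, 1.0)  # homogeneous coordinates
        u, v, w = (sum(row[i] * h[i] for i in range(4)) for row in P)
        if w > 0:
            pixels.append((u / w, v / w))  # perspective division
    return pixels


def fit_planar_gaussian(pixels: Sequence[Point2D]):
    """Fit a 2D Gaussian to projected points by sample mean and covariance.

    This is a stand-in for the paper's PGF model, not a reproduction of it.
    Returns ((mu_u, mu_v), (s_uu, s_uv, s_vv)).
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("need at least one projected point")
    mu_u = sum(u for u, _ in pixels) / n
    mu_v = sum(v for _, v in pixels) / n
    s_uu = sum((u - mu_u) ** 2 for u, _ in pixels) / n
    s_vv = sum((v - mu_v) ** 2 for _, v in pixels) / n
    s_uv = sum((u - mu_u) * (v - mu_v) for u, v in pixels) / n
    return (mu_u, mu_v), (s_uu, s_uv, s_vv)
```

In this sketch, the mean and covariance returned by `fit_planar_gaussian` could be used to propose an image-plane region for an object that the 2D detector missed; how MF-Net actually merges such proposals with 2D detections is not stated in the abstract.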


Pages: 10948-10962

Page count: 15

References: 54 in total

