Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception

Cited by: 2

Authors:

Han, Chunrui [1]

Yang, Jinrong [2]

Sun, Jianjian [1]

Ge, Zheng [1]

Dong, Runpei [3]

Zhou, Hongyu [1]

Mao, Weixin [4]

Peng, Yuang [5]

Zhang, Xiangyu [1]

Affiliations:

[1] Megvii Technol, Beijing 100080, Peoples R China

[2] Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China

[3] Xi An Jiao Tong Univ, Beijing 100084, Peoples R China

[4] Waseda Univ, Fukuoka 8070832, Japan

[5] Tsinghua Univ, Jian 343200, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 07

Keywords:

Three-dimensional displays; History; Task analysis; Feature extraction; Fuses; Pipelines; Detectors; Multi-view 3D object detection; recurrent network and long-term temporal fusion;

DOI:

10.1109/LRA.2024.3401172

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Long-term temporal fusion is a crucial but often overlooked technique in camera-based Bird's-Eye-View (BEV) 3D perception. Existing methods are mostly in a parallel manner. While parallel fusion can benefit from long-term information, it suffers from increasing computational and memory overheads as the fusion window size grows. Alternatively, BEVFormer adopts a recurrent fusion pipeline so that history information can be efficiently integrated, yet it fails to benefit from longer temporal frames. In this letter, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i.e., rich long-term information and efficient fusion pipeline. A temporal embedding module is further proposed to improve the model's robustness against occasionally missed frames in practical scenarios. We name this simple but effective fusing pipeline VideoBEV. Experimental results on the nuScenes benchmark show that VideoBEV obtains strong performance on various camera-based 3D perception tasks, including object detection (55.4% mAP and 62.9% NDS), segmentation (48.6% vehicle mIoU), tracking (54.8% AMOTA), and motion prediction (0.80 m minADE and 0.463 EPA).


Pages: 6544-6551

Page count: 8
