Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception

Cited by: 2
Authors
Han, Chunrui [1]
Yang, Jinrong [2]
Sun, Jianjian [1]
Ge, Zheng [1]
Dong, Runpei [3]
Zhou, Hongyu [1]
Mao, Weixin [4]
Peng, Yuang [5]
Zhang, Xiangyu [1]
Affiliations
[1] Megvii Technol, Beijing 100080, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China
[3] Xi An Jiao Tong Univ, Beijing 100084, Peoples R China
[4] Waseda Univ, Fukuoka 8070832, Japan
[5] Tsinghua Univ, Jian 343200, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 07
Keywords
Three-dimensional displays; History; Task analysis; Feature extraction; Fuses; Pipelines; Detectors; Multi-view 3D object detection; recurrent network and long-term temporal fusion;
DOI
10.1109/LRA.2024.3401172
Chinese Library Classification number
TP24 [Robotics];
Subject classification code
080202 ; 1405 ;
Abstract
Long-term temporal fusion is a crucial but often overlooked technique in camera-based Bird's-Eye-View (BEV) 3D perception. Most existing methods fuse frames in a parallel manner. While parallel fusion can benefit from long-term information, it suffers from increasing computational and memory overheads as the fusion window size grows. Alternatively, BEVFormer adopts a recurrent fusion pipeline so that history information can be efficiently integrated, yet it fails to benefit from longer temporal frames. In this letter, we explore an embarrassingly simple long-term recurrent fusion strategy built upon LSS-based methods and find that it already enjoys the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline. A temporal embedding module is further proposed to improve the model's robustness against occasionally missed frames in practical scenarios. We name this simple but effective fusion pipeline VideoBEV. Experimental results on the nuScenes benchmark show that VideoBEV obtains strong performance on various camera-based 3D perception tasks, including object detection (55.4% mAP and 62.9% NDS), segmentation (48.6% vehicle mIoU), tracking (54.8% AMOTA), and motion prediction (0.80 m minADE and 0.463 EPA).
Pages: 6544-6551
Number of pages: 8