CMAN: Learning Global Structure Correlation for Monocular 3D Object Detection

Cited by: 8
Authors
Cao, Yuanzhouhan [1]
Zhang, Hui [1]
Li, Yidong [1]
Ren, Chao [2]
Lang, Congyan [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Informat Technol, Beijing 100044, Peoples R China
[2] Sichuan Univ, Coll Elect & Informat Engn, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Correlation; Object detection; Point cloud compression; Feature extraction; Laser radar; Estimation; 3D object detection; attention learning; structure learning; data fusion;
DOI
10.1109/TITS.2022.3205446
Chinese Library Classification (CLC) number
TU [Building Science];
Discipline classification code
0813;
Abstract
The key to 3D object detection is the proper utilization of depth data. Compared with LiDAR-based approaches, 3D object detection from a single image remains challenging because structure information is lacking. Recent methods use monocular depth estimation to produce 2D depth maps and adopt these depth maps as an additional input source for exploring structure information. However, these methods either encode only local structure correlations or encode long-range structure correlations by iteratively passing local messages. In this work, we propose a cross-modal attention network (CMAN) for monocular 3D object detection. It builds upon the self-attention module, which learns an attention map from single-modal data. Our CMAN encodes structure correlations from depth data and embeds them with appearance information learned from RGB data. Thanks to the attention learning mechanism, CMAN learns global structure correlations without iteration. To reduce the computational burden, CMAN adopts a novel node sampler that eliminates redundant nodes during the attention map calculation. Experimental results on the KITTI3D benchmark dataset show that the proposed CMAN outperforms state-of-the-art methods.
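The cross-modal attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture, which this record does not describe. It assumes that attention weights are computed from depth features only and then applied to RGB features, that the result is fused through a residual sum, and that the "node sampler" can be stood in for by uniform random sampling of key nodes. The function names and parameters (`cross_modal_attention`, `num_samples`) are hypothetical, and a real model would add learned projections.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_feat, depth_feat, num_samples, rng):
    """Hypothetical cross-modal attention over N spatial nodes.

    rgb_feat, depth_feat: arrays of shape (N, C).
    Returns the fused features (N, C) and the attention map (N, num_samples).
    """
    n, c = depth_feat.shape

    # ASSUMPTION: stand-in node sampler. Keep a uniform random subset of
    # nodes as keys so the attention map is (N, S) instead of (N, N).
    idx = rng.choice(n, size=num_samples, replace=False)

    queries = depth_feat      # structure cues from the depth branch
    keys = depth_feat[idx]    # sampled depth nodes
    values = rgb_feat[idx]    # appearance cues at the same sampled nodes

    # Every node attends to all sampled nodes in a single step, so
    # long-range correlations need no iterative message passing.
    attn = softmax(queries @ keys.T / np.sqrt(c), axis=-1)

    # ASSUMPTION: fuse the attended appearance features by residual sum.
    fused = rgb_feat + attn @ values
    return fused, attn


rng = np.random.default_rng(0)
rgb = rng.standard_normal((64, 16))    # 64 nodes, 16 channels (toy sizes)
depth = rng.standard_normal((64, 16))
fused, attn = cross_modal_attention(rgb, depth, num_samples=8, rng=rng)
```

With 64 nodes and 8 sampled keys, the attention map has 64 × 8 entries rather than 64 × 64, which is the kind of saving a node sampler is meant to provide.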
Pages: 24727-24737
Page count: 11