From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection

Cited by: 68

Authors:

Deng, Jiajun [1]

Zhou, Wengang [1]

Zhang, Yanyong [2]

Li, Houqiang [1]

Affiliations:

[1] Univ Sci & Technol China USTC, Dept Elect Engn & Informat Sci, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230026, Peoples R China

[2] Univ Sci & Technol China USTC, Dept Comp Sci, Hefei 230026, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2021 / Vol. 31 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Three-dimensional displays; Feature extraction; Proposals; Object detection; Laser radar; Detectors; Semantics; Point cloud; 3D object detection; LiDAR;

DOI:

10.1109/TCSVT.2021.3100848

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

As an emerging data modality with precise distance sensing, LiDAR point clouds carry great expectations for 3D scene understanding. However, point clouds are sparsely distributed in 3D space and stored without structure, which makes it difficult to represent them for effective 3D object detection. To this end, we regard point clouds as hollow-3D data and propose a new architecture, Hallucinated Hollow-3D R-CNN (H²3D R-CNN), for 3D object detection. In our approach, we first extract multi-view features by sequentially projecting the point clouds into the perspective view and the bird-eye view. Then, we hallucinate the 3D representation with a novel bilaterally guided multi-view fusion block. Finally, 3D objects are detected by a box refinement module with a novel Hierarchical Voxel RoI Pooling operation. The proposed H²3D R-CNN offers a new way to exploit the complementary information in the perspective view and the bird-eye view within an efficient framework. We evaluate our approach on the public KITTI Dataset and the Waymo Open Dataset. Extensive experiments demonstrate the superiority of our method over state-of-the-art algorithms in both effectiveness and efficiency. The code is available at https://github.com/djiajunustc/H-23D_R-CNN.
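To make the two projections named in the abstract concrete, here is a minimal sketch of how a single LiDAR point could be mapped to a perspective-view (range-image) cell and to a bird-eye-view grid cell. It is not taken from the authors' code: the function names, image size, vertical field of view, detection range, and cell size are all illustrative assumptions, and the fusion block and RoI pooling are not modeled.

```python
import math


def perspective_view_index(x, y, z, width=512, height=48,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one point to a (row, col) cell of a spherical range image.

    All default parameters are assumed values for illustration only.
    Returns None for the origin or for points outside the vertical FOV.
    """
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        return None
    azimuth = math.atan2(y, x)          # horizontal angle in [-pi, pi]
    inclination = math.asin(z / rng)    # vertical angle in [-pi/2, pi/2]
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    if not (fov_down <= inclination <= fov_up):
        return None
    # Azimuth 0 (straight ahead) lands in the middle column.
    col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
    # The top row corresponds to the upper edge of the field of view.
    row = int((fov_up - inclination) / (fov_up - fov_down) * height)
    return min(row, height - 1), col


def bird_eye_view_index(x, y, x_range=(0.0, 70.4),
                        y_range=(-40.0, 40.0), cell=0.1):
    """Map one point to an (ix, iy) cell of a top-down grid.

    The ranges and cell size (in meters) are assumed values.
    Returns None for points outside the grid.
    """
    if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
        return None
    return int((x - x_range[0]) / cell), int((y - y_range[0]) / cell)
```

The same point therefore receives one index per view, which is what allows features computed in each view to be associated with a shared 3D location before any fusion step.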


Pages: 4722-4734

Page count: 13
