UAMD-Net: A Unified Adaptive Multimodal Neural Network for Dense Depth Completion

Cited by: 7
Authors
Chen, Guancheng [1 ]
Lin, Junli [1 ]
Qin, Huabiao [1 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510641, Peoples R China
Keywords
Depth completion; multimodal neural network; random modality dropout
DOI
10.1109/TCSVT.2023.3254650
CLC classification
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline codes
0808; 0809
Abstract
Depth estimation is a critical problem in robotics applications, especially autonomous driving. Currently, depth prediction based on binocular stereo matching and depth completion based on the fusion of a monocular image with a laser point cloud are the two mainstream approaches. However, the former usually suffers from a lack of constraints while building the cost volume, while the latter cannot be trained in a self-supervised way and does not exploit the geometric constraint of stereo matching, which we believe can further improve performance. Therefore, we propose a novel multimodal neural network, UAMD-Net, for dense depth completion based on the fusion of binocular stereo matching and the weak constraint from sparse point clouds. Specifically, the sparse point cloud is converted to a sparse depth map and fed, together with the binocular images, into the multimodal feature encoder (MFE), which constructs a cross-modal cost volume. This volume is then further processed by the multimodal feature aggregator (MFA) and the depth regression layer. Furthermore, since previous multimodal depth estimation methods ignore the problem of modality dependence, we propose a new training strategy called random modality dropout (RMD), which enables the network to be trained adaptively with multiple modality inputs and to run inference with specific modality inputs. Benefiting from the flexible network structure and the adaptive training method, the proposed network can be trained in a unified manner under various modality input conditions. Comprehensive experiments on the KITTI and DrivingStereo depth completion datasets demonstrate that our method produces robust results and outperforms other state-of-the-art methods.
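The random modality dropout (RMD) idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, its parameters, and the modality names are assumptions made here for clarity. Each modality is independently zeroed out during training with some probability, while at least one modality is always kept so the network never receives an empty input.

```python
import random

def random_modality_dropout(modalities, drop_prob=0.3, rng=None):
    """Illustrative sketch of random modality dropout (RMD).

    `modalities` maps modality names (e.g. "left_image", "right_image",
    "sparse_depth") to feature vectors. During training, each modality is
    independently dropped (zeroed) with probability `drop_prob`, but at
    least one modality is always retained, so the network learns to work
    with any subset of inputs.
    """
    rng = rng or random.Random()
    names = list(modalities)
    # Independently decide which modalities survive this training step.
    kept = {n for n in names if rng.random() >= drop_prob}
    if not kept:
        # Guarantee at least one modality remains as input.
        kept = {rng.choice(names)}
    return {n: (x if n in kept else [0.0] * len(x))
            for n, x in modalities.items()}
```

At inference time, the same mechanism degenerates gracefully: modalities that are unavailable are simply passed in as zeros, matching the training-time distribution the network has already seen.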
Pages: 5406-5419
Page count: 14
References
58 in total
[1]  
Akbari H, 2021, ADV NEUR IN
[2]   Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos [J].
Alfasly, Saghir ;
Lu, Jian ;
Xu, Chen ;
Zou, Yuru .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :20176-20185
[3]  
Amiri Ali Jahani, 2019, 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), P602, DOI 10.1109/ROBIO49542.2019.8961504
[4]   Monocular Depth Estimation With Augmented Ordinal Depth Relationships [J].
Cao, Yuanzhouhan ;
Zhao, Tianqi ;
Xian, Ke ;
Shen, Chunhua ;
Cao, Zhiguo ;
Xu, Shugong .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (08) :2674-2682
[5]   Pyramid Stereo Matching Network [J].
Chang, Jia-Ren ;
Chen, Yong-Sheng .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :5410-5418
[6]   Fixing Defect of Photometric Loss for Self-Supervised Monocular Depth Estimation [J].
Chen, Shu ;
Pu, Zhengdong ;
Fan, Xiang ;
Zou, Beiji .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) :1328-1338
[7]   Learning Depth with Convolutional Spatial Propagation Network [J].
Cheng, Xinjing ;
Wang, Peng ;
Yang, Ruigang .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (10) :2361-2379
[8]  
Cheng XJ, 2020, AAAI CONF ARTIF INTE, V34, P10615
[9]  
Cheng Xuelian, HIERARCHICAL NEURAL
[10]   Indoor 3D Human Trajectory Reconstruction Using Surveillance Camera Videos and Point Clouds [J].
Dai, Yudi ;
Wen, Chenglu ;
Wu, Hai ;
Guo, Yulan ;
Chen, Longbiao ;
Wang, Cheng .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) :2482-2495