CVCNet: Learning Cost Volume Compression for Efficient Stereo Matching

Times cited: 6

Authors:

Guo, Yulan [1]

Wang, Yun [1]

Wang, Longguang [2]

Wang, Zi [3]

Cheng, Chen [4]

Affiliations:

[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 510275, Peoples R China

[2] Aviat Univ Air Force, Changchun 130012, Peoples R China

[3] Natl Univ Def Technol, Coll Aerosp Sci & Engn, Changsha 410073, Peoples R China

[4] Xian Coll Technol, Xian 710049, Shaanxi, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Funding:

National Natural Science Foundation of China;

Keywords:

Computer vision; image matching; machine vision; robot vision systems; stereo vision; DEPTH; TIME;

DOI:

10.1109/TMM.2022.3228169

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline code:

0812;

Abstract:

State-of-the-art deep-learning-based stereo matching algorithms usually rely on full-size cost volumes for highly accurate disparity estimation. A full-size cost volume processes all possible disparity candidates equally, without considering their different matching uncertainties. Consequently, considerable redundant computation is spent on candidates with very low matching uncertainties, which makes these methods difficult to deploy in real-time applications. To tackle this problem, we propose CVCNet, which features an adaptive disparity range prediction module (ADR) and a disparity refinement module (DRM). The ADR adaptively predicts a pixel-wise disparity range to discard "unimportant" disparity candidates, enabling our network to obtain a compressed cost volume. In addition, the DRM improves disparity range prediction and refines the predicted disparity map. With the proposed modules, CVCNet learns to build a compressed cost volume for efficient disparity estimation. Experimental results on the KITTI and SceneFlow datasets show that our method achieves state-of-the-art performance and runs an order of magnitude faster than existing 3D CNN based methods. In particular, our method ranks 1st on the KITTI 2012 and KITTI 2015 benchmarks among all published methods with a running time shorter than 100 ms.
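The abstract does not give the ADR's formulation, so the following is only a rough NumPy sketch of why restricting each pixel to a predicted disparity range shrinks the cost volume. It builds a full correlation cost volume over all disparities and a compressed one that evaluates only `num_candidates` disparities starting at a per-pixel lower bound `d_low`. The correlation cost, the names `d_low` and `num_candidates`, and the soft-argmax regression are assumptions chosen for illustration, not CVCNet's actual modules; in the paper the range is predicted by a network, whereas here `d_low` is simply an input.

```python
import numpy as np


def full_cost_volume(left, right, max_disp):
    """Correlation cost for every disparity 0..max_disp-1; returns shape (max_disp, H, W)."""
    _, _, W = left.shape
    cost = np.zeros((max_disp,) + left.shape[1:], dtype=np.float64)
    for d in range(min(max_disp, W)):
        # Left pixel x is compared with right pixel x - d; columns x < d stay 0.
        cost[d, :, d:] = (left[:, :, d:] * right[:, :, : W - d]).mean(axis=0)
    return cost


def compressed_cost_volume(left, right, d_low, num_candidates):
    """Correlation cost only for candidates d_low + k, k = 0..num_candidates-1.

    d_low: (H, W) integer lower bound of each pixel's disparity range. It is a
    hypothetical input standing in for the output of a range-prediction module.
    Returns (cost, disps), both of shape (num_candidates, H, W).
    """
    _, H, W = left.shape
    rows = np.arange(H)[:, None]                      # (H, 1) row index
    cols = np.arange(W)[None, :]                      # (1, W) column index
    disps = d_low[None] + np.arange(num_candidates)[:, None, None]  # (K, H, W)
    cost = np.zeros(disps.shape, dtype=np.float64)
    for k in range(num_candidates):
        xr = cols - disps[k]                          # matching column in the right view
        valid = (xr >= 0) & (xr < W)
        # Gather right features at (row, xr) for every pixel -> (C, H, W).
        warped = right[:, rows, np.clip(xr, 0, W - 1)]
        cost[k] = np.where(valid, (left * warped).mean(axis=0), 0.0)
    return cost, disps


def soft_argmax_disparity(cost, disps):
    """Expected disparity under a softmax over candidates (higher correlation = better)."""
    w = np.exp(cost - cost.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    return (w * disps).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal((8, 6, 20))            # (C, H, W) toy feature maps
    right = rng.standard_normal((8, 6, 20))
    d_low = rng.integers(0, 8, size=(6, 20))          # hypothetical per-pixel lower bounds
    cost, disps = compressed_cost_volume(left, right, d_low, num_candidates=4)
    print(full_cost_volume(left, right, 12).shape, cost.shape)  # (12, 6, 20) (4, 6, 20)
    print(soft_argmax_disparity(cost, disps).shape)             # (6, 20)
```

Each compressed slice equals the matching slice of the full volume, so no accuracy is lost for disparities inside the predicted range; the saving comes purely from not evaluating the others. For example, with 192 disparity levels in the full volume and 16 candidates per pixel, the sketch's cost tensor is 192 / 16 = 12 times smaller. The abstract does not state how many candidates CVCNet actually keeps.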

Pages: 7786-7799

Page count: 14

References: 53 in total