Context-Aware Multi-view Stereo Network for Efficient Edge-Preserving Depth Estimation

Cited by: 0

Authors:

Su, Wanjuan [1]

Tao, Wenbing [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Natl Key Lab Sci & Technol Multispectral Informat, Wuhan 430074, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2025

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-view stereo; Depth estimation; Depth refinement; 3D dense reconstruction; Correspondence matching;

DOI:

10.1007/s11263-024-02337-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning-based multi-view stereo methods have made great progress in recent years by employing a coarse-to-fine depth estimation framework. However, existing methods still have difficulty recovering depth in featureless areas, at object boundaries, and on thin structures. These difficulties stem mainly from the poor distinguishability of matching cues in low-textured regions, the inherent smoothing behavior of the 3D convolutional neural networks used for cost volume regularization, and information loss in the coarsest-scale features. To address these issues, we propose a Context-Aware multi-view stereo Network (CANet) that leverages contextual cues in images to achieve efficient edge-preserving depth estimation. A self-similarity attended cost aggregation module exploits the structural self-similarity information in the reference view to model long-range dependencies in the cost volume, which boosts the matchability of featureless regions. The context information in the reference view is then used to progressively refine the multi-scale depth estimates through the proposed hierarchical edge-preserving residual learning module, resulting in fine-grained depth estimation at edges. To enrich the features at the coarsest scale by making them focus more on delicate areas, a focal selection module is presented, which enhances the recovery of the initial depth with finer details such as thin structures. By integrating these strategies into a lightweight cascade framework, CANet achieves a superior trade-off between performance and efficiency. Extensive experiments show that the proposed method achieves state-of-the-art performance with fast inference speed and low memory usage. Notably, CANet ranks first on the challenging Tanks and Temples advanced dataset and the ETH3D high-res benchmark among all published learning-based methods.


Pages: 25

Related records: 50 in total

[1] Efficient Edge-Preserving Multi-View Stereo Network for Depth Estimation
Su, Wanjuan
Tao, Wenbing
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 2348 - 2356
[2] TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers
Ding, Yikang
Yuan, Wentao
Zhu, Qingtian
Zhang, Haotian
Liu, Xiangyue
Wang, Yuanjiang
Liu, Xiao
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8575 - 8584
[3] Edge-Aware Spatial Propagation Network for Multi-view Depth Estimation
Siyuan Xu
Qingshan Xu
Wanjuan Su
Wenbing Tao
Neural Processing Letters, 2023, 55 : 10905 - 10923
[4] Edge-Aware Spatial Propagation Network for Multi-view Depth Estimation
Xu, Siyuan
Xu, Qingshan
Su, Wanjuan
Tao, Wenbing
NEURAL PROCESSING LETTERS, 2023, 55 (08) : 10905 - 10923
[5] Uncertainty Guided Multi-View Stereo Network for Depth Estimation
Su, Wanjuan
Xu, Qingshan
Tao, Wenbing
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 7796 - 7808
[6] Continuous Depth Estimation for Multi-view Stereo
Liu, Yebin
Cao, Xun
Dai, Qionghai
Xu, Wenli
CVPR: 2009 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOLS 1-4, 2009, : 2121 - 2128
[7] Efficient Edge-Preserving Stereo Matching
Cigla, Cevahir
Alatan, A. Aydin
2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCV WORKSHOPS), 2011,
[8] Multi-view learning for context-aware extractive summarization
Yang, Zhenyu
Yang, Jie
Yecies, Brian
Li, Wanqing
2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 1762 - 1769
[9] Context-Aware Multi-View Summarization Network for Image-Text Matching
Qu, Leigang
Liu, Meng
Cao, Da
Nie, Liqiang
Tian, Qi
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055
[10] Unsupervised multi-view stereo network based on multi-stage depth estimation
Qi, Shuai
Sang, Xinzhu
Yan, Binbin
Wang, Peng
Chen, Duo
Wang, Huachun
Ye, Xiaoqian
IMAGE AND VISION COMPUTING, 2022, 122
