SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture

Cited by: 7

Authors:

Yang, Ronghao [1]

Miao, Wang [1]

Zhang, Zhenxin [2,3]

Liu, Zhenlong [1]

Li, Mubai [2,3]

Lin, Bin [1]

Affiliations:

[1] Chengdu Univ Technol, Coll Earth Sci, Chengdu 610059, Sichuan, Peoples R China

[2] Capital Normal Univ, Key Lab 3D Informat Acquisit & Applicat, MOE, Beijing 100048, Peoples R China

[3] Capital Normal Univ, Coll Resource Environm & Tourism, Beijing 100048, Peoples R China

Source:

Engineering Applications of Artificial Intelligence | 2024 / Vol. 131

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China

Keywords:

Multi-view stereo; Depth estimation; Self-attention; Transformer; Weak texture; Adaptive propagation

DOI:

10.1016/j.engappai.2023.107800

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Multi-view stereo (MVS) reconstruction is a key task in image-based 3D reconstruction, and deep learning-based methods can achieve better results than traditional algorithms. However, most current deep learning-based MVS methods extract image features with convolutional neural networks (CNNs), which cannot aggregate long-range context or capture robust global information. In addition, when depth maps are fused into point clouds, confidence filters discard the low-confidence depth values in weak-texture areas. Together, these problems lead to low completeness in the 3D reconstruction of weak-texture and texture-less areas. To address them, this paper proposes SA-MVSNet, which builds on PatchmatchNet and adds a self-attention mechanism. First, we design a coarse-to-fine network framework for depth map estimation. In the feature extraction network, a pyramid-structured module based on the Swin Transformer block replaces the original Feature Pyramid Network (FPN), and a global self-attention mechanism enhances the self-correlation between weak-texture areas. Second, we propose a self-attention-based adaptive propagation module (SA-AP), which applies self-attention within the depth propagation window to obtain the relative weights between the current pixel and the other pixels in the window, and then adaptively samples the depth values of neighbors on the same surface for propagation. Experiments show that SA-MVSNet significantly improves the completeness of 3D reconstruction for weak-texture images on the DTU (provided by the Technical University of Denmark), BlendedMVS, and Tanks and Temples datasets. Our code is available at https://github.com/miaowang525/SA-MVSNet.
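
The window-level idea behind SA-AP can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of raw features as queries and keys (no learned projections), the window radius, and the top-k selection rule are all assumptions made for illustration. The sketch scores each neighbor in the propagation window against the current pixel with scaled dot-product attention, then keeps the depth values of the highest-weighted neighbors as propagation candidates.

```python
import numpy as np


def attention_weights(features, y, x, radius=1):
    """Softmax attention of pixel (y, x) over its (2r+1) x (2r+1) window.

    Hypothetical helper: queries and keys are the raw feature vectors,
    an assumption made to keep the sketch free of learned parameters.
    """
    h, w, c = features.shape
    query = features[y, x]
    coords, scores = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            # Skip the centre pixel and neighbours outside the image.
            if (dy, dx) == (0, 0) or not (0 <= ny < h and 0 <= nx < w):
                continue
            coords.append((ny, nx))
            scores.append(features[ny, nx] @ query / np.sqrt(c))
    scores = np.asarray(scores)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    return coords, weights / weights.sum()


def propagate_candidates(features, depth, y, x, radius=1, k=4):
    """Return the k neighbours with the largest attention weight.

    Each entry is ((row, col), depth value, attention weight). Picking the
    top-k neighbours is an assumed stand-in for adaptive sampling.
    """
    coords, weights = attention_weights(features, y, x, radius)
    order = np.argsort(weights)[::-1][:k]
    return [(coords[i], float(depth[coords[i]]), float(weights[i])) for i in order]
```

On a toy input where the left and right halves of the image carry different features and different depths, the candidates returned for a pixel in the left half all come from the left half, which mirrors the stated goal of propagating depth only between neighbors on the same surface.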

Pages: 15