SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture

Cited by: 4

Authors:

Yang, Ronghao [1]

Miao, Wang [1]

Zhang, Zhenxin [2, 3]

Liu, Zhenlong [1]

Li, Mubai [2, 3]

Lin, Bin [1]

Affiliations:

[1] Chengdu Univ Technol, Coll Earth Sci, Chengdu 610059, Sichuan, Peoples R China

[2] Capital Normal Univ, Key Lab 3D Informat Acquisit & Applicat, MOE, Beijing 100048, Peoples R China

[3] Capital Normal Univ, Coll Resource Environm & Tourism, Beijing 100048, Peoples R China

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2024 / Vol. 131

Funding:

Beijing Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

Multi-view stereo; Depth estimation; Self-attention; Transformer; Weak texture; Adaptive propagation;

DOI:

10.1016/j.engappai.2023.107800

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Multi-view stereo (MVS) reconstruction is a key task in image-based 3D reconstruction, and deep learning-based methods can achieve better results than traditional algorithms. However, most current deep learning-based MVS methods use convolutional neural networks (CNNs) to extract image features, and CNNs cannot aggregate long-range contextual information or capture robust global information. In addition, when depth maps are fused into point clouds, confidence filtering removes low-confidence depth values in weakly textured areas. Together, these problems lead to low completeness in the 3D reconstruction of weakly textured and texture-less areas. To address them, this paper proposes SA-MVSNet, which builds on PatchmatchNet and adds a self-attention mechanism. First, we design a coarse-to-fine network framework that refines depth map estimates progressively. In the feature extraction network, a pyramid-structured module based on Swin Transformer blocks replaces the original Feature Pyramid Network (FPN), and a global self-attention mechanism strengthens the self-correlation among weakly textured areas. We also propose a self-attention-based adaptive propagation module (SA-AP), which computes self-attention within the depth propagation window to obtain the relative weights between the current pixel and the other pixels in the window, and then adaptively samples the depth values of neighbors lying on the same surface for propagation. Experiments on the DTU (Technical University of Denmark), BlendedMVS, and Tanks and Temples datasets show that SA-MVSNet significantly improves the completeness of 3D reconstruction for images with weak texture. Our code is available at https://github.com/miaowang525/SA-MVSNet.
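The abstract describes SA-AP only at a high level. The following is a minimal sketch, under our own assumptions, of the general idea: scaled dot-product attention between the current pixel's feature vector and its neighbors inside a propagation window, after which the depths of the highest-weighted neighbors are kept as propagation hypotheses. The function names, the window `radius`, `num_samples`, the top-k selection rule, and the toy feature map are all hypothetical. This is not the SA-MVSNet implementation (see the linked repository for that), which works on learned feature maps inside a deep network and may differ substantially.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Vector, keys: Sequence[Vector]) -> List[float]:
    """Scaled dot-product attention of one query vector over a set of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def propagate_depths(
    features: Sequence[Sequence[Vector]],
    depths: Sequence[Sequence[float]],
    row: int,
    col: int,
    radius: int = 1,
    num_samples: int = 4,
) -> List[Tuple[float, float]]:
    """Return (depth, weight) hypotheses for pixel (row, col).

    Hypothetical simplification: neighbours inside a (2*radius+1)^2 window are
    scored by attention against the centre pixel's feature, and the depths of
    the num_samples highest-weighted neighbours are kept for propagation.
    """
    height, width = len(depths), len(depths[0])
    query = features[row][col]
    coords = [
        (row + dr, col + dc)
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
        if (dr, dc) != (0, 0)
        and 0 <= row + dr < height
        and 0 <= col + dc < width
    ]
    weights = attention_weights(query, [features[r][c] for r, c in coords])
    # Stable sort: neighbours with equal weight keep their raster-scan order.
    ranked = sorted(zip(weights, coords), key=lambda item: item[0], reverse=True)
    return [(depths[r][c], w) for w, (r, c) in ranked[:num_samples]]


# Toy 3x3 scene: columns 0-1 lie on surface A (depth 2.0), column 2 on surface B (depth 5.0).
SURFACE_A, SURFACE_B = [1.0, 0.0], [0.0, 1.0]
FEATURES = [[SURFACE_A, SURFACE_A, SURFACE_B] for _ in range(3)]
DEPTHS = [[2.0, 2.0, 5.0] for _ in range(3)]

if __name__ == "__main__":
    hypotheses = propagate_depths(FEATURES, DEPTHS, row=1, col=1)
    print([d for d, _ in hypotheses])  # prints [2.0, 2.0, 2.0, 2.0]
```

In the toy scene the centre pixel lies on surface A, so attention ranks its surface-A neighbors above the surface-B ones and none of the 5.0 depths is selected. Ranking neighbors by feature similarity, rather than sampling a fixed grid, is one plausible way to keep propagation on the same surface near depth edges.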


Pages: 15

50 entries in total

[31] Multi-View Stereo Network Based on Attention Mechanism and Neural Volume Rendering
Zhu, Daixian
Kong, Haoran
Qiu, Qiang
Ruan, Xiaoman
Liu, Shulin
ELECTRONICS, 2023, 12 (22)
[32] 3D OBJECT RELIGHTING BASED ON MULTI-VIEW STEREO AND IMAGE BASED LIGHTING TECHNIQUES
Yang, Guangwei
Liu, Yebin
ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 934 - +
[33] Maize Plant Phenotyping: Comparing 3D Laser Scanning, Multi-View Stereo Reconstruction, and 3D Digitizing Estimates
Wang, Yongjian
Wen, Weiliang
Wu, Sheng
Wang, Chuanyu
Yu, Zetao
Guo, Xinyu
Zhao, Chunjiang
REMOTE SENSING, 2019, 11 (01)
[34] Uanet: uncertainty-aware cost volume aggregation-based multi-view stereo for 3D reconstruction
Lu, Ping
Cai, Youcheng
Yang, Jiale
Wang, Dong
Wu, Tingting
VISUAL COMPUTER, 2024, : 4567 - 4580
[35] 3D reconstruction and depth estimation method for local anomalies of rail surface based on multi-view stereo matching
Hu, Pengyu
Zhong, Qianwen
Zheng, Shubin
Chen, Xieqi
Peng, Lele
MEASUREMENT SCIENCE AND TECHNOLOGY, 2025, 36 (01)
[36] A Coarse-to-Fine Transformer-Based Network for 3D Reconstruction from Non-Overlapping Multi-View Images
Shan, Yue
Xiao, Jun
Liu, Lupeng
Wang, Yunbiao
Yu, Dongbo
Zhang, Wenniu
REMOTE SENSING, 2024, 16 (05)
[37] SACANet: end-to-end self-attention-based network for 3D clothing animation
Chen, Yunxi
Cao, Yuanjie
Fang, Fei
Huang, Jin
Hu, Xinrong
He, Ruhan
Zhang, Junjie
VISUAL COMPUTER, 2024, : 3829 - 3842
[38] EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time
Rebecq, Henri
Gallego, Guillermo
Mueggler, Elias
Scaramuzza, Davide
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018, 126 : 1394 - 1414
[39] MVLayoutNet: 3D Layout Reconstruction with Multi-view Panoramas
Hu, Zhihua
Duan, Bo
Zhang, Yanfeng
Sun, Mingwei
Huang, Jingwei
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1289 - 1298
[40] TMSDNet: Transformer with multi-scale dense network for single and multi-view 3D reconstruction
Zhu, Xiaoqiang
Yao, Xinsheng
Zhang, Junjie
Zhu, Mengyao
You, Lihua
Yang, Xiaosong
Zhang, Jianjun
Zhao, He
Zeng, Dan
COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (01)
