SA-MVSNet: Self-attention-based multi-view stereo network for 3D reconstruction of images with weak texture

Cited by: 4
Authors
Yang, Ronghao [1]
Miao, Wang [1]
Zhang, Zhenxin [2, 3]
Liu, Zhenlong [1]
Li, Mubai [2, 3]
Lin, Bin [1]
Affiliations
[1] Chengdu Univ Technol, Coll Earth Sci, Chengdu 610059, Sichuan, Peoples R China
[2] Capital Normal Univ, Key Lab 3D Informat Acquisit & Applicat, MOE, Beijing 100048, Peoples R China
[3] Capital Normal Univ, Coll Resource Environm & Tourism, Beijing 100048, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Multi-view stereo; Depth estimation; Self-attention; Transformer; Weak texture; Adaptive propagation;
DOI
10.1016/j.engappai.2023.107800
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-view stereo (MVS) reconstruction is a key task in image-based 3D reconstruction, and deep learning-based methods can achieve better results than traditional algorithms. However, most current deep learning-based MVS methods extract image features with convolutional neural networks (CNNs), which cannot aggregate long-range context or capture robust global information. In addition, when depth maps are fused into point clouds, confidence filters discard low-confidence depth values in weak-texture areas. Together, these problems lead to low completeness when reconstructing weak-texture and texture-less areas in 3D. To address them, this paper proposes SA-MVSNet, which builds on PatchmatchNet and adds a self-attention mechanism. First, we design a coarse-to-fine network framework for depth map estimation. In the feature extraction network, a pyramid-structured module based on Swin Transformer blocks replaces the original Feature Pyramid Network (FPN), and a global self-attention mechanism strengthens the self-correlation between weak-texture areas. Second, we propose a self-attention-based adaptive propagation module (SA-AP), which computes self-attention within the depth propagation window to obtain relative weights between the current pixel and its neighbors, and then adaptively samples the depth values of neighbors on the same surface for propagation. Experiments show that SA-MVSNet significantly improves the completeness of 3D reconstruction for weak-texture images on the DTU (provided by the Technical University of Denmark), BlendedMVS, and Tanks and Temples datasets. Our code is available at https://github.com/miaowang525/SA-MVSNet.
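The adaptive propagation idea in the abstract (attention weights inside a propagation window, then sampling depths from the neighbors that most resemble the current pixel) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `attention_weights` and `propagate_candidates`, the scaled dot-product scoring, the square window, and the top-k selection rule are all assumptions made for illustration only.

```python
import numpy as np

def attention_weights(features, y, x, radius=1):
    """Softmax attention of the centre pixel (query) over its window neighbours (keys).

    Hypothetical helper: scaled dot-product scoring is an assumption, not the paper's formula.
    """
    h, w, c = features.shape
    ys = range(max(0, y - radius), min(h, y + radius + 1))
    xs = range(max(0, x - radius), min(w, x + radius + 1))
    coords = [(i, j) for i in ys for j in xs if (i, j) != (y, x)]
    keys = np.stack([features[i, j] for i, j in coords])  # (N, C)
    query = features[y, x]                                 # (C,)
    scores = keys @ query / np.sqrt(c)
    scores = scores - scores.max()                         # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return coords, weights

def propagate_candidates(depth, features, y, x, radius=1, top_k=4):
    """Return the top_k neighbours by attention weight as (coord, depth, weight) tuples."""
    coords, weights = attention_weights(features, y, x, radius)
    order = np.argsort(weights)[::-1][:top_k]
    return [(coords[i], float(depth[coords[i]]), float(weights[i])) for i in order]

# Toy scene: two surfaces with distinct features and depths (made-up values).
features = np.zeros((4, 4, 2))
features[:, :2] = [4.0, 0.0]   # surface A occupies the left two columns
features[:, 2:] = [0.0, 4.0]   # surface B occupies the right two columns
depth = np.where(np.arange(4) < 2, 1.0, 5.0) * np.ones((4, 1))

# Pixel (1, 1) lies on surface A, next to the boundary with surface B.
candidates = propagate_candidates(depth, features, y=1, x=1, radius=1, top_k=4)
for coord, d, wgt in candidates:
    print(coord, d, round(wgt, 3))
```

In this toy scene the window around pixel (1, 1) straddles the boundary, yet the selected candidates all come from surface A because their features match the query. That is the behavior the abstract describes as sampling "neighbors on the same surface".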
Pages: 15