Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer

Times Cited: 0
Authors
Zou, Changjun [1 ]
Affiliations
[1] East China Jiaotong Univ, Sch Informat & Software Engn, Nanchang, Peoples R China
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2025, Vol. 19, No. 01
Funding
National Natural Science Foundation of China;
Keywords
Depth recovery; Stereo vision; Multi-scale; Swin transformer; Structural Similarity loss function;
DOI
10.3837/tiis.2025.01.007
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Stereo vision obtains the depth of a scene from the left and right images captured by a binocular camera. Traditional depth recovery approaches are built on disparity matching, which requires intricate registration algorithms and a large amount of computation; moreover, reliable disparity matching is difficult to establish for images with weak or repetitive textures. We therefore propose a binocular depth recovery backbone network based on a multi-scale U-shaped Swin Transformer structure. Its larger receptive field allows it to extract both local and global features more effectively. A new loss function that combines the structural similarity (SSIM) loss with the L1 loss is proposed and validated, enabling more accurate depth restoration. By combining the correlation disparity cost volume with this new loss function, depth recovery can be accomplished efficiently. Experiments on datasets such as Middlebury, ETH3D and Cityscapes achieve excellent results, demonstrating the advantages of the proposed approach; in particular, on the Bonn dataset its PSNR/SSIM/L1 performance improves over the second-best results by 5.06%/3.00%/19.50%, respectively.
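As a concrete illustration of the combined SSIM + L1 loss mentioned in the abstract, a minimal PyTorch-style sketch is given below. The 0.85/0.15 weighting and the 3x3 averaging window are illustrative assumptions, not values reported in the paper.

# Minimal sketch of a combined SSIM + L1 depth-recovery loss (assumed formulation).
import torch
import torch.nn.functional as F

def ssim_map(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    # Windowed SSIM on depth maps of shape (N, 1, H, W), using 3x3 average pooling.
    mu_x = F.avg_pool2d(pred, 3, 1, padding=1)
    mu_y = F.avg_pool2d(target, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(pred ** 2, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(target ** 2, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).clamp(0, 1)

def depth_loss(pred, target, alpha=0.85):
    # Weighted sum of an SSIM term and an L1 term; alpha is a hypothetical weight.
    ssim_term = (1 - ssim_map(pred, target)).mean()
    l1_term = torch.abs(pred - target).mean()
    return alpha * ssim_term + (1 - alpha) * l1_term

A call such as depth_loss(predicted_depth, ground_truth_depth) would then be back-propagated through the network during training.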
Pages: 149-166
Number of pages: 18