Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer

Times Cited: 0
Authors
Zou, Changjun [1 ]
Affiliations
[1] East China Jiaotong Univ, Sch Informat & Software Engn, Nanchang, Peoples R China
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2025, Vol. 19, No. 01
Funding
National Natural Science Foundation of China;
Keywords
Depth recovery; Stereo vision; Multi-scale; Swin transformer; Structural Similarity loss function;
DOI
10.3837/tiis.2025.01.007
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Stereo vision obtains the depth of a scene from the left and right images captured by a binocular camera. Traditional depth recovery approaches are built on disparity matching, which requires intricate registration algorithms and a large amount of computation; moreover, reliable disparity matching is difficult to establish for images with weak or repetitive textures. We therefore propose a binocular depth recovery backbone network based on a multi-scale U-shaped Swin Transformer structure. Its larger receptive field allows it to extract both local and global features more effectively. A new loss function that combines the structural similarity (SSIM) loss with the L1 loss is proposed and validated, enabling more accurate depth restoration. By combining the correlation disparity cost volume with this new loss function, depth recovery can be accomplished efficiently. Experiments on datasets such as Middlebury, ETH3D and Cityscapes achieve excellent results, demonstrating the advantages of the proposed approach; in particular, on the Bonn dataset its PSNR/SSIM/L1 performance improves over the second-best results by 5.06%/3.00%/19.50%, respectively.
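As a concrete illustration of the combined SSIM + L1 loss mentioned in the abstract, a minimal PyTorch-style sketch is given below. The 0.85/0.15 weighting and the 3x3 averaging window are illustrative assumptions, not values reported in the paper.

# Minimal sketch of a combined SSIM + L1 depth-recovery loss (assumed formulation).
import torch
import torch.nn.functional as F

def ssim_map(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    # Windowed SSIM on depth maps of shape (N, 1, H, W), using 3x3 average pooling.
    mu_x = F.avg_pool2d(pred, 3, 1, padding=1)
    mu_y = F.avg_pool2d(target, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(pred ** 2, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(target ** 2, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).clamp(0, 1)

def depth_loss(pred, target, alpha=0.85):
    # Weighted sum of an SSIM term and an L1 term; alpha is a hypothetical weight.
    ssim_term = (1 - ssim_map(pred, target)).mean()
    l1_term = torch.abs(pred - target).mean()
    return alpha * ssim_term + (1 - alpha) * l1_term

A call such as depth_loss(predicted_depth, ground_truth_depth) would then be back-propagated through the network during training.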
Pages: 149-166
Number of pages: 18