STTR-3D: Stereo Transformer 3D Network for Video-Based Disparity Change Estimation

Cited by: 0

Authors:

Yang, Qitong [1]

Rakai, Lionel [1]

Sun, Shijie [1]

Song, Huansheng [1]

Song, Xiangyu [2]

Akhtar, Naveed [3]

Affiliations:

[1] Changan Univ, Xian 710000, Shaanxi, Peoples R China

[2] Swinburne Univ Technol, Hawthorn, Vic 3122, Australia

[3] Univ Western Australia, Crawley, WA 6009, Australia

Source:

WEB AND BIG DATA, PT IV, APWEB-WAIM 2023 | 2024 / Vol. 14334

Funding:

National Natural Science Foundation of China;

Keywords:

Stereo Estimation; Disparity Change; Scene Flow; Optimal Transport;

DOI:

10.1007/978-981-97-2421-5_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In computer vision and stereo depth estimation, little research has addressed obtaining high-accuracy disparity change maps from two-dimensional images. Such a map offers information that fills the gap between optical flow and depth, which is desirable for numerous academic research problems and industrial applications, such as navigation systems, driving assistance, and autonomous systems. We introduce STTR3D, a 3D extension of the STereo TRansformer (STTR), which leverages transformers and an attention mechanism to handle stereo depth estimation. We further use the Scene Flow Flying-Things-3D dataset, which openly includes disparity change data, and apply 1) refinements through an MLP over the relative position encoding and 2) a regression head with entropy-regularized optimal transport to obtain a disparity change map. The model consistently outperforms the original model on depth estimation tasks. Compared with existing supervised learning methods for estimating stereo depth, our technique handles disparity estimation and disparity change estimation simultaneously in an end-to-end network, and it shows that adding our transformer improves performance, achieving high precision on both problems.
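The abstract mentions entropy-regularized optimal transport without giving details. As a rough illustration only, the sketch below uses Sinkhorn iterations, a standard solver for that problem; it is not taken from the paper and is not the authors' implementation. The function name `sinkhorn`, the toy cost `|i - j|`, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions chosen for the example.

```python
import math


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Approximate the entropy-regularized OT plan between histograms a and b.

    Hypothetical helper for illustration; not the paper's regression head.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); a smaller eps gives a sharper plan.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example (assumed data): three candidate positions per image row,
# with matching cost growing with the index offset |i - j|.
cost = [[abs(i - j) for j in range(3)] for i in range(3)]
uniform = [1.0 / 3.0] * 3
plan = sinkhorn(cost, uniform, uniform)

# A soft match for each source position is the plan-weighted mean target index.
soft_match = [
    sum(j * p for j, p in enumerate(row)) / sum(row) for row in plan
]
```

In a stereo setting, a soft assignment of this kind could be turned into a disparity by subtracting the matched position from the source position, but how the paper's regression head does this is not stated in the abstract.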

Citation:

Pages: 217-231

Page count: 15

References: 44 in total

