STTR-3D: Stereo Transformer 3D Network for Video-Based Disparity Change Estimation

Cited by: 0
Authors
Yang, Qitong [1]
Rakai, Lionel [1]
Sun, Shijie [1]
Song, Huansheng [1]
Song, Xiangyu [2]
Akhtar, Naveed [3]
Affiliations
[1] Changan Univ, Xian 710000, Shaanxi, Peoples R China
[2] Swinburne Univ Technol, Hawthorn, Vic 3122, Australia
[3] Univ Western Australia, Crawley, WA 6009, Australia
Source
WEB AND BIG DATA, PT IV, APWEB-WAIM 2023 | 2024 / Vol. 14334
Funding
National Natural Science Foundation of China
Keywords
Stereo Estimation; Disparity Change; Scene Flow; Optimal Transport
DOI
10.1007/978-981-97-2421-5_15
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In computer vision and stereo depth estimation, little research has addressed obtaining high-accuracy disparity change maps from two-dimensional images. Such a map provides information that fills the gap between optical flow and depth, which is desirable for many academic research problems and industrial applications, such as navigation systems, driving assistance, and autonomous systems. We introduce STTR3D, a 3D extension of the STereo TRansformer (STTR), which leverages transformers and an attention mechanism to handle stereo depth estimation. We use the Scene Flow Flying-Things-3D dataset, which openly includes disparity change data, and apply 1) refinements through an MLP over the relative position encoding and 2) a regression head with entropy-regularized optimal transport to obtain a disparity change map. The model consistently outperforms the original model on depth estimation tasks. Compared with existing supervised learning methods for stereo depth estimation, our technique handles disparity estimation and disparity change estimation simultaneously with an end-to-end network, and shows that adding our transformer improves performance and achieves high precision on both tasks.
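The abstract mentions entropy-regularized optimal transport without giving details. As a rough illustration of that general technique (not the authors' implementation), the sketch below runs standard Sinkhorn iterations on a small cost matrix. The function name `sinkhorn`, the parameters `epsilon` and `n_iters`, and the toy cost matrix are all assumptions made for this example; how STTR3D builds its costs and marginals is not stated in this record.

```python
import math


def sinkhorn(cost, row_marginal, col_marginal, epsilon=0.5, n_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    Hypothetical helper for illustration only. Returns a transport plan
    whose row and column sums approximate the given marginals.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / epsilon)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)]
              for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that row sums match row_marginal
        u = [row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        # Rescale columns so that column sums match col_marginal
        v = [col_marginal[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # Transport plan: P[i][j] = u[i] * K[i][j] * v[j]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example (assumed data): matching 3 positions on one scanline to
# 3 positions on another, with cost equal to the positional offset.
toy_cost = [[abs(i - j) for j in range(3)] for i in range(3)]
toy_plan = sinkhorn(toy_cost, [1 / 3] * 3, [1 / 3] * 3)
for row in toy_plan:
    print([round(p, 4) for p in row])
```

A smaller `epsilon` makes the plan closer to a hard one-to-one assignment, while a larger value spreads mass more evenly across candidate matches.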
Pages: 217-231
Page count: 15