Multi-dimensional Attention Feature Aggregation Stereo Matching Algorithm

Cited by: 0
Authors
Zhang Y.-R. [1]
Kong Y.-T. [2]
Liu B. [1]
Affiliations
[1] School of Information Science and Engineering, Yanshan University, Qinhuangdao
[2] School of Electrical Engineering, Yanshan University, Qinhuangdao
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2022, Vol. 48, No. 07
Keywords
Deep learning; information interaction; multi-dimensional attention mechanism; stereo matching
DOI
10.16383/j.aas.c200778
Abstract
Existing deep learning-based stereo matching algorithms lack effective information interaction during learning and inference, and the feature dimensions used in feature extraction differ from those used in cost aggregation, so attention mechanisms have seen only limited, single-dimensional use in stereo matching networks. To address these problems, a multi-dimensional attention feature aggregation stereo matching algorithm is proposed. A two-dimensional (2D) attention residual module is designed by introducing an adaptive 2D attention residual unit, which avoids dimensionality reduction, into the original residual network; its local cross-channel interaction and extraction of salient information provide rich, effective features for matching cost computation. A three-dimensional (3D) attention hourglass aggregation module is constructed by designing a 3D attention hourglass unit with a stacked hourglass structure as its backbone; it captures multi-scale geometric context, extends the attention mechanism to multiple dimensions, and adaptively aggregates and recalibrates cost volumes from different network depths. The proposed algorithm is evaluated on three standard datasets and compared with related algorithms. Experimental results show that it predicts disparity more accurately and performs particularly well on unoccluded salient objects. © 2022 Science Press. All rights reserved.
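The "adaptive 2D attention residual unit without dimensionality reduction" described in the abstract follows the pattern of efficient channel attention: channel weights are produced by local cross-channel interaction (a small 1D convolution over the pooled channel descriptor) rather than by a dimensionality-reducing bottleneck. The PyTorch sketch below illustrates one plausible form of such a unit; the class names, the kernel-size heuristic, and the residual block layout are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of an ECA-style 2D attention residual unit, assuming the
# paper's "local cross-channel interaction without dimensionality reduction"
# is realized as a 1D convolution over globally pooled channel statistics.
import math
import torch
import torch.nn as nn

class ChannelAttention2D(nn.Module):
    """Channel attention without dimensionality reduction: weights come from
    a 1D convolution over the global-average-pooled channel descriptor."""
    def __init__(self, channels: int, gamma: int = 2, beta: int = 1):
        super().__init__()
        # Hypothetical adaptive kernel size: odd, growing with log2(channels).
        t = int(abs((math.log2(channels) + beta) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> pooled descriptor reshaped to (B, 1, C)
        y = self.pool(x).squeeze(-1).transpose(-1, -2)
        y = torch.sigmoid(self.conv(y))              # (B, 1, C) channel weights
        return x * y.transpose(-1, -2).unsqueeze(-1)  # rescale each channel

class Attention2DResidualUnit(nn.Module):
    """Residual block whose branch output is recalibrated by 2D attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.attn = ChannelAttention2D(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.attn(self.body(x)))
```

As a usage example, a feature map x = torch.randn(2, 32, 64, 128) passes through Attention2DResidualUnit(32) with its shape unchanged, each channel rescaled by its learned weight. The paper's 3D attention hourglass module would lift the same recalibration idea to cost volumes (e.g. with Conv3d layers inside a stacked hourglass); that part is omitted from this sketch.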
Pages: 1805-1815
Page count: 10