Shape-Former: Bridging CNN and Transformer via ShapeConv for multimodal image matching

Cited by: 50
Authors
Chen, Jiaxuan [1]
Chen, Xiaoxian [1]
Chen, Shuang [1, 3]
Liu, Yuyan [1]
Rao, Yujing [1, 2]
Yang, Yang [1, 2]
Wang, Haifeng [1]
Wu, Dan [1]
Affiliations
[1] Yunnan Normal Univ, Lab Pattern Recognit & Artificial Intelligence, Kunming 650500, Peoples R China
[2] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming 650500, Peoples R China
[3] Fudan Univ, Dept Environm Sci & Engn, Shanghai 200000, Peoples R China
Keywords
Feature matching; Deep learning; Shape-Former; Multimodal image matching; Registration and fusion; NETWORK; CONSENSUS; LOCALITY; MODEL;
DOI
10.1016/j.inffus.2022.10.030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As with any data fusion task, the front end of an image fusion pipeline, which aims to collect diverse physical properties from multimodal images taken by different types of sensors, requires registering the overlapping content of two images via image matching. In other words, the accuracy of image matching directly influences the subsequent fusion results. In this work, we propose a hybrid correspondence learning architecture, termed Shape-Former, which is capable of solving matching problems such as multimodal and multiview cases. Existing approaches have trouble capturing the intricate feature interactions needed to find good correspondences when the image pairs suffer from geometric and radiometric distortion simultaneously. To address this, our key idea is to combine the strengths of the convolutional neural network (CNN) and the Transformer to enhance the ability to represent structure consensus. Specifically, we introduce a novel ShapeConv so that the CNN and the Transformer can be generalized to learning from sparse matches. Furthermore, we provide a robust soft outlier estimation mechanism that filters the response of outliers before shape features are captured. Finally, we propose coupling multiple consensus representations to further resolve context conflict problems such as local ambiguity. Experiments on a variety of datasets reveal that Shape-Former outperforms state-of-the-art methods on multimodal image matching and shows promising generalization ability to different types of image deformations.
Pages: 445-457
Number of pages: 13
Related papers
74 in total
[1]  
[Anonymous], 1988, P ALV VIS C
[2]  
[Anonymous], 2008, COMPUT VIS IMAGE UND
[3]   PointNetLK: Robust & Efficient Point Cloud Registration using PointNet [J].
Aoki, Yasuhiro ;
Goforth, Hunter ;
Srivatsan, Rangaprasad Arun ;
Lucey, Simon .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :7156-7165
[4]   An Unsupervised Learning Model for Deformable Medical Image Registration [J].
Balakrishnan, Guha ;
Zhao, Amy ;
Sabuncu, Mert R. ;
Guttag, John ;
Dalca, Adrian V. .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :9252-9260
[5]   MAGSAC++, a fast, reliable and accurate robust estimator [J].
Barath, Daniel ;
Noskova, Jana ;
Ivashechkin, Maksym ;
Matas, Jiri .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :1301-1309
[6]   MAGSAC: Marginalizing Sample Consensus [J].
Barath, Daniel ;
Matas, Jiri ;
Noskova, Jana .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :10189-10197
[7]   GMS: Grid-Based Motion Statistics for Fast, Ultra-robust Feature Correspondence [J].
Bian, Jia-Wang ;
Lin, Wen-Yan ;
Liu, Yun ;
Zhang, Le ;
Yeung, Sai-Kit ;
Cheng, Ming-Ming ;
Reid, Ian .
INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128 (06) :1580-1593
[8]   GMS: Grid-based Motion Statistics for Fast, Ultra-robust Feature Correspondence [J].
Bian, JiaWang ;
Lin, Wen-Yan ;
Matsushita, Yasuyuki ;
Yeung, Sai-Kit ;
Nguyen, Tan-Dat ;
Cheng, Ming-Ming .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2828-2837
[9]   Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses [J].
Brachmann, Eric ;
Rother, Carsten .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :4321-4330
[10]   DSAC - Differentiable RANSAC for Camera Localization [J].
Brachmann, Eric ;
Krull, Alexander ;
Nowozin, Sebastian ;
Shotton, Jamie ;
Michel, Frank ;
Gumhold, Stefan ;
Rother, Carsten .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2492-2500