An evaluation of conventional and deep learning-based image-matching methods on diverse datasets

Cited by: 17
Authors
Ji, Shunping [1]
Zeng, Chang [1]
Zhang, Yongjun [1]
Duan, Yulin [2]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, 129 Luoyu Rd, Wuhan 430079, Peoples R China
[2] Chinese Acad Agr Sci, Inst Agr Resources & Reg Planning, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; description; feature point extraction; image matching; similarity measure; PERFORMANCE; FRAMEWORK; DETECTORS; FEATURES;
DOI
10.1111/phor.12445
CLC classification
P9 [Physical Geography];
Subject classification codes
0705; 070501;
Abstract
Image matching plays an important role in photogrammetry, computer vision and remote sensing. Modern deep learning-based methods have been proposed for image matching; however, whether they will surpass and replace the conventional handcrafted methods in the remote sensing field remains unclear, and a comprehensive evaluation on stereo remote sensing images is also lacking. This paper comprehensively evaluates the performance of conventional and deep learning-based image-matching methods by dividing the matching process into three steps, feature point extraction, description and similarity measure, on various datasets, including images captured in close-range indoor and outdoor scenarios and from unmanned aerial vehicle (UAV) and satellite platforms. Different combinations of the three steps are evaluated. The experimental results reveal that, first, the performance of the different combinations varies between individual datasets, making it difficult to determine a single best combination. Second, under more comprehensive indicators computed over all of the datasets, namely the average rank and the absolute rank, the combination of the scale-invariant feature transform (SIFT), ContextDesc and the nearest-neighbour distance ratio (NNDR), as well as the original SIFT, achieve the best results and are recommended for use in remote sensing. Third, the deep learning-based Sub-SuperPoint extractor performs well and is second only to SIFT. The learning-based ContextDesc descriptor is as effective as the SIFT descriptor, and the learning-based SuperGlue matcher, although not as stable as NNDR, leads to a few top-performing combinations. Finally, the handcrafted methods are generally faster than the deep learning-based methods, but the efficiency of the latter is acceptable.
We conclude that although a fully deep learning-based method/combination has not yet beaten the conventional methods, there is still much room for improvement with the deep learning-based methods, because large-scale aerial and satellite training datasets remain to be constructed and methods specific to remote sensing images remain to be developed. The performance of the different combinations of feature extractor, descriptor and similarity measure varies between individual datasets. The combination of SIFT, ContextDesc and NNDR, as well as the original SIFT, achieve the best results under more comprehensive indicators computed over all of the datasets. Among extractors, the learning-based Sub-SuperPoint is second only to SIFT; among descriptors, the learning-based ContextDesc is as effective as the SIFT descriptor; and among matchers, the learning-based SuperGlue is not as stable as NNDR.
Pages: 137-159
Page count: 23
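The average-rank indicator used to compare combinations across datasets can be illustrated with a toy computation: rank each method within every dataset, then average its ranks. All scores and numbers below are invented for illustration and are not results from the paper; they merely show how a method can win on average rank without winning every individual dataset.

```python
import numpy as np

# Hypothetical quality scores (higher is better) for three combinations on
# four datasets; every number here is illustrative only.
scores = {
    "SIFT":                  [0.81, 0.74, 0.69, 0.77],
    "SIFT+ContextDesc+NNDR": [0.83, 0.72, 0.71, 0.79],
    "SuperPoint+SuperGlue":  [0.85, 0.65, 0.70, 0.74],
}
methods = list(scores)
mat = np.array([scores[m] for m in methods])  # shape: (methods, datasets)

# Rank the methods within each dataset (1 = best) via the double-argsort
# trick, then average the ranks across datasets; lowest average wins.
per_dataset_rank = (-mat).argsort(axis=0).argsort(axis=0) + 1
avg_rank = per_dataset_rank.mean(axis=1)
for m, r in sorted(zip(methods, avg_rank), key=lambda t: t[1]):
    print(f"{m}: average rank {r:.2f}")
```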