Image-to-Point Registration via Cross-Modality Correspondence Retrieval

Times cited: 0
Authors
Bie, Lin [1]
Li, Siqi [1]
Cheng, Kai [2]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China
[2] Army Engn Univ, Command Control Coll, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024
Keywords
Image-to-Point Cloud registration; cross-modality correspondence retrieval; frustum point retrieval; combined correspondence retrieval; virtual point cloud;
DOI
10.1145/3652583.3658074
CLC (Chinese Library Classification) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image-to-Point Cloud registration between 2D images and 3D LiDAR point clouds is an important task in computer vision. The traditional registration pipeline first establishes correspondences between images and point clouds and then estimates the pose from the resulting matches. However, 2D-3D correspondences are inherently difficult to establish because of the large modality gap between images and LiDAR point clouds. To address this, we bridge the 2D-3D modality gap by aligning LiDAR point clouds to virtual point clouds generated from images. In this way, the modality gap is reduced to the domain gap between two types of point clouds, i.e., original point clouds and virtual point clouds. Concretely, our framework fuses features from the LiDAR and virtual point clouds using Transformer layers. To mitigate the domain gap, we propose a frustum point retrieval module and a combined correspondence retrieval module. Candidate correspondences are generated by retrieving features and position descriptors simultaneously, and the two modules select the correct correspondences among these candidates based on the consistency between the feature and position descriptors. For training, we design a frustum retrieval loss and a combined correspondence retrieval loss for cross-modality correspondence retrieval. Experimental results on the KITTI and nuScenes datasets, including comparisons with state-of-the-art Image-to-Point Cloud registration methods, demonstrate that the proposed method achieves superior performance.
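The abstract relies on three geometric ideas: lifting image pixels into a virtual point cloud, deciding which LiDAR points fall inside the camera frustum, and keeping only candidate correspondences on which feature retrieval and position-descriptor retrieval agree. The Python sketch below illustrates simplified, non-learned versions of these ideas. It is not the authors' implementation. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, points already expressed in the camera frame, and a per-pixel depth value from some unspecified source, because the abstract does not say how the virtual points are produced. The function names back_project, in_frustum, nearest and combined_matches are hypothetical. Treating "consistency" as agreement between two brute-force nearest-neighbour searches is one simple reading of the idea and does not reproduce the paper's trained modules.

```python
# Hypothetical sketch, not the authors' implementation. Assumes a pinhole
# camera with intrinsics fx, fy, cx, cy, and points already expressed in the
# camera frame (x right, y down, z forward).
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with an assumed depth value to a 3D 'virtual' point."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def in_frustum(points: Sequence[Vec], fx: float, fy: float, cx: float, cy: float,
               width: int, height: int) -> List[bool]:
    """True for points in front of the camera whose projection lands inside the image."""
    mask = []
    for x, y, z in points:
        if z <= 0:  # behind (or on) the image plane, so it cannot be seen
            mask.append(False)
            continue
        u = fx * x / z + cx  # pinhole projection to pixel coordinates
        v = fy * y / z + cy
        mask.append(0 <= u < width and 0 <= v < height)
    return mask


def nearest(query: Vec, candidates: Sequence[Vec]) -> int:
    """Index of the candidate with the smallest squared Euclidean distance (brute force)."""
    return min(range(len(candidates)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(query, candidates[j])))


def combined_matches(src_feat: Sequence[Vec], dst_feat: Sequence[Vec],
                     src_pos: Sequence[Vec], dst_pos: Sequence[Vec]) -> List[Tuple[int, int]]:
    """Keep (i, j) only if feature retrieval and position-descriptor retrieval pick the same j."""
    matches = []
    for i in range(len(src_feat)):
        j_feat = nearest(src_feat[i], dst_feat)  # candidate from feature space
        j_pos = nearest(src_pos[i], dst_pos)     # candidate from position descriptors
        if j_feat == j_pos:                      # consistent, so accept
            matches.append((i, j_feat))
    return matches


if __name__ == "__main__":
    K = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
    print(back_project(60.0, 35.0, 2.0, **K))
    print(in_frustum([(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)],
                     width=100, height=100, **K))
    print(combined_matches([(1.0, 0.0), (0.0, 1.0)], [(0.9, 0.1), (0.1, 0.9)],
                           [(0.0,), (0.0,)], [(0.1,), (5.0,)]))
```

In the usage example, with a 100 x 100 image and fx = fy = 100, a point directly ahead of the camera is labeled as inside the frustum. A point behind the camera or far to the side is not. The second source point is rejected because its feature match and its position match disagree. Brute-force search costs O(NM). At LiDAR scale, a k-d tree or an approximate nearest-neighbour index would be the usual substitute.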
Pages: 266-274
Number of pages: 9
Related papers
50 in total
  • [31] Implicit relative attribute enabled cross-modality hashing for face image-video retrieval
    Dai, Peng
    Wang, Xue
    Zhang, Weihang
    Zhang, Pengbo
    You, Wei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 : 23547 - 23577
  • [32] CurrI2P: inter- and intra-modality similarity curriculum learning for image-to-point cloud registration
    Lin, Liwei
    Lin, Chunyu
    Nie, Lang
    Huang, Shujuan
    Zhao, Yao
    VISUAL COMPUTER, 2025
  • [33] Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval
    Liu, Li
    Lin, Zijia
    Shao, Ling
    Shen, Fumin
    Ding, Guiguang
    Han, Jungong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (01) : 107 - 118
  • [34] FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization
    Ma, Nan
    Wang, Mohan
    Han, Yiheng
    Liu, Yong-Jin
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 744 - 750
  • [35] Interactive Image Segmentation with Cross-Modality Vision Transformers
    Li, Kun
    Vosselman, George
    Yang, Michael Ying
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 762 - 772
  • [36] Learning cross-modality features for image caption generation
    Zeng, Chao
    Kwong, Sam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (07) : 2059 - 2070
  • [37] Cross-Modality Contrastive Learning for Hyperspectral Image Classification
    Hang, Renlong
    Qian, Xuwei
    Liu, Qingshan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [38] Cross-modality Attention Method for Medical Image Enhancement
    Hu, Zebin
    Liu, Hao
    Li, Zhendong
    Yu, Zekuan
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2021, 13021 : 411 - 423
  • [39] Detail-Enhanced Cross-Modality Face Synthesis via Guided Image Filtering
    Dang, Yunqi
    Li, Feng
    Li, Zhaoxin
    Zuo, Wangmeng
    COMPUTER VISION, CCCV 2015, PT I, 2015, 546 : 200 - 209
  • [40] Cross-Modality Bridging and Knowledge Transferring for Image Understanding
    Yan, Chenggang
    Li, Liang
    Zhang, Chunjie
    Liu, Bingtao
    Zhang, Yongdong
    Dai, Qionghai
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (10) : 2675 - 2685