RDD: Learning Reinforced 3D Detectors and Descriptors Based on Policy Gradient

Authors
Cui, Wenting [1, 2]
Du, Shaoyi [1, 2]
Yao, Runzhao [1, 2]
Tang, Canhui [1, 2]
Ye, Aixue [3]
Wen, Feng [3]
Tian, Zhiqiang [4]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
[3] Huawei Technol Co Ltd, Huawei Noahs Ark Lab, Beijing 100085, Peoples R China
[4] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Three-dimensional displays; Detectors; Probabilistic logic; Point cloud compression; Training; Computer architecture; Point cloud registration; 3D description and detection; policy gradient; REGISTRATION;
DOI
10.1109/TMM.2023.3338054
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Keypoint detection and descriptor matching are two vital steps in the 3D feature extraction framework, but they are difficult to learn end-to-end because both are inherently discrete. To handle these non-differentiable operations, we formulate feature extraction as a decision-making problem: the network is treated as a policy pool that makes probabilistic estimations for keypoint selection and feature matching, supervised by maximizing the expected reward of its actions. On this basis, we propose a novel end-to-end training paradigm for 3D feature extraction based on the stochastic policy gradient method, named Reinforced Detectors and Descriptors (RDD). First, we propose a local-to-global probabilistic keypoint selection module that formulates the sampling probabilities of keypoints in a local-and-global mechanism to yield sparse and accurate keypoints. Second, we regard feature matching as an optimal transport problem and use an efficient Sinkhorn method to solve for the optimal matching probabilities. In particular, we carefully design a reward function and derive the gradients of the probabilistic actions, thus overcoming the discreteness and providing reinforced supervision signals. Because our reward function is calculated from sampled keypoints rather than from randomly sampled points as in existing methods, the gap between training and inference is bridged. Experimental results demonstrate that our approach outperforms state-of-the-art methods and shows strong generalization ability. Remarkably, our approach achieves significantly higher Registration Recall than other advanced methods when aligning scenes with a small number of keypoints, owing to our highly accurate and repeatable detector.
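The two probabilistic components named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names sinkhorn and reinforce_grad, the uniform marginals, the regularisation strength eps, the sample-mean baseline, and the reward callback are all assumptions made for illustration. The first function computes soft matching probabilities from a cost matrix by Sinkhorn iterations; the second estimates the policy gradient of an expected reward with respect to the scores of a categorical (softmax) selection policy, which is the standard score-function (REINFORCE) estimator rather than the specific gradients derived in the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport (assumed uniform marginals).

    cost: n x m list of lists, lower means a better match.
    Returns an n x m matrix of soft matching probabilities.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m          # assumed uniform row/column mass
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def reinforce_grad(scores, reward_fn, n_samples=2000, seed=0):
    """Monte Carlo estimate of d E[reward] / d scores.

    The policy samples one index k with probability softmax(scores)[k];
    reward_fn(k) is a hypothetical reward for selecting index k.
    """
    rng = random.Random(seed)
    probs = softmax(scores)
    picks = rng.choices(range(len(scores)), weights=probs, k=n_samples)
    rewards = [reward_fn(k) for k in picks]
    # Sample-mean baseline: reduces variance at the cost of a small bias.
    baseline = sum(rewards) / n_samples
    grad = [0.0] * len(scores)
    for k, r in zip(picks, rewards):
        advantage = r - baseline
        for i in range(len(scores)):
            # d log p_k / d s_i = 1[i == k] - p_i for a softmax policy.
            indicator = 1.0 if i == k else 0.0
            grad[i] += advantage * (indicator - probs[i]) / n_samples
    return grad
```

For example, calling sinkhorn on a symmetric cost matrix whose diagonal entries are the cheapest places most of the probability mass on the diagonal, and calling reinforce_grad with equal scores and a reward that favours a single index yields a positive gradient for that index's score and negative gradients for the others, so gradient ascent raises the probability of selecting the rewarded point even though the sampling step itself is discrete.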
Pages: 900-913
Page count: 14