RDD: Learning Reinforced 3D Detectors and Descriptors Based on Policy Gradient

Cited by: 0

Authors:

Cui, Wenting [1,2]

Du, Shaoyi [1,2]

Yao, Runzhao [1,2]

Tang, Canhui [1,2]

Ye, Aixue [3]

Wen, Feng [3]

Tian, Zhiqiang [4]

Affiliations:

[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intell, Xian 710049, Peoples R China

[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China

[3] Huawei Technol Co Ltd, Huawei Noahs Ark Lab, Beijing 100085, Peoples R China

[4] Xi An Jiao Tong Univ, Sch Software Engn, Xian 710049, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2025 / Vol. 27

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Three-dimensional displays; Detectors; Probabilistic logic; Point cloud compression; Training; Computer architecture; Point cloud registration; 3D description and detection; policy gradient; REGISTRATION;

DOI:

10.1109/TMM.2023.3338054

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Keypoint detection and descriptor matching are two vital steps in the 3D feature extraction framework, but they are difficult to learn in an end-to-end fashion due to their inherent discreteness. To tackle the non-differentiable operations, we formulate feature extraction as a decision-making problem: the network is treated as a policy pool that can make probabilistic estimations for keypoint selection and feature matching, supervised by maximizing a reward expectation of actions. In this way, we propose a novel end-to-end training paradigm of 3D feature extraction based on the stochastic policy gradient method, named Reinforced Detectors and Descriptors (RDD). Firstly, we propose a local-to-global probabilistic keypoint selection module that formulates the sampling probabilities of keypoints in a local-and-global mechanism to yield sparse and accurate keypoints. Secondly, we regard feature matching as an optimal transport problem and an efficient Sinkhorn method is leveraged to solve the optimal matching probabilities. In particular, we carefully design a reward function and derive gradients of probabilistic actions, thus overcoming the discreteness and providing reinforced supervision signals. Since our reward function is calculated from sampled keypoints rather than from randomly sampled points as in existing methods, the gap between training and inference is bridged. Experimental results demonstrate that our approach exceeds the quality of state-of-the-art methods and shows strong generalization ability. Remarkably, our approach can achieve significantly higher Registration Recall than other advanced methods when aligning scenes with a small number of keypoints, due to our highly accurate and repeatable detector.
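The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the per-candidate `reward` vector, the single softmax policy over candidate points, the temperature `eps`, and the uniform marginals on a square score matrix are all assumptions made for illustration. `reinforce_gradient` shows how a score-function (REINFORCE) estimator produces a gradient for a discrete sampling step, and `sinkhorn` shows how alternating row and column normalisation turns a descriptor similarity matrix into soft matching probabilities.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradient(logits, reward, n_samples, rng):
    """Score-function estimate of d E[reward(a)] / d logits.

    logits : (n_points,) unnormalised keypoint scores (hypothetical policy).
    reward : (n_points,) reward received when each point is sampled
             (hypothetical; a real reward would depend on matching quality).
    """
    probs = softmax(logits)
    # Discrete, non-differentiable step: sample which point is selected.
    actions = rng.choice(len(probs), size=n_samples, p=probs)
    one_hot = np.eye(len(probs))[actions]      # (n_samples, n_points)
    score = one_hot - probs                    # grad of log pi(a) w.r.t. logits
    returns = reward[actions][:, None]         # (n_samples, 1)
    return (returns * score).mean(axis=0)


def _logsumexp(x, axis):
    """Stable log-sum-exp that keeps the reduced axis for broadcasting."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def sinkhorn(scores, eps=0.5, n_iters=100):
    """Log-domain Sinkhorn normalisation of a square similarity matrix.

    Assumes uniform marginals, so the result is approximately doubly
    stochastic: each row and each column sums to one.
    """
    log_p = scores / eps
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # normalise rows
        log_p = log_p - _logsumexp(log_p, axis=0)  # normalise columns
    return np.exp(log_p)
```

In this sketch the Monte Carlo estimate from `reinforce_gradient` can be checked against the closed-form gradient `p * (reward - p @ reward)` of the expected reward, and each row of the matrix returned by `sinkhorn` can be read as a probability distribution over candidate matches for one source keypoint.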


Pages: 900-913

Page count: 14

50 items in total

[21] Generative VoxelNet: Learning Energy-Based Models for 3D Shape Synthesis and Analysis
Xie, Jianwen
Zheng, Zilong
Gao, Ruiqi
Wang, Wenguan
Zhu, Song-Chun
Wu, Ying Nian
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (05) : 2468 - 2484
[22] Learning 3D Shape Latent for Point Cloud Completion
Chen, Zhikai
Long, Fuchen
Qiu, Zhaofan
Yao, Ting
Zhou, Wengang
Luo, Jiebo
Mei, Tao
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8717 - 8729
[23] Masked Autoencoders in 3D Point Cloud Representation Learning
Jiang, Jincen
Lu, Xuequan
Zhao, Lizhi
Dazeley, Richard
Wang, Meili
IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 820 - 831
[24] D2S: Representing Sparse Descriptors and 3D Coordinates for Camera Relocalization
Bui, Bach-Thuan
Bui, Huy Hoang
Tran, Dinh-Tuan
Lee, Joo-Ho
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (12) : 11449 - 11456
[25] Reinforced knowledge distillation: Multi-class imbalanced classifier based on policy gradient reinforcement learning
Fan, Saite
Zhang, Xinmin
Song, Zhihuan
NEUROCOMPUTING, 2021, 463 : 422 - 436
[26] CLN: Cross-Domain Learning Network for 2D Image-Based 3D Shape Retrieval
Nie, Weizhi
Zhao, Yue
Nie, Jie
Liu, An-An
Zhao, Sicheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 992 - 1005
[27] GNNGO3D: Protein Function Prediction Based on 3D Structure and Functional Hierarchy Learning
Zhang, Liyuan
Jiang, Yongquan
Yang, Yan
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (08) : 3867 - 3878
[28] AnchorPoint: Query Design for Transformer-Based 3D Object Detection and Tracking
Liu, Hao
Ma, Yanni
Wang, Hanyun
Zhang, Chaobo
Guo, Yulan
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (10) : 10988 - 11000
[29] A Feature Based Laser SLAM Using Rasterized Images of 3D Point Cloud
Ali, Waqas
Liu, Peilin
Ying, Rendong
Gong, Zheng
IEEE SENSORS JOURNAL, 2021, 21 (21) : 24422 - 24430
[30] Rethinking Masked Representation Learning for 3D Point Cloud Understanding
Wang, Chuxin
Zha, Yixin
He, Jianfeng
Yang, Wenfei
Zhang, Tianzhu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 247 - 262
