GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping

Cited by: 0

Authors:

Zheng, Yuhang [1,2]

Chen, Xiangyu [3]

Zheng, Yupeng [4]

Gu, Songen [5]

Yang, Runyi [6]

Jin, Bu [4]

Li, Pengfei [5]

Zhong, Chengliang [5]

Wang, Zengmao [7]

Liu, Lina [8]

Yang, Chao [9]

Wang, Dawei [10]

Chen, Zhen [3]

Long, Xiaoxiao [10]

Wang, Meiqing [1]

Affiliations:

[1] Beihang Univ, SMEA, Haidian 100191, Peoples R China

[2] EncoSmart, Haidian 100191, Peoples R China

[3] EncoSmart, Beijing 100083, Peoples R China

[4] Chinese Acad Sci CASIA, Inst Automat, Haidian 100190, Peoples R China

[5] Tsinghua Univ, AIR, Haidian 100190, Peoples R China

[6] Imperial Coll London, London SW7 2AZ, England

[7] Wuhan Univ, Wuhan 430072, Peoples R China

[8] China Mobile Res Inst, Xicheng 100053, Peoples R China

[9] Shanghai AI Lab, Shanghai 200232, Peoples R China

[10] Univ Hong Kong, Hong Kong, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 09

Funding:

National Natural Science Foundation of China;

Keywords:

Language-guided robotic manipulation; 3D Gaussian splatting; language feature field;

DOI:

10.1109/LRA.2024.3432348

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Constructing a 3D scene capable of accommodating open-ended language queries is a pivotal pursuit in the domain of robotics, which facilitates robots in executing object manipulations based on human language directives. To achieve this, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g., NeRF) encounter limitations due to the necessity of taking images from a larger number of viewpoints for reconstruction, coupled with their inherent inefficiencies in inference. Furthermore, these methods directly distill patch-level 2D features, leading to ambiguous segmentation boundaries. Thus, we present GaussianGrasper, which uses 3D Gaussian Splatting (3DGS) to explicitly represent the scene as a set of Gaussian primitives and is capable of real-time rendering. Our approach takes RGB-D images from limited viewpoints as input and uses an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently distill 2D language embeddings and constrain the consistency of feature embeddings. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately locate and grasp objects according to language instructions, providing a new solution for language-guided grasping tasks.

Pages: 7827-7834

Page count: 8

50 items in total

[1] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Shi, Jin-Chuan
Wang, Miao
Duan, Hao-Bin
Guan, Shao-Hua
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 5333 - 5343
[2] LangSplat: 3D Language Gaussian Splatting
Qin, Minghan
Li, Wanhua
Zhou, Jiawei
Wang, Haoqian
Pfister, Hanspeter
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 20051 - 20060
[3] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
Ding, Runyu
Yang, Jihan
Xue, Chuhui
Zhang, Wenqing
Bai, Song
Qi, Xiaojuan
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7010 - 7019
[4] Weakly Supervised 3D Open-vocabulary Segmentation
Liu, Kunhao
Zhan, Fangneng
Zhang, Jiahui
Xu, Muyu
Yu, Yingchen
El Saddik, Abdulmotaleb
Theobalt, Christian
Xing, Eric
Lu, Shijian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[5] LANGUAGE-DRIVEN OPEN-VOCABULARY 3D SEMANTIC SEGMENTATION WITH KNOWLEDGE DISTILLATION
Wu, Yuting
Han, Xian-Feng
Xiao, Guoqiang
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 3320 - 3324
[6] Open-Vocabulary Affordance Detection in 3D Point Clouds
Toan Nguyen
Minh Nhat Vu
An Vuong
Dzung Nguyen
Thieu Vo
Ngan Le
Anh Nguyen
2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5692 - 5698
[7] Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Takmaz, Ayca
Delitzas, Alexandros
Sumner, Robert W.
Engelmann, Francis
Wald, Johanna
Tombari, Federico
IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (03): : 2558 - 2565
[8] 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Xiao, Zihao
Jing, Longlong
Wu, Shangxuan
Zhu, Alex Zihao
Ji, Jingwei
Jiang, Chiyu Max
Hung, Wei-Chih
Funkhouser, Thomas
Kuo, Weicheng
Angelova, Anelia
Zhou, Yin
Sheng, Shiwei
COMPUTER VISION - ECCV 2024, PT XL, 2025, 15098 : 21 - 38
[9] OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Takmaz, Ayca
Fedele, Elisabetta
Sumner, Robert W.
Pollefeys, Marc
Tombari, Federico
Engelmann, Francis
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[10] Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Li, Ruihuang
Zhang, Zhengqiang
He, Chenheng
Ma, Zhiyuan
Patel, Vishal M.
Zhang, Lei
COMPUTER VISION - ECCV 2024, PT XLIX, 2025, 15107 : 416 - 434
