GaussianGrasper: 3D Language Gaussian Splatting for Open-Vocabulary Robotic Grasping

Cited by: 0
Authors
Zheng, Yuhang [1, 2]
Chen, Xiangyu [3]
Zheng, Yupeng [4]
Gu, Songen [5]
Yang, Runyi [6]
Jin, Bu [4]
Li, Pengfei [5]
Zhong, Chengliang [5]
Wang, Zengmao [7]
Liu, Lina [8]
Yang, Chao [9]
Wang, Dawei [10]
Chen, Zhen [3]
Long, Xiaoxiao [10]
Wang, Meiqing [1]
Affiliations
[1] Beihang Univ, SMEA, Haidian 100191, Peoples R China
[2] EncoSmart, Haidian 100191, Peoples R China
[3] EncoSmart, Beijing 100083, Peoples R China
[4] Chinese Acad Sci CASIA, Inst Automat, Haidian 100190, Peoples R China
[5] Tsinghua Univ, AIR, Haidian 100190, Peoples R China
[6] Imperial Coll London, London SW7 2AZ, England
[7] Wuhan Univ, Wuhan 430072, Peoples R China
[8] China Mobile Res Inst, Xicheng 100053, Peoples R China
[9] Shanghai AI Lab, Shanghai 200232, Peoples R China
[10] Univ Hong Kong, Hong Kong, Peoples R China
Source
IEEE Robotics and Automation Letters
Funding
National Natural Science Foundation of China
Keywords
Language-guided robotic manipulation; 3D Gaussian splatting; language feature field
DOI
10.1109/LRA.2024.3432348
Chinese Library Classification (CLC) number
TP24 [Robotics technology]
Discipline classification code
080202; 1405
Abstract
Constructing a 3D scene that can accommodate open-ended language queries is a pivotal pursuit in robotics, as it enables robots to manipulate objects according to human language instructions. To this end, several research efforts have been devoted to developing language-embedded implicit fields. However, implicit fields (e.g., NeRF) are limited by the need to capture images from a large number of viewpoints for reconstruction, as well as by their inherent inefficiency at inference. Furthermore, these methods directly distill patch-level 2D features, which leads to ambiguous segmentation boundaries. We therefore present GaussianGrasper, which uses 3D Gaussian Splatting (3DGS) to explicitly represent the scene as a set of Gaussian primitives and is capable of real-time rendering. Our approach takes RGB-D images from a limited number of viewpoints as input and uses an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently distill 2D language embeddings and to constrain the consistency of feature embeddings. With the geometry reconstructed by the Gaussian field, our method enables a pre-trained grasping model to generate collision-free grasp pose candidates. In addition, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately locate and grasp objects according to language instructions, providing a new solution for language-guided grasping tasks.
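The abstract does not specify how the normal-guided grasp module scores candidates. Purely as an illustration, the sketch below ranks hypothetical parallel-jaw grasp candidates by how well the gripper's closing axis aligns with the surface normals at the two contact points (a common antipodal heuristic), weighted by the grasp model's confidence. The GraspCandidate type, normal_alignment, select_grasp, and the min_alignment=0.8 threshold are all assumptions for illustration, not the paper's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical 3D vector type; not taken from the paper.
Vec = tuple[float, float, float]


def _normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class GraspCandidate:
    """Hypothetical parallel-jaw grasp: the closing axis plus the surface
    normals sampled (e.g., from a rendered normal map) at both contacts."""
    closing_dir: Vec
    normal_left: Vec
    normal_right: Vec
    model_score: float  # confidence reported by the pre-trained grasp model


def normal_alignment(g: GraspCandidate) -> float:
    """Return 1.0 when both contact normals are parallel (or anti-parallel)
    to the closing axis and 0.0 when either is orthogonal to it."""
    d = _normalize(g.closing_dir)
    a_left = abs(_dot(d, _normalize(g.normal_left)))
    a_right = abs(_dot(d, _normalize(g.normal_right)))
    return min(a_left, a_right)  # the worse contact limits the grasp


def select_grasp(
    candidates: list[GraspCandidate], min_alignment: float = 0.8
) -> GraspCandidate | None:
    """Discard poorly aligned candidates, then return the one with the
    highest model_score * alignment, or None if none survives."""
    best, best_score = None, -math.inf
    for g in candidates:
        align = normal_alignment(g)
        if align < min_alignment:
            continue  # contact surfaces are too oblique to the jaws
        score = g.model_score * align
        if score > best_score:
            best, best_score = g, score
    return best
```

For example, a candidate whose jaws close squarely on two opposing faces is preferred over a higher-confidence candidate whose jaw meets a surface edge-on, because the latter fails the alignment filter. Multiplying the two scores is one simple design choice; a weighted sum or a learned re-ranker would be equally plausible.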
Pages: 7827-7834
Number of pages: 8