OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields With Fine-Grained Understanding

Cited by: 1
Authors
Deng, Yinan [1]
Wang, Jiahui [1]
Zhao, Jingyu [1]
Dou, Jianyu [1]
Yang, Yi [1]
Yue, Yufeng [1]
Affiliations
[1] Beijing Inst Technol, Sch Automat, Beijing 100811, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2025 / Vol. 10 / No. 01
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Neural radiance field; Three-dimensional displays; Feature extraction; Semantics; Visualization; Training; Robots; Image color analysis; Object segmentation; Instance segmentation; Implicit mapping; open-vocabulary; object-level NeRF; representation;
DOI
10.1109/LRA.2024.3511401
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval tasks. Although the semantic ambiguity of existing point-wise feature maps is alleviated by open-vocabulary mask segmenters for object-level understanding, effectively retaining fine-grained features within objects simultaneously remains challenging. To address these challenges, we introduce OpenObj, an innovative approach to build open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Specifically, we obtain cross-frame consistent instance-level masks for supervision through our two-stage mask clustering module. Moreover, by incorporating part-level features into the object NeRF models, OpenObj not only captures object-level instances but also preserves an understanding of their internal granularity. The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at several levels, including global movement and local manipulation.
Pages: 652-659
Page count: 8
Related papers
30 in total
[1]   Abou-Chakra J., 2022, P WORKSH IMPL REPR R
[2]   Panoptic Vision-Language Feature Fields [J].
Chen, Haoran ;
Blomqvist, Kenneth ;
Milano, Francesco ;
Siegwart, Roland .
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (03) :2144-2151
[3]   ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes [J].
Dai, Angela ;
Chang, Angel X. ;
Savva, Manolis ;
Halber, Maciej ;
Funkhouser, Thomas ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :2432-2443
[4]   SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes [J].
Delitzas, Alexandros ;
Takmaz, Ayca ;
Tombari, Federico ;
Sumner, Robert ;
Pollefeys, Marc ;
Engelmann, Francis .
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, :14531-14542
[5]   OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments [J].
Deng, Yinan ;
Wang, Jiahui ;
Zhao, Jingyu ;
Tian, Xinyu ;
Chen, Guangyan ;
Yang, Yi ;
Yue, Yufeng .
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (10) :8402-8409
[6]   SEE-CSOM: Sharp-Edged and Efficient Continuous Semantic Occupancy Mapping for Mobile Robots [J].
Deng, Yinan ;
Wang, Meiling ;
Yang, Yi ;
Wang, Danwei ;
Yue, Yufeng .
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (02) :1718-1728
[7]   S-MKI: Incremental Dense Semantic Occupancy Reconstruction Through Multi-Entropy Kernel Inference [J].
Deng, Yinan ;
Wang, Meiling ;
Wang, Danwei ;
Yue, Yufeng .
2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, :3824-3829
[8]   HD-CCSOM: Hierarchical and Dense Collaborative Continuous Semantic Occupancy Mapping through Label Diffusion [J].
Deng, Yinan ;
Wang, Meiling ;
Yang, Yi ;
Yue, Yufeng .
2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, :2417-2422
[9]   GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping [J].
Fang, Hao-Shu ;
Wang, Chenxi ;
Gou, Minghao ;
Lu, Cewu .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11441-11450
[10]   ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [J].
Gu, Qiao ;
Kuwajerwala, Ali ;
Morin, Sacha ;
Jatavallabhula, Krishna Murthy ;
Sen, Bipasha ;
Agarwal, Aditya ;
Rivera, Corban ;
Paul, William ;
Ellis, Kirsty ;
Chellappa, Rama ;
Gan, Chuang ;
de Melo, Celso Miguel ;
Tenenbaum, Joshua B. ;
Torralba, Antonio ;
Shkurti, Florian ;
Paull, Liam .
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, :5021-5028