Panoptic Vision-Language Feature Fields

Cited by: 5
Authors
Chen, Haoran [1]
Blomqvist, Kenneth [1]
Milano, Francesco [1]
Siegwart, Roland [1]
Affiliations
[1] Swiss Fed Inst Technol, Autonomous Syst Lab, CH-8092 Zurich, Switzerland
Funding
European Union Horizon 2020;
Keywords
Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;
DOI
10.1109/LRA.2024.3354624
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods can segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to that of state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
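The runtime text query described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows the general idea of labeling each 3D point with the text prompt whose embedding is most similar to the point's feature. The function names `cosine_similarity` and `assign_labels` are hypothetical, and the three-dimensional toy vectors stand in for real vision-language embeddings, which would come from a pretrained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_labels(point_features, text_embeddings):
    """Return, for each point feature, the index of the most similar text embedding."""
    labels = []
    for feature in point_features:
        scores = [cosine_similarity(feature, t) for t in text_embeddings]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy data (assumption): two text prompts and three point features.
text_embeddings = [
    [1.0, 0.0, 0.0],  # stand-in for the embedding of prompt 0
    [0.0, 1.0, 0.0],  # stand-in for the embedding of prompt 1
]
point_features = [
    [0.9, 0.1, 0.0],
    [0.2, 2.0, 0.1],
    [3.0, 0.5, 0.5],
]

labels = assign_labels(point_features, text_embeddings)
print(labels)
```

Because the class list is just a set of embeddings passed in at call time, changing the prompts changes the segmentation without retraining, which is what makes such a query open-vocabulary.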
Pages: 2144-2151
Number of pages: 8