Panoptic Vision-Language Feature Fields

Cited by: 5
Authors
Chen, Haoran [1]
Blomqvist, Kenneth [1]
Milano, Francesco [1]
Siegwart, Roland [1]
Affiliation
[1] Swiss Federal Institute of Technology (ETH Zurich), Autonomous Systems Lab, CH-8092 Zurich, Switzerland
Funding
EU Horizon 2020;
Keywords
Semantics; Three-dimensional displays; Semantic segmentation; Self-supervised learning; Instance segmentation; Image reconstruction; Computational modeling; Semantic scene understanding; deep learning for visual perception; 3D open vocabulary panoptic segmentation; neural implicit representation;
DOI
10.1109/LRA.2024.3354624
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods can segment scenes into arbitrary classes based on text descriptions provided at runtime. In this letter, we propose what is, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a semantic feature field of the scene by distilling vision-language features from a pretrained 2D model, and jointly fits an instance feature field through contrastive learning using 2D instance segments on input frames. Despite not being trained on the target classes, our method achieves panoptic segmentation performance similar to that of state-of-the-art closed-set 3D systems on the HyperSim, ScanNet and Replica datasets, and additionally outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We ablate the components of our method to demonstrate the effectiveness of our model architecture.
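Illustrative sketch (not from the paper): the abstract describes assigning classes from text descriptions supplied at runtime. One common way to do this, assumed here rather than confirmed as the PVLFF implementation, is to compare each point's vision-language feature with the embedding of every text prompt and keep the most similar one. The function names `cosine` and `assign_labels` and the toy 3-dimensional vectors are hypothetical; real embeddings would come from a pretrained vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def assign_labels(point_features, text_embeddings):
    """For each point feature, return the prompt whose embedding is most similar."""
    labels = []
    for feat in point_features:
        best = max(text_embeddings, key=lambda name: cosine(feat, text_embeddings[name]))
        labels.append(best)
    return labels


# Toy stand-ins for text embeddings (hypothetical values, not model outputs).
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
# Toy stand-ins for features queried from a semantic feature field.
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

predicted = assign_labels(point_features, text_embeddings)
print(predicted)
```

Because the class set is only a dictionary of prompts, adding a new class at query time means adding one more entry, with no retraining of the features in this simplified picture.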
Pages: 2144-2151
Number of pages: 8