TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Cited by: 0
Authors
Kawano, Yasufumi [1]
Aoki, Yoshimitsu [1]
Affiliations
[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance;
DOI
10.1109/ACCESS.2024.3418210
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation methods, proposed to tackle these issues, face challenges of their own, including the inability to assign specific class labels to clusters and the need for user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training-, Annotation-, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, including an improvement of +15.3 mIoU on PascalVOC.
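The abstract states that class labels are retrieved from an external database but does not say how. As a minimal sketch only, the step could be read as a nearest-neighbor lookup by cosine similarity between a segment embedding and the embeddings of candidate words. Everything below is an assumption made for illustration: the function names `cosine` and `retrieve_label`, the toy three-dimensional vectors (standing in for real CLIP embeddings), and the three-word database are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_label(segment_embedding, database):
    """Return the database label whose embedding is most similar to the segment's.

    `database` maps a candidate word to its embedding. Using cosine similarity
    here is an assumption; the abstract does not specify the retrieval rule.
    """
    return max(database, key=lambda label: cosine(segment_embedding, database[label]))


# Hypothetical toy database: word -> embedding (stand-ins for text embeddings).
database = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}

# Hypothetical per-segment embeddings (stand-ins for pooled image features).
segments = {
    "segment_0": [0.9, 0.1, 0.0],
    "segment_1": [0.1, 0.2, 0.95],
}

# Assign each segment the closest label without any user-provided class names.
labels = {name: retrieve_label(emb, database) for name, emb in segments.items()}
print(labels)
```

In this toy setup no class names are supplied by the user; the vocabulary comes entirely from the database, which is the property the abstract calls guidance-free.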
Pages: 88322-88331
Page count: 10
Related papers
50 in total
  • [41] OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
    Deng, Yinan
    Wang, Jiahui
    Zhao, Jingyu
    Tian, Xinyu
    Chen, Guangyan
    Yang, Yi
    Yue, Yufeng
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (10): 8402-8409
  • [42] Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers
    Khoi Pham
    Kafle, Kushal
    Lin, Zhe
    Ding, Zhihong
    Cohen, Scott
    Tran, Quan
    Shrivastava, Abhinav
    COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685: 201-219
  • [43] DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
    Ma, Ji
    Dai, Hongming
    Mu, Yao
    Wu, Pengying
    Wang, Hao
    Chi, Xiaowei
    Fei, Yang
    Zhang, Shanghang
    Liu, Chang
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (09): 7389-7396
  • [44] Open-Vocabulary Affordance Detection in 3D Point Clouds
    Toan Nguyen
    Minh Nhat Vu
    An Vuong
    Dzung Nguyen
    Thieu Vo
    Ngan Le
    Anh Nguyen
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023: 5692-5698
  • [45] OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition
    Chen, Keyan
    Jiang, Xiaolong
    Wang, Haochen
    Yan, Cilin
    Gao, Yan
    Tang, Xu
    Hu, Yao
    Xie, Weidi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11): 5387-5409
  • [46] Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
    Shin, Hyeon-Kyeong
    Han, Hyewon
    Kim, Doyeon
    Chung, Soo-Whan
    Kang, Hong-Goo
    INTERSPEECH 2022, 2022: 1871-1875
  • [47] BGFNet: Semantic Segmentation Network Based on Boundary Guidance
    Sun, Xiao
    Qian, Yurong
    Cao, Ruyi
    Tuerxun, Palidan
    Hu, Zhehao
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21: 1-5
  • [48] 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
    Xiao, Zihao
    Jing, Longlong
    Wu, Shangxuan
    Zhu, Alex Zihao
    Ji, Jingwei
    Jiang, Chiyu Max
    Hung, Wei-Chih
    Funkhouser, Thomas
    Kuo, Weicheng
    Angelova, Anelia
    Zhou, Yin
    Sheng, Shiwei
    COMPUTER VISION - ECCV 2024, PT XL, 2025, 15098: 21-38
  • [49] Optimized Tokenization Process for Open-Vocabulary Code Completion: An Empirical Study
    Hussain, Yasir
    Huang, Zhiqiu
    Zhou, Yu
    Khan, Izhar Ahmed
    Khan, Nasrullah
    Abbas, Muhammad Zahid
    27TH INTERNATIONAL CONFERENCE ON EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING, EASE 2023, 2023: 398-405
  • [50] Can Identifier Splitting Improve Open-Vocabulary Language Model of Code
    Shi, Jieke
    Yang, Zhou
    He, Junda
    Xu, Bowen
    Lo, David
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022: 1134-1138