TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Cited by: 0
Authors
Kawano, Yasufumi [1 ]
Aoki, Yoshimitsu [1 ]
Affiliations
[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance
DOI
10.1109/ACCESS.2024.3418210
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
Semantic segmentation is a fundamental task in computer vision in which each pixel of an image is classified into a category. Traditional methods, however, face significant challenges: they require pixel-level annotations and extensive training, and because supervised learning draws on a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation, proposed to address these issues, face their own limitations, including the inability to assign specific class labels to clusters and the need for user-provided text queries as guidance. In this context, we propose TAG, a novel approach that achieves Training-, Annotation-, and Guidance-free open-vocabulary semantic segmentation. TAG leverages pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations, and it retrieves class labels from an external database, giving it the flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, e.g., an improvement of +15.3 mIoU on PascalVOC.
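As a rough illustration of the pipeline the abstract describes (not the authors' released code), the sketch below groups frozen DINO patch features into segments, embeds each masked segment with CLIP, and retrieves the nearest entry from a stand-in text database as its class name. The model choices, the k-means grouping step, the four-entry database, and example.jpg are all illustrative assumptions rather than the paper's actual components.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pre-trained backbones: DINO for unsupervised grouping,
# CLIP for matching image segments against text.
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").to(device).eval()
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical stand-in for the external text database TAG retrieves from;
# the paper uses a far larger open-vocabulary label source.
database = ["a photo of a dog", "a photo of grass",
            "a photo of sky", "a photo of a fence"]

image = Image.open("example.jpg").convert("RGB")
pixels = proc(images=image, return_tensors="pt")["pixel_values"].to(device)  # (1,3,224,224)

with torch.no_grad():
    # 1) DINO patch features, dropping the CLS token; 224/16 -> 14x14 patches.
    feats = dino.get_intermediate_layers(pixels, n=1)[0][0, 1:]  # (196, 384)

    # 2) Unsupervised segments via k-means over patch features
    #    (a simple stand-in for the paper's grouping step).
    seg = KMeans(n_clusters=4, n_init=10).fit_predict(feats.cpu().numpy())
    seg = torch.as_tensor(seg).view(14, 14)

    # 3) Embed the database entries once with CLIP's text encoder.
    tok = proc(text=database, return_tensors="pt", padding=True).to(device)
    txt = F.normalize(clip.get_text_features(**tok), dim=-1)  # (4, 512)

    # 4) For each segment, mask the image, embed it with CLIP, and retrieve
    #    the nearest database entry -- no user-provided queries needed.
    #    (Masking CLIP-normalized pixels with zeros is a simplification.)
    for k in range(4):
        mask = (seg == k).float()[None, None].to(device)  # (1,1,14,14)
        mask = F.interpolate(mask, size=pixels.shape[-2:], mode="nearest")
        emb = F.normalize(clip.get_image_features(pixel_values=pixels * mask), dim=-1)
        best = (emb @ txt.T).argmax().item()
        print(f"segment {k}: {database[best]}")
```

The key design point this sketch tries to capture is that retrieval from a text database replaces the user-supplied class names that conventional open-vocabulary methods require, which is what makes the approach guidance-free.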
Pages: 88322-88331
Page count: 10
Related Papers
50 in total
  • [1] Class Enhancement Losses With Pseudo Labels for Open-Vocabulary Semantic Segmentation
    Dao, Son Duy
    Shi, Hengcan
    Phung, Dinh
    Cai, Jianfei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8442 - 8453
  • [2] Enhancing Open-Vocabulary Semantic Segmentation with Prototype Retrieval
    Barsellotti, Luca
    Amoroso, Roberto
    Baraldi, Lorenzo
    Cucchiara, Rita
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234 : 196 - 208
  • [3] Image-text aggregation for open-vocabulary semantic segmentation
    Cheng, Shengyang
    Huang, Jianyong
    Wang, Xiaodong
    Huang, Lei
    Wei, Zhiqiang
    NEUROCOMPUTING, 2025, 630
  • [4] SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation
    Xu, Mengde
    Zhang, Zheng
    Wei, Fangyun
    Hu, Han
    Bai, Xiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15546 - 15561
  • [5] LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
    Shi, Hengcan
    Dao, Son Duy
    Cai, Jianfei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (02) : 742 - 759
  • [6] Open-Vocabulary RGB-Thermal Semantic Segmentation
    Zhao, Guoqiang
    Huang, Junjie
    Yan, Xiaoyun
    Wang, Zhaojing
    Tang, Junwei
    Ou, Yangjun
    Hu, Xinrong
    Peng, Tao
    COMPUTER VISION - ECCV 2024, PT LXXIV, 2025, 15132 : 304 - 320
  • [7] Purify Then Guide: A Bi-Directional Bridge Network for Open-Vocabulary Semantic Segmentation
    Pan, Yuwen
    Sun, Rui
    Wang, Yuan
    Yang, Wenfei
    Zhang, Tianzhu
    Zhang, Yongdong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 343 - 356
  • [8] Generalization Boosted Adapter for Open-Vocabulary Segmentation
    Xu, Wenhao
    Wang, Changwei
    Feng, Xuxiang
    Xu, Rongtao
    Huang, Longzhao
    Zhang, Zherui
    Guo, Li
    Xu, Shibiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 520 - 533
  • [9] FreeMix: Open-Vocabulary Domain Generalization of Remote-Sensing Images for Semantic Segmentation
    Wu, Jingyi
    Shi, Jingye
    Zhao, Zeyong
    Liu, Ziyang
    Zhi, Ruicong
    REMOTE SENSING, 2025, 17 (08)
  • [10] Open-Vocabulary Camouflaged Object Segmentation
    Pang, Youwei
    Zhao, Xiaoqi
    Zuo, Jiaming
    Zhang, Lihe
    Lu, Huchuan
    COMPUTER VISION - ECCV 2024, PT XLVII, 2025, 15105 : 476 - 495