TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Authors
Kawano, Yasufumi [1]
Aoki, Yoshimitsu [1]
Affiliations
[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance
DOI
10.1109/ACCESS.2024.3418210
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation methods, proposed to tackle these issues, face challenges of their own: unsupervised methods cannot assign specific class labels to clusters, and open-vocabulary methods require user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training-, Annotation-, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, e.g., an improvement of +15.3 mIoU on PascalVOC.
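The abstract states that class labels are retrieved from an external database but does not say how. As a rough illustration only, the sketch below assumes a nearest-neighbor lookup by cosine similarity between a segment embedding and labeled database embeddings. The function names (`cosine`, `retrieve_label`), the toy 3-dimensional vectors, and the dictionary-based database are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_label(segment_embedding, database):
    """Return the label whose database embedding is most similar to the segment.

    Assumption: retrieval is a plain nearest-neighbor search; the actual
    method in the paper may differ.
    """
    return max(database, key=lambda name: cosine(segment_embedding, database[name]))

# Hypothetical database: label -> embedding. Real embeddings (for example from
# a CLIP-style encoder) would have hundreds of dimensions.
database = {
    "cat": [0.9, 0.1, 0.0],
    "car": [0.0, 0.2, 0.9],
    "tree": [0.1, 0.9, 0.1],
}

# Hypothetical embedding for one image segment.
segment_embedding = [0.8, 0.2, 0.1]
label = retrieve_label(segment_embedding, database)
print(label)
```

Because the lookup depends only on the database contents, swapping in a different database changes the available vocabulary without retraining, which is in the spirit of the flexibility the abstract describes.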
Pages: 88322-88331
Page count: 10