TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Times cited: 0

Authors:

Kawano, Yasufumi [1]

Aoki, Yoshimitsu [1]

Affiliation:

[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance

DOI:

10.1109/ACCESS.2024.3418210

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Semantic segmentation is a crucial task in computer vision in which each pixel of an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning relies on a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation, proposed to tackle these issues, have their own limitations: unsupervised methods cannot assign specific class labels to the clusters they find, and open-vocabulary methods require user-provided text queries as guidance. In this context, we propose TAG, a novel approach that achieves Training-, Annotation-, and Guidance-free open-vocabulary semantic segmentation. TAG uses pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, which gives it the flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, e.g., an improvement of +15.3 mIoU on PascalVOC.
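The abstract states that TAG assigns class names by retrieving them from an external database instead of relying on user-supplied text queries. A minimal sketch of such a retrieval step, assuming each segment already has an embedding in the same joint space as precomputed text embeddings of candidate labels (as CLIP-style models provide), is a nearest-neighbour search by cosine similarity. This is not the authors' implementation: the function names, the toy 3-dimensional vectors, and the label set are hypothetical, and the DINO-based segmentation stage is not shown.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def retrieve_labels(
    segment_embeddings: List[Sequence[float]],
    label_database: Dict[str, Sequence[float]],
) -> List[str]:
    """Hypothetical sketch: give each segment the database label whose embedding
    is most cosine-similar to the segment's embedding.

    Assumes segment embeddings and label (text) embeddings share one space.
    """
    db = {name: normalize(vec) for name, vec in label_database.items()}
    labels = []
    for seg in segment_embeddings:
        s = normalize(seg)
        # Highest cosine similarity wins; ties go to the first label inserted.
        best = max(db, key=lambda name: sum(a * b for a, b in zip(s, db[name])))
        labels.append(best)
    return labels


# Toy usage with made-up embeddings (illustrative only).
database = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "sky": [0.0, 0.0, 1.0]}
segments = [[0.9, 0.1, 0.0], [0.1, 0.2, 3.0]]
print(retrieve_labels(segments, database))  # ['cat', 'sky']
```

Normalizing both sides first means the ranking depends only on direction, not magnitude, which is the usual convention when comparing embeddings from contrastively trained image-text models.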


Pages: 88322-88331

Number of pages: 10
